antonisa / lang2vec

A simple library for querying the URIEL typological database.
Creative Commons Attribution Share Alike 4.0 International

"seek" error in calculating distances? #8

Open neubig opened 3 years ago

neubig commented 3 years ago

Hi, I'm getting the following error when I try to calculate distances. Not sure if this is a library compatibility problem?

$ python
Python 3.6.10 |Anaconda, Inc.| (default, Jan  7 2020, 15:01:53) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import lang2vec.lang2vec as l2v
>>> l2v.distance('syntactic', 'deu', 'eng')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/neubig/anaconda3/envs/python3/lib/python3.6/site-packages/lang2vec-1.1.6-py3.6.egg/lang2vec/lang2vec.py", line 401, in distance
  File "/Users/neubig/anaconda3/envs/python3/lib/python3.6/site-packages/scipy/sparse/_matrix_io.py", line 131, in load_npz
    with np.load(file, **PICKLE_KWARGS) as loaded:
  File "/Users/neubig/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 439, in load
    fid.seek(-min(N, len(magic)), 1)  # back-up
io.UnsupportedOperation: seek

antonisa commented 3 years ago

If I recall correctly, that error comes from trying to read a sparse matrix from a file that doesn't exist. Have you downloaded the distances file? Run wget http://www.cs.cmu.edu/~aanastas/files/distances.zip and move the downloaded file to lang2vec/data.

(Also, using the distances requires installing from source rather than from pip.)

neubig commented 3 years ago

Ahh, thanks! Seems like it worked. I'll keep the issue open because it might be nice to have a less opaque error message and/or automatic download of the file.
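
A minimal sketch of what a less opaque error could look like, assuming the check runs before the archive is opened. The helper name require_distances_file and the constant DISTANCES_URL are hypothetical and not part of lang2vec; the URL is the one quoted above.

```python
from pathlib import Path

# Hypothetical constant: the download URL quoted earlier in this thread.
DISTANCES_URL = "http://www.cs.cmu.edu/~aanastas/files/distances.zip"


def require_distances_file(path):
    """Return `path` as a Path, or raise a descriptive error if it is missing."""
    path = Path(path)
    if not path.is_file():
        # Name the missing file and say how to get it, instead of letting a
        # lower-level loader fail later with an unrelated-looking message.
        raise FileNotFoundError(
            f"Distances archive not found at {path}. "
            f"Download it from {DISTANCES_URL} and place it in lang2vec/data."
        )
    return path
```

Calling this at the top of a distance lookup would fail early with a message that names both the missing path and the download location.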

antonisa commented 3 years ago

Sounds good -- thanks!

n8rob commented 2 years ago

Hello, I am encountering this same error, but this fix has not worked for me. I seem to have the required csv files. Has the distances code been updated since 2020? The contents of my lang2vec/data directory are below.

distances2.zip           FEATURAL.csv                      features.npz              GEOGRAPHIC.csv                      learned.npy        phonological_upper_sparse.npz
distances_languages.txt  featural_upper_round1_sparse.npz  GENETIC.csv               geographic_upper_round1_sparse.npz  letter_codes.json  SYNTACTIC.csv
distances.zip            feature_averages.npz              genetic_upper_sparse.npz  INVENTORY.csv                       __MACOSX           syntactic_upper_round2_sparse.npz
family_features.npz      feature_predictions.npz           geocoord_features.npz     inventory_upper_sparse.npz          PHONOLOGICAL.csv

n8rob commented 2 years ago

To resolve this I had to replace

data = sparse.load_npz(zp.open(map_distance_to_filename(dist)))

with

data_dir = '/'.join(DISTANCES_FILE.split('/')[:-1]) + '/'
data = sparse.load_npz(data_dir + map_distance_to_filename(dist))

on line 401 of lang2vec/lang2vec.py after unzipping lang2vec/data/distances2.zip. It seems to be working now.
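
For anyone applying the same workaround, here is a sketch of the path construction using os.path instead of splitting on '/', which should also behave on Windows. The name DISTANCES_FILE comes from the snippet above; its value here is only an assumption for illustration, and distance_file_path is a hypothetical helper.

```python
import os

# Assumed value for illustration only; the real constant lives in lang2vec.py.
DISTANCES_FILE = os.path.join("lang2vec", "data", "distances2.zip")


def distance_file_path(filename, distances_file=DISTANCES_FILE):
    """Build the path of an unzipped matrix that sits next to the archive."""
    # os.path.dirname drops the final path component (the archive name),
    # and os.path.join appends the matrix file name with the right separator.
    return os.path.join(os.path.dirname(distances_file), filename)
```

The returned path could then be passed to sparse.load_npz in place of the string concatenation.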

n8rob commented 2 years ago

It looks like this was a compatibility issue. When I upgraded Python to 3.9, neither of these fixes was necessary.
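
One plausible explanation, not confirmed in this thread: the file objects returned by ZipFile.open() only gained seek() support in Python 3.7, and the traceback above shows numpy calling fid.seek(). A version-independent workaround is to read the zip member into an in-memory buffer first. The sketch below uses only the standard library; open_member_seekable is a hypothetical helper, not part of lang2vec.

```python
import io
import zipfile


def open_member_seekable(zip_path, member):
    """Read one zip member fully into memory and return a seekable buffer."""
    with zipfile.ZipFile(zip_path) as zp:
        # zp.read() returns the member's decompressed bytes; wrapping them in
        # BytesIO gives a file-like object that supports seek() and tell().
        return io.BytesIO(zp.read(member))
```

The returned buffer could be handed to sparse.load_npz instead of zp.open(...), at the cost of holding the whole member in memory.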