explosion / sense2vec

🦆 Contextually-keyed word vectors
MIT License

Can't save the s2v model to disk #125

Closed: ditordccaa closed this issue 3 years ago

ditordccaa commented 3 years ago

Hi there, I'm trying to save a sense2vec model to disk using the to_disk method, but it fails with the error below.


I tried to track down the issue by looking in the ujson folder of the srsly package, but I couldn't follow it because I believe it's written in C, which I don't know :)

TypeError                                 Traceback (most recent call last)
<ipython-input-59-ec8fe7745533> in <module>()
----> 1 srsly.write_json(output_path / "cfg", s2v.cfg)

1 frames
/usr/local/lib/python3.6/dist-packages/srsly/_json_api.py in json_dumps(data, indent, sort_keys)
     24         )
     25     else:
---> 26         result = ujson.dumps(data, indent=indent, escape_forward_slashes=False)
     27     if sys.version_info[0] == 2:  # Python 2
     28         return result.decode("utf8")

TypeError: {'PART', 'QUANTITY', 'DATE', 'PROPN', 'ORDINAL', 'PERCENT', 'CARDINAL', 'LOC', 'ADJ', 'VERB', 'ADV', 'NORP', 'DET', 'FAC', 'PRON', 'ORG', 'MONEY', 'NUM', 'LANGUAGE', 'X', 'PRODUCT', 'INTJ', 'ADP', 'SCONJ', 'TIME', 'EVENT', 'AUX', 'GPE', 'SYM', 'NOUN', 'PERSON', 'CCONJ', 'WORK OF ART', 'PUNCT'} is not JSON serializable

While experimenting, I noticed that when I print s2v.cfg, it's a dictionary of the form below, and the value for the key 'senses' is a set rather than a list. I think it should be a list, since JSON has no set type.

{'make_key': 'default',
 'senses': {'ADJ', ...},
 'split_key': 'default'}
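The failure can be reproduced without sense2vec at all: JSON has no set type, so Python's standard json module rejects a set just as ujson does, while a list serializes fine. A minimal sketch using only the standard library; the cfg dict is a hypothetical, trimmed-down stand-in for s2v.cfg, and the stdlib's error message is worded differently from ujson's, although it is the same TypeError:

```python
import json

# Hypothetical stand-in for s2v.cfg, trimmed to two senses.
cfg = {"make_key": "default", "senses": {"NOUN", "VERB"}, "split_key": "default"}

try:
    json.dumps(cfg)
except TypeError as err:
    # Raised because the value of "senses" is a set.
    print("set fails:", err)

# Replacing the set with a list makes the config serializable.
# sorted() also gives a stable order, since set iteration order can vary between runs.
fixed = {**cfg, "senses": sorted(cfg["senses"])}
print(json.dumps(fixed))
```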

Python version

Python 3.6.9

Package versions

srsly 1.0.4
spacy 2.2.4
catalogue 1.0.0
numpy 1.18.5

ditordccaa commented 3 years ago

After some more checking, I found that the problem isn't in the package itself but in the scripts provided for training a custom model, specifically 05_export.py.

I changed line 146 to wrap all_senses in list(), like this:

s2v = Sense2Vec(shape=(n_vectors, vector_size), senses=list(all_senses))
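For a config that already contains sets, a more general variant of the same idea is to replace every set with a sorted list before serializing. The helper below, make_json_safe, is hypothetical (it is not part of sense2vec or srsly) and is only exercised here against the standard json module:

```python
import json
from typing import Any


def make_json_safe(value: Any) -> Any:
    """Hypothetical helper (not part of sense2vec or srsly): return a copy of
    `value` with every set replaced by a sorted list, so JSON can encode it."""
    if isinstance(value, (set, frozenset)):
        # Sorting gives a stable order on disk; assumes the items are
        # mutually comparable, e.g. all strings like the sense labels.
        return sorted(value)
    if isinstance(value, dict):
        return {key: make_json_safe(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [make_json_safe(item) for item in value]
    return value


# Hypothetical config shaped like the s2v.cfg printed earlier in the thread.
cfg = {"make_key": "default", "senses": {"PROPN", "NOUN", "ADJ"}, "split_key": "default"}
print(json.dumps(make_json_safe(cfg)))
# {"make_key": "default", "senses": ["ADJ", "NOUN", "PROPN"], "split_key": "default"}
```

On a real model, the equivalent step would presumably be s2v.cfg["senses"] = sorted(s2v.cfg["senses"]) before saving. This assumes cfg is an ordinary mutable dict, as the printout suggests, and that nothing else in Sense2Vec relies on senses being a set; the thread doesn't confirm either point. Passing a list at construction time, as in the script change, avoids the problem at the source.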

I hope it helps.