explosion / sense2vec

🦆 Contextually-keyed word vectors
https://explosion.ai/blog/sense2vec-reloaded
MIT License
1.62k stars, 240 forks

sense2vec own model issue #133

Closed starz10de closed 3 years ago

starz10de commented 3 years ago

CentOS Linux release 7.6.1810, Python 3.7, sense2vec 1.03

I want to create my own sense2vec model, and I followed the instructions at: https://github.com/explosion/sense2vec#-training-your-own-sense2vec-vectors

Everything went fine, except that I got an error in the final step: "export.py: Load the vectors and frequencies and output a sense2vec component that can be loaded via Sense2Vec.from_disk"

The error: TypeError: {'PRON', 'ADP', 'ADJ', 'AUX', 'ADV', 'NOUN', 'PART', 'ORG', 'CARDINAL', 'PERSON', 'NUM', 'SCONJ', 'SYM', 'DATE', 'DET', 'CCONJ', 'ORDINAL', 'VERB', 'X', 'PROPN', 'PUNCT'} is not JSON serializable. Here is the stack trace:

✔ Created the sense2vec model
ℹ 365 vectors, 21 total senses
Traceback (most recent call last):
  File "export.py", line 148, in <module>
    typer.run(main)
  File "/Prodigy/prodigy-env/lib/python3.7/site-packages/typer/main.py", line 859, in run
    app()
  File "/Prodigy/prodigy-env/lib/python3.7/site-packages/typer/main.py", line 214, in __call__
    return get_command(self)(*args, **kwargs)
  File "/Prodigy/prodigy-env/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Prodigy/prodigy-env/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Prodigy/prodigy-env/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Prodigy/prodigy-env/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Prodigy/prodigy-env/lib/python3.7/site-packages/typer/main.py", line 497, in wrapper
    return callback(**use_params)  # type: ignore
  File "export.py", line 67, in main
    s2v.to_disk(output_path)
  File "/Prodigy/prodigy-env/lib/python3.7/site-packages/sense2vec/sense2vec.py", line 323, in to_disk
    srsly.write_json(path / "cfg", self.cfg)
  File "/Prodigy/prodigy-env/lib/python3.7/site-packages/srsly/_json_api.py", line 74, in write_json
    json_data = json_dumps(data, indent=indent)
  File "/Prodigy/prodigy-env/lib/python3.7/site-packages/srsly/_json_api.py", line 26, in json_dumps
    result = ujson.dumps(data, indent=indent, escape_forward_slashes=False)
TypeError: {'PRON', 'ADP', 'ADJ', 'AUX', 'ADV', 'NOUN', 'PART', 'ORG', 'CARDINAL', 'PERSON', 'NUM', 'SCONJ', 'SYM', 'DATE', 'DET', 'CCONJ', 'ORDINAL', 'VERB', 'X', 'PROPN', 'PUNCT'} is not JSON serializable

danielmoore19 commented 3 years ago

yes, similar. the issue is line 61:

s2v = Sense2Vec(shape=(n_vectors, vector_size), senses=all_senses)

all_senses is a set, and a set can't be serialized to JSON, so you need to convert it to a list:

s2v = Sense2Vec(shape=(n_vectors, vector_size), senses=list(all_senses))

i forget where i learned this, but a few months ago i had this same issue and this solves it. happy training!
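
For anyone hitting the same error, here is a minimal sketch of why it happens. It uses the standard-library json module instead of the srsly/ujson call shown in the traceback, and the three sense labels and the "senses" key are only an illustrative sample, not the real config written by to_disk:

```python
import json

# A small sample of the sense labels from the error message, held in a set.
# (The TypeError above prints a set literal, so all_senses is a set there.)
all_senses = {"NOUN", "VERB", "ORG"}

try:
    # JSON has no set type, so the encoder rejects it.
    json.dumps({"senses": all_senses})
    set_ok = True
except TypeError as err:
    set_ok = False
    print(f"set failed: {err}")

# Converting to a list works. sorted() returns a list and is used here only
# to make the output order deterministic; list(all_senses) also serializes.
encoded = json.dumps({"senses": sorted(all_senses)})
print(encoded)
```

The same reasoning applies to the fix above: list(all_senses) hands Sense2Vec a value that the JSON writer in to_disk can serialize.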

starz10de commented 3 years ago

Dear danielmoore19, thanks a lot!

svlandeg commented 3 years ago

Thanks @danielmoore19, I've created the corresponding PR to fix this! https://github.com/explosion/sense2vec/pull/135