em_dict, et_dict, mention_count = ht.entity_discover(
    text="\n".join(comments),
    return_count=True,
    min_count=3,
    method="NFL",
    threshold=0.97)
all_mentions = set(x for enty, ments in em_dict.items() for x in ments)
print(f"Num entities: {len(em_dict)}, Num mentions: {len(all_mentions)}")
Error:
Doing NER
100%|██████████| 1303/1303 [00:03<00:00, 403.01it/s]
Training fasttext
Traceback (most recent call last):
File "/Users/francisdu/Code/Python/pass_comments/passenger/entity_discover.py", line 18, in <module>
threshold=0.97)
File "/Users/francisdu/.pyenv/versions/3.6.9/lib/python3.6/site-packages/harvesttext/word_discover.py", line 232, in entity_discover
min_count, pinyin_tolerance, self.pinyin_adjlist, **kwargs)
File "/Users/francisdu/.pyenv/versions/3.6.9/lib/python3.6/site-packages/harvesttext/algorithms/entity_discoverer.py", line 134, in __init__
min_n, max_n)
File "/Users/francisdu/.pyenv/versions/3.6.9/lib/python3.6/site-packages/harvesttext/algorithms/entity_discoverer.py", line 150, in train_emb
id2word = [wd for wd in id2word if wd in model.wv.vocab]
File "/Users/francisdu/.pyenv/versions/3.6.9/lib/python3.6/site-packages/harvesttext/algorithms/entity_discoverer.py", line 150, in <listcomp>
id2word = [wd for wd in id2word if wd in model.wv.vocab]
File "/Users/francisdu/.pyenv/versions/3.6.9/lib/python3.6/site-packages/gensim/models/keyedvectors.py", line 646, in vocab
"The vocab attribute was removed from KeyedVector in Gensim 4.0.0.\n"
AttributeError: The vocab attribute was removed from KeyedVector in Gensim 4.0.0.
Use KeyedVector's .key_to_index dict, .index_to_key list, and methods .get_vecattr(key, attr) and .set_vecattr(key, attr, new_val) instead.
See https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4
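This error is caused by API changes introduced in Gensim 4.0.0: the vocab attribute that harvesttext reads in entity_discoverer.py (line 150, in train_emb) was removed from KeyedVectors.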
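
The failing line in the traceback is the membership check against model.wv.vocab. A minimal sketch of a version-tolerant replacement follows, assuming only what the Gensim error message states (key_to_index exists in 4.x, vocab in 3.x). The helper names known_words and filter_in_vocab are hypothetical and not part of harvesttext or gensim, and the stand-in objects exist only so the sketch runs without gensim installed.

```python
from types import SimpleNamespace


def known_words(wv):
    """Return a mapping whose keys are the words known to a KeyedVectors-like object."""
    # Gensim 4.x exposes key_to_index; on 3.x only vocab exists.
    # Check key_to_index first, because touching vocab raises AttributeError on 4.x.
    if hasattr(wv, "key_to_index"):
        return wv.key_to_index
    return wv.vocab


def filter_in_vocab(id2word, wv):
    """Keep only the words that have an embedding, as the failing line intended."""
    vocab = known_words(wv)
    return [wd for wd in id2word if wd in vocab]


# Stand-ins for model.wv (assumption: real KeyedVectors expose the same attribute names).
fake_wv_4x = SimpleNamespace(key_to_index={"airport": 0, "flight": 1})
fake_wv_3x = SimpleNamespace(vocab={"airport": object(), "flight": object()})

print(filter_in_vocab(["airport", "delay", "flight"], fake_wv_4x))
print(filter_in_vocab(["airport", "delay", "flight"], fake_wv_3x))
```

Until the library itself is updated, installing a Gensim release older than 4.0.0 should also avoid this particular error, since the vocab attribute is still present there.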