BehroozMansouri / TangentCFT


ERROR:root:'all ngrams for word \ueae8\uea8c𑇗Ǵ absent from model' when doing retrieval. #17

Closed WangPeiSyuan closed 1 year ago

WangPeiSyuan commented 2 years ago

I searched for this error. It occurs when a word is not present in the training vocabulary, so the FastText model cannot return a meaningful word vector for it. But I already used all the training data in the "MathtagArticles" directory. Is there anything I missed?

ERROR:root:'all ngrams for word \ueae8\uea8c𑇗Ǵ absent from model'
Traceback (most recent call last):
  File "/app/TangentCFT/tangent_cft_module.py", line 115, in __get_vector_representation
    temp_vector = temp_vector + self.model.get_vector_representation(encoded_tuple)
  File "/app/TangentCFT/tangent_cft_model.py", line 45, in get_vector_representation
    return self.model.wv[encoded_math_tuple]
  File "/home/user/miniconda/lib/python3.9/site-packages/gensim/models/keyedvectors.py", line 169, in __getitem__
    return self.get_vector(entities)
  File "/home/user/miniconda/lib/python3.9/site-packages/gensim/models/keyedvectors.py", line 277, in get_vector
    return self.word_vec(word)
  File "/home/user/miniconda/lib/python3.9/site-packages/gensim/models/keyedvectors.py", line 1622, in word_vec
    raise KeyError('all ngrams for word %s absent from model' % word)
KeyError: 'all ngrams for word \ueae8\uea8c𑇗Ǵ absent from model'

The error appears many times with different words. After that, the run still produced the "slt_ret.tsv" file. Will this cause any problems for the retrieval results?

jiaqizhao122 commented 11 months ago

Hello, I also have this problem. Will this message cause problems for the final results? How did you solve it in the end?
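
To make the question about the results concrete: the traceback above shows the `KeyError` being raised while tuple vectors are summed in `__get_vector_representation`, and the run still finishes. The following is only a minimal sketch of that pattern, not the TangentCFT implementation. It assumes (unconfirmed) that each failing tuple is logged and skipped. The function `average_known_vectors`, the `lookup` callable, and the toy vocabulary are hypothetical stand-ins for the real `model.wv[...]` lookup.

```python
import logging


def average_known_vectors(encoded_tuples, lookup, dim):
    """Average the vectors of the tuples that `lookup` can resolve.

    `lookup` is a hypothetical stand-in for the model's vector lookup;
    it must raise KeyError for tokens it cannot embed.
    Returns (vector_or_None, skipped_tokens).
    """
    total = [0.0] * dim
    found = 0
    skipped = []
    for token in encoded_tuples:
        try:
            vec = lookup(token)
        except KeyError as err:
            # Assumption: log the failure and skip the tuple.
            logging.error(err)
            skipped.append(token)
            continue
        total = [a + b for a, b in zip(total, vec)]
        found += 1
    if found == 0:
        # No tuple could be embedded, so there is nothing to average.
        return None, skipped
    return [x / found for x in total], skipped


# Toy vocabulary standing in for a trained model (illustrative only).
toy_vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
vector, skipped = average_known_vectors(
    ["a", "b", "zz"], toy_vectors.__getitem__, dim=2
)
```

Under these assumptions, a formula with a few unknown tuples still gets an embedding, built only from the tuples the model knows, while a formula whose tuples are all unknown gets none. Counting the skipped tuples per formula is one way to judge how much the logged errors could matter for the ranking in "slt_ret.tsv".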