facebookresearch / MUSE

A library for Multilingual Unsupervised or Supervised word Embeddings

coding issue using demo.ipynb #152

Closed kahjohansson closed 4 years ago

kahjohansson commented 4 years ago

I am trying to generate the nearest neighbors for a list of source words in English, but certain words throw errors. For example, the word 'são' produces the following error:

Traceback (most recent call last):
  File "get_nearest_neighbors.py", line 60, in <module>
    get_nn_from_file(source_path, target_path)
  File "get_nearest_neighbors.py", line 51, in get_nn_from_file
    nearest_words = get_nn(src_word, src_embeddings, src_id2word, src_embeddings, src_id2word, K=5)
  File "get_nearest_neighbors.py", line 35, in get_nn
    word_emb = src_emb[word2id[word]]
KeyError: 's\xc3\xa3o'

How can I solve that problem?

kahjohansson commented 4 years ago

I made some modifications to the code so that it reads the list of words from a file and saves the result to a file. With the original demo.ipynb code and the word 'são' as input, the error is:

 File "get_nearest_neighbors_original.py", line 38
SyntaxError: Non-ASCII character '\xc3' in file get_nearest_neighbors_original.py on line 38, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

After I added the encoding declaration "coding: utf-8" as the first line of the script, the SyntaxError went away, but I got the following error:

Nearest neighbors of "são":
Traceback (most recent call last):
  File "get_nearest_neighbors_original.py", line 40, in <module>
    get_nn(src_word, src_embeddings, src_id2word, tgt_embeddings, tgt_id2word, K=5)
  File "get_nearest_neighbors_original.py", line 32, in get_nn
    word_emb = src_emb[word2id[word]]
KeyError: 's\xc3\xa3o'

kahjohansson commented 4 years ago

The problem was caused by the Python version I was using, 2.7. With Python 3, the error does not occur.
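
For anyone hitting the same KeyError: the key shown in the traceback, 's\xc3\xa3o', is the UTF-8 byte form of 'são', which is what a plain string literal is in Python 2. The sketch below illustrates the likely mismatch, assuming the vocabulary keys are Unicode text (as they would be if the embedding file is decoded as UTF-8 when loaded). The small `word2id` dictionary and the `to_text` helper are hypothetical and not part of the demo code.

```python
# -*- coding: utf-8 -*-
# Hypothetical miniature vocabulary; in the demo, word2id is built from
# the embedding file. Keys are assumed to be Unicode text.
word2id = {u"são": 0, u"the": 1}


def to_text(word, encoding="utf-8"):
    """Return `word` as text, decoding it first if it is a byte string."""
    if isinstance(word, bytes):
        return word.decode(encoding)
    return word


# The byte form of the word, matching the key printed in the KeyError.
raw = u"são".encode("utf-8")

# A byte string does not match a text key, so a direct lookup would fail.
found_raw = raw in word2id

# Decoding the bytes to text first makes the lookup succeed.
found_text = to_text(raw) in word2id
index = word2id[to_text(raw)]
```

On Python 3 string literals are already text, so the conversion is a no-op there. On Python 2, decoding the query word before the lookup may be enough to avoid the error, though switching to Python 3 is the simpler route.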