facebookresearch / MUSE

A library for Multilingual Unsupervised or Supervised word Embeddings
3.18k stars, 552 forks

RuntimeError with any text-based embedding #56

Closed: crypotex closed this issue 6 years ago

crypotex commented 6 years ago

I get this error on the first epoch with text-based embeddings. In the end, I tried the given commands with the pretrained Wikipedia fastText embeddings for English and Spanish, and I still get the same error. Stack trace (same example as in the README, EN-ES):

INFO - 06/11/18 13:26:02 - 0:01:32 - Starting iteration 1...
INFO - 06/11/18 13:26:02 - 0:01:32 - Building the train dictionary ...
Traceback (most recent call last):
  File "supervised.py", line 91, in <module>
    trainer.build_dictionary()
  File "/gpfs/hpchome/b02166/thesis/upd_muse/MUSE/src/trainer.py", line 167, in build_dictionary
    self.dico = build_dictionary(src_emb, tgt_emb, self.params)
  File "/gpfs/hpchome/b02166/thesis/upd_muse/MUSE/src/dico_builder.py", line 175, in build_dictionary
    dico = torch.LongTensor(list([[a, b] for (a, b) in final_pairs]))
RuntimeError: tried to construct a tensor from a nested int sequence, but found an item of type numpy.int64 at index (0, 0)

glample commented 6 years ago

Hi,

What version of PyTorch / Python are you using? Also, what is the value of params.dico_build? Are you using the latest version of the code in the repo? What happens if you replace line 175 in dico_builder.py with the following?

    dico = torch.LongTensor(list([[int(a), int(b)] for (a, b) in final_pairs]))
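A minimal sketch of the workaround suggested above, assuming NumPy and PyTorch are installed. The `final_pairs` values are made-up stand-ins for the (source index, target index) pairs built in dico_builder.py, not MUSE's actual data. Casting each index with `int()` turns the `numpy.int64` items into built-in Python ints, which `torch.LongTensor` accepts in a nested list. Converting through a NumPy array with `torch.from_numpy` is shown as an alternative that avoids the per-element cast; it is not necessarily what the upstream fix does.

```python
import numpy as np
import torch

# Hypothetical stand-in for `final_pairs`: index pairs whose items are numpy.int64,
# which is the element type reported in the RuntimeError above.
final_pairs = [(np.int64(0), np.int64(5)), (np.int64(3), np.int64(7))]

# Workaround from the comment above: cast every index to a built-in int
# before building the LongTensor.
dico = torch.LongTensor([[int(a), int(b)] for (a, b) in final_pairs])

# Alternative: build an int64 NumPy array first; torch.from_numpy keeps the dtype,
# so the result is also a 64-bit integer (long) tensor.
dico_alt = torch.from_numpy(np.array(final_pairs, dtype=np.int64))

print(dico.tolist())
```

Both tensors hold the same pairs with shape (2, 2).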

glample commented 6 years ago

Okay, it looks like the issue was caused by this: https://github.com/pytorch/pytorch/issues/8365. PyTorch used to be more flexible, and this worked before. Anyway, it is fixed in https://github.com/facebookresearch/MUSE/commit/f516a9bcd1d70b980dd391d0b1f107329df569c4. Thanks for noticing this issue.