facebookresearch / MUSE

A library for Multilingual Unsupervised or Supervised word Embeddings

Segmentation fault (core dumped) #110

Open. Saumajit opened this issue 5 years ago

Saumajit commented 5 years ago

I am using the unsupervised version of training. I need a cross-lingual embedding where Hindi is my source language and Bengali is my target language. As suggested by @glample, I commented out `self.word_translation(to_log)`. However, when I run the code, I get the error 'Segmentation fault (core dumped)'. Please let me know if any other changes need to be made. There is no dictionary provided for this language pair. I am using fastText's wiki monolingual embeddings for both the source and target languages.
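
For reference, the call I commented out sits in the evaluator's `all_eval` routine (in `src/evaluation/evaluator.py`, if I remember the file layout correctly); skipping it just means the word-translation precision is not logged, which is fine here since no Hindi-Bengali evaluation dictionary exists. Roughly, as a sketch:

```python
# src/evaluation/evaluator.py (sketch; the exact surrounding calls may differ by version)
def all_eval(self, to_log):
    """Run all evaluations."""
    self.monolingual_wordsim(to_log)
    self.crosslingual_wordsim(to_log)
    # self.word_translation(to_log)  # skipped: no Hindi-Bengali evaluation dictionary
    self.sent_translation(to_log)
    self.dist_mean_cosine(to_log)
```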

vinhsuhi commented 5 years ago

You can try uninstalling faiss; that worked for me.
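
If it helps to confirm whether faiss is the culprit before removing it, a quick import check like the one below shows which build the environment will pick up. Uninstalling is then something like `pip uninstall faiss-gpu` (or `conda remove faiss-gpu`), depending on how it was installed. My understanding is that MUSE guards its faiss import and falls back to a slower pure-PyTorch nearest-neighbour search when faiss is missing, so removing it should only cost speed.

```python
# Quick check of which faiss build (if any) this environment will import.
# Assumption: MUSE falls back to its PyTorch nearest-neighbour code path
# when faiss cannot be imported.
try:
    import faiss
    print("faiss importable; GPU support:", hasattr(faiss, "StandardGpuResources"))
except ImportError:
    print("faiss not installed; the PyTorch fallback will be used")
```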

aconneau commented 5 years ago

@Saumajit : Where in the code do you get the segmentation fault? Do you have the .log?

@vinhsuhi : Did faiss lead to a segmentation fault in your case? If so, where exactly in the code?

Thanks a lot! Alexis

Saumajit commented 5 years ago

@aconneau The code runs for around 10 minutes and 'Building dictionary' gets printed; then it suddenly shows Segmentation fault (core dumped).

arijitx commented 5 years ago

I'm training an English-Chinese cross-lingual embedding:

  python supervised.py --src_lang en --tgt_lang cc --src_emb ../UnsupervisedMT/NMT/data/embeddings/english.txt  --tgt_emb data/cc.zh.300.vec --dico_train data/en-zh_train.txt --dico_eval data/en-zh_valid.txt --normalize_embeddings center
  INFO - 02/26/19 15:52:21 - 0:00:00 - ============ Initialized logger ============
  INFO - 02/26/19 15:52:21 - 0:00:00 - cuda: True
                                       dico_build: S2T&T2S
                                       dico_eval: data/en-zh_valid.txt
                                       dico_max_rank: 10000
                                       dico_max_size: 0
                                       dico_method: csls_knn_10
                                       dico_min_size: 0
                                       dico_threshold: 0
                                       dico_train: data/en-zh_train.txt
                                       emb_dim: 300
                                       exp_id: 
                                       exp_name: debug
                                       exp_path: /exp/arijit/MUSE/dumped/debug/col1imqfus
                                       export: txt
                                       max_vocab: 200000
                                       n_refinement: 5
                                       normalize_embeddings: center
                                       seed: -1
                                       src_emb: ../UnsupervisedMT/NMT/data/embeddings/english.txt
                                       src_lang: en
                                       tgt_emb: data/cc.zh.300.vec
                                       tgt_lang: cc
                                       verbose: 2
  INFO - 02/26/19 15:52:21 - 0:00:00 - The experiment will be stored in /exp/arijit/MUSE/dumped/debug/col1imqfus
  INFO - 02/26/19 15:52:34 - 0:00:14 - Loaded 200000 pre-trained word embeddings.
  INFO - 02/26/19 15:53:00 - 0:00:40 - Loaded 200000 pre-trained word embeddings.
  INFO - 02/26/19 15:53:02 - 0:00:42 - Found 8457 pairs of words in the dictionary (4901 unique). 271 other pairs contained at least one unknown word (0 in lang1, 271 in lang2)
  INFO - 02/26/19 15:53:02 - 0:00:42 - Validation metric: precision_at_1-csls_knn_10
  INFO - 02/26/19 15:53:02 - 0:00:42 - Starting iteration 0...
  INFO - 02/26/19 15:53:03 - 0:00:42 - Found 2122 pairs of words in the dictionary (1440 unique). 108 other pairs contained at least one unknown word (0 in lang1, 108 in lang2)
  INFO - 02/26/19 15:53:03 - 0:00:43 - 1440 source words - nn - Precision at k = 1: 43.263889
  INFO - 02/26/19 15:53:03 - 0:00:43 - 1440 source words - nn - Precision at k = 5: 65.833333
  INFO - 02/26/19 15:53:03 - 0:00:43 - 1440 source words - nn - Precision at k = 10: 72.291667
  INFO - 02/26/19 15:53:03 - 0:00:43 - Found 2122 pairs of words in the dictionary (1440 unique). 108 other pairs contained at least one unknown word (0 in lang1, 108 in lang2)
  Segmentation fault (core dumped)

Saumajit commented 5 years ago

Thanks @arijitx, I also got stuck at the same point. Mine was a Hindi-Bengali cross-lingual dictionary, @aconneau.

arijitx commented 5 years ago

Uninstalling Faiss solved my issue.

Saumajit commented 5 years ago

Okay, thanks for the information @arijitx and @vinhsuhi.

lixin4ever commented 5 years ago

Uninstalling faiss-gpu also works for me.

yclzju commented 5 years ago

I am using the unsupervised version of training and I get a segmentation fault during refinement. Debugging shows the problem is in `trainer.procrustes()`, at `U, S, V_t = scipy.linalg.svd(M, full_matrices=True)`. I don't know why this happens or how to solve it. I don't have faiss installed, and the same SVD call runs fine and returns a result when I execute it in the Python console.
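
In case it helps anyone else debugging this: the crashing line computes the Procrustes solution from an SVD, so a workaround worth trying is replacing the scipy call with numpy's SVD, or forcing a different LAPACK driver via `scipy.linalg.svd(M, lapack_driver="gesvd")`. A segfault at this point usually comes from the underlying LAPACK/BLAS build rather than from MUSE itself (and note that a segfault cannot be caught with try/except, so the call has to be replaced, not wrapped). A minimal sketch of that idea, not the library's actual code:

```python
import numpy as np

def solve_procrustes(M):
    """Solve the orthogonal Procrustes problem: return W = U @ V^T.

    Sketch of a drop-in alternative for the crashing line; numpy is often
    linked against a different LAPACK/BLAS build than scipy, which can be
    enough to avoid the crash (an assumption, not a guaranteed fix).
    """
    U, S, V_t = np.linalg.svd(M, full_matrices=True)
    return U.dot(V_t)
```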