Saumajit opened this issue 5 years ago
You can try to uninstall faiss. That worked for me.
@Saumajit : Where in the code do you get the segmentation fault? Do you have the .log?
@vinhsuhi : Did faiss lead to a segmentation fault in your case? If so, where exactly in the code?
Thanks a lot! Alexis
@aconneau The crash happens after the code runs for around 10 minutes: "Building dictionary" gets printed, then it suddenly shows "Segmentation fault (core dumped)".
I'm training an English-Chinese cross-lingual embedding:
python supervised.py --src_lang en --tgt_lang cc --src_emb ../UnsupervisedMT/NMT/data/embeddings/english.txt --tgt_emb data/cc.zh.300.vec --dico_train data/en-zh_train.txt --dico_eval data/en-zh_valid.txt --normalize_embeddings center
INFO - 02/26/19 15:52:21 - 0:00:00 - ============ Initialized logger ============
INFO - 02/26/19 15:52:21 - 0:00:00 - cuda: True
dico_build: S2T&T2S
dico_eval: data/en-zh_valid.txt
dico_max_rank: 10000
dico_max_size: 0
dico_method: csls_knn_10
dico_min_size: 0
dico_threshold: 0
dico_train: data/en-zh_train.txt
emb_dim: 300
exp_id:
exp_name: debug
exp_path: /exp/arijit/MUSE/dumped/debug/col1imqfus
export: txt
max_vocab: 200000
n_refinement: 5
normalize_embeddings: center
seed: -1
src_emb: ../UnsupervisedMT/NMT/data/embeddings/english.txt
src_lang: en
tgt_emb: data/cc.zh.300.vec
tgt_lang: cc
verbose: 2
INFO - 02/26/19 15:52:21 - 0:00:00 - The experiment will be stored in /exp/arijit/MUSE/dumped/debug/col1imqfus
INFO - 02/26/19 15:52:34 - 0:00:14 - Loaded 200000 pre-trained word embeddings.
INFO - 02/26/19 15:53:00 - 0:00:40 - Loaded 200000 pre-trained word embeddings.
INFO - 02/26/19 15:53:02 - 0:00:42 - Found 8457 pairs of words in the dictionary (4901 unique). 271 other pairs contained at least one unknown word (0 in lang1, 271 in lang2)
INFO - 02/26/19 15:53:02 - 0:00:42 - Validation metric: precision_at_1-csls_knn_10
INFO - 02/26/19 15:53:02 - 0:00:42 - Starting iteration 0...
INFO - 02/26/19 15:53:03 - 0:00:42 - Found 2122 pairs of words in the dictionary (1440 unique). 108 other pairs contained at least one unknown word (0 in lang1, 108 in lang2)
INFO - 02/26/19 15:53:03 - 0:00:43 - 1440 source words - nn - Precision at k = 1: 43.263889
INFO - 02/26/19 15:53:03 - 0:00:43 - 1440 source words - nn - Precision at k = 5: 65.833333
INFO - 02/26/19 15:53:03 - 0:00:43 - 1440 source words - nn - Precision at k = 10: 72.291667
INFO - 02/26/19 15:53:03 - 0:00:43 - Found 2122 pairs of words in the dictionary (1440 unique). 108 other pairs contained at least one unknown word (0 in lang1, 108 in lang2)
Segmentation fault (core dumped)
Thanks @arijitx. I also got stuck at the same point. Mine was a Hindi-Bengali cross-lingual dictionary. @aconneau
Uninstalling Faiss solved my issue.
Okay, thanks for the information @arijitx and @vinhsuhi.
Uninstalling faiss-gpu also works for me.
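For context, uninstalling faiss can work because MUSE treats it as an optional dependency and falls back to a slower pure-PyTorch nearest-neighbor path when the import fails. A minimal sketch of that guarded-import pattern (the actual code in MUSE's src/utils.py may differ in detail):

```python
# Guarded optional import: if faiss is absent (or was uninstalled to
# avoid the segfault), fall back to a pure-PyTorch k-NN backend.
try:
    import faiss
    FAISS_AVAILABLE = True
except ImportError:
    FAISS_AVAILABLE = False

def knn_backend():
    """Pick the nearest-neighbor backend based on what is importable."""
    return "faiss" if FAISS_AVAILABLE else "pytorch"
```

With faiss removed, `knn_backend()` returns `"pytorch"`, which is slower but avoids the faiss code path entirely.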
I am using the unsupervised version of training. I get a segmentation fault during refinement. Debugging shows the problem is inside trainer.procrustes(), at the line U, S, V_t = scipy.linalg.svd(M, full_matrices=True). I don't know why this happens or how to solve it. I don't have faiss installed, and the same call runs fine and returns a result in the Python console.
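The SVD call above can be exercised in isolation to check whether scipy/LAPACK itself is the problem on a given machine. A minimal sketch with synthetic data (the shapes and matrices are assumptions, not MUSE's actual embeddings): the orthogonal Procrustes solution recovers a known rotation via the same `scipy.linalg.svd` call.

```python
import numpy as np
from scipy import linalg

rng = np.random.RandomState(0)
X = rng.randn(1000, 300)                    # stand-in "source" embeddings
W_true, _ = np.linalg.qr(rng.randn(300, 300))  # a random orthogonal map
Y = X @ W_true.T                            # stand-in "target" embeddings

# Same call that crashes inside trainer.procrustes() in the report above:
M = Y.T @ X
U, S, V_t = linalg.svd(M, full_matrices=True)
W = U @ V_t  # orthogonal Procrustes solution; should recover W_true
```

If this snippet also segfaults, the issue is in the scipy/LAPACK (often MKL/OpenBLAS) stack rather than in MUSE; if it runs, the crash likely comes from a library conflict triggered only in the full training process.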
I am using the unsupervised version of training. I need a cross-lingual embedding where Hindi is my source language and Bengali is my target language. As suggested by @glample, I had commented out self.word_translation(to_log). However, when I run the code, I still get "Segmentation fault (core dumped)". Please let me know if any other changes need to be made. There is no dictionary provided for this language pair, and I am using fastText's wiki monolingual embeddings for both the source and target languages.