I have the same problem, @glample. Can you do me a favor?
Hi,
So I'm not sure about your issue; I have not tried en-zh. However, the approach should work for en-zh; I know that this paper did it: https://arxiv.org/pdf/1804.09057.pdf Maybe you could try to use their setup / datasets / preprocessing, etc.?
How big are your monolingual corpora?
Also, what is `ch` in `wiki.ch.300.vec.20w`? Isn't that the code for Chamorro, not Chinese, which is `zh`? You can use your own corpus; it's probably better, because that way you have embeddings associated with your tokenization / text pre-processing. But if your corpora are small, or if you don't get a good P@1 accuracy, then the fastText ones are probably better.
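If you do go with the pre-trained fastText vectors, make sure to grab the Chinese (`zh`) ones. A minimal sketch of how a `wiki.zh.300.vec.20w` file could be built (the URL is the wiki-vectors one from the fastText site, and I'm assuming the `.20w` suffix means the 200k most frequent words):

```bash
# Pre-trained Chinese (zh, not ch = Chamorro) wiki vectors
wget https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.zh.vec

# Keep the 200k most frequent words; +1 accounts for the "word_count dim" header line
head -n 200001 wiki.zh.vec > wiki.zh.300.vec.20w

# Fix the header so the count matches the truncated file
sed -i '1s/.*/200000 300/' wiki.zh.300.vec.20w
```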
Also, can you try MUSE with the script supervised.py --dico_train identical_char instead of unsupervised.py? It will align words by taking as anchor points the words that are identical in both languages. It sometimes works better than adversarial, even for distant languages.
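A minimal sketch of what that supervised command could look like (paths are placeholders; the remaining flags keep MUSE's defaults):

```bash
python supervised.py \
    --src_lang zh \
    --tgt_lang en \
    --src_emb /path/to/wiki.zh.300.vec \
    --tgt_emb /path/to/wiki.en.300.vec \
    --n_refinement 5 \
    --dico_train identical_char \
    --export "pth"
```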
ok, thanks
Hi,
I did the same task on zh-en, but the trained model cannot be used to translate the test sets. Why? Like this:
The train.log of MUSE is as follows: train.log
This problem has been bothering me for a long time. Can you give me some guidance? @glample
The train.log of MUSE seems reasonable. Not sure about the message returned by Moses; it's a Moses-specific issue. Is it just a warning? What is at the end of the log? Did you have a look at the phrase table you generated? Does it look good?
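For what it's worth, a quick way to eyeball a gzipped phrase table (the filename here is illustrative):

```bash
# Peek at the first entries
zcat phrase-table.en-zh.gz | head -20

# Check the candidate translations of one source word
zcat phrase-table.en-zh.gz | grep '^hello |||' | head
```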
Thank you for your reply.
The end of the log is like this: And I think the phrase table looks good.
@glample
Hi @socaty @cocaer, thanks very much!!! When I generated the phrase-table and tried translating test sentences, an error occurred as follows. Can you give me some advice?
```
Linking phrase-table path...
Translating test sentences...
Defined parameters (per moses.ini or switch):
	config: /data/home/super/mt/dataset/muti-domain/unmt/moses_train_en-zh/model/moses.ini
	distortion-limit: 6
	feature: UnknownWordPenalty WordPenalty PhrasePenalty PhraseDictionaryMemory name=TranslationModel0 num-features=2 path=/data/home/super/mt/dataset/muti-domain/unmt/moses_train_en-zh/model/phrase-table.gz input-factor=0 output-factor=0 Distortion KENLM name=LM0 factor=0 path=/data/home/super/mt/dataset/muti-domain/unmt/data/zh.lm.blm order=5
	input-factors: 0
	mapping: 0 T 0
	threads: 48
	weight: UnknownWordPenalty0= 1 WordPenalty0= -1 PhrasePenalty0= 0.2 TranslationModel0= 0.2 0.2 Distortion0= 0.3 LM0= 0.5
line=UnknownWordPenalty
FeatureFunction: UnknownWordPenalty0 start: 0 end: 0
line=WordPenalty
FeatureFunction: WordPenalty0 start: 1 end: 1
line=PhrasePenalty
FeatureFunction: PhrasePenalty0 start: 2 end: 2
line=PhraseDictionaryMemory name=TranslationModel0 num-features=2 path=/data/home/super/mt/dataset/muti-domain/unmt/moses_train_en-zh/model/phrase-table.gz input-factor=0 output-factor=0
FeatureFunction: TranslationModel0 start: 3 end: 4
line=Distortion
FeatureFunction: Distortion0 start: 5 end: 5
line=KENLM name=LM0 factor=0 path=/data/home/super/mt/dataset/muti-domain/unmt/data/zh.lm.blm order=5
FeatureFunction: LM0 start: 6 end: 6
Loading UnknownWordPenalty0
Loading WordPenalty0
Loading PhrasePenalty0
Loading Distortion0
Loading LM0
Loading TranslationModel0
Start loading text phrase table. Moses format : [0.502] seconds
Reading /data/home/super/mt/dataset/muti-domain/unmt/moses_train_en-zh/model/phrase-table.gz
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
Exception: moses/TranslationModel/RuleTable/LoaderStandard.cpp:202 in bool Moses::RuleTableLoaderStandard::Load(const Moses::AllOptions&, Moses::FormatType, const std::vector
```
@wingsyuan Sorry, I am not sure about the reason for your problem. In my opinion, something may have gone wrong when training the phrase-table. Please check your training process, and I will send my training script to your email soon.
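For what it's worth, that exception is raised by Moses's rule-table loader, which usually points to a line the loader cannot parse. A minimal sanity check, using the path from the log above (two scores per line would match num-features=2 in moses.ini):

```bash
# Each line should look like: source phrase ||| target phrase ||| score1 score2
zcat /data/home/super/mt/dataset/muti-domain/unmt/moses_train_en-zh/model/phrase-table.gz \
  | awk -F' [|][|][|] ' 'NF != 3 { print "line " NR ": " $0; bad++ } END { print (bad + 0) " malformed lines" }'
```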
@wingsyuan I have the same problem here. Have you solved this?
Hi,
I have been confused for several days. I followed the steps of PBSMT/run.sh to do my work, and I think the most important step is "Running MUSE to generate cross-lingual embeddings". I aligned the 'zh' and 'en' pre-trained word vectors you provided at https://fasttext.cc/docs/en/crawl-vectors.html with MUSE, and got "Adv-NN P@1=21.3, Adv-CSLS P@1=26.9, Adv-Refine-NN P@1=18.5, Adv-Refine-CSLS P@1=24.0". Then I used the aligned embeddings to generate the phrase table, but finally got a BLEU of 1.01. I don't think this result is right; something must have gone wrong.
My MUSE command is:
```bash
python unsupervised.py \
    --src_lang ch \
    --tgt_lang en \
    --src_emb /data/experiment/embeddings/wiki.ch.300.vec.20w \
    --tgt_emb /data/experiment/embeddings/wiki.en.300.vec.20w \
    --exp_name test \
    --exp_id 0 \
    --normalize_embeddings center \
    --emb_dim 300 \
    --dis_most_frequent 50000 \
    --epoch_size 500000 \
    --dico_eval /data/experiment/unsupervisedMT/fordict/zh-en.5000-6500.sim.txt \
    --n_refinement 5 \
    --export "pth"
```
My command for generating the phrase table is:
```bash
python create-phrase-table.py \
    --src_lang $SRC \
    --tgt_lang $TGT \
    --src_emb $ALIGNED_EMBEDDINGS_SRC \
    --tgt_emb $ALIGNED_EMBEDDINGS_TGT \
    --csls 1 \
    --max_rank 200 \
    --max_vocab 300000 \
    --inverse_score 1 \
    --temperature 45 \
    --phrase_table_path ${PHRASE_TABLE_PATH::-3}
```
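As an aside, `${PHRASE_TABLE_PATH::-3}` is bash substring expansion that drops the trailing `.gz`, so the script is given an uncompressed output path. If the script does not gzip the result itself, Moses will still expect the `.gz` path referenced in `moses.ini`, so something like this would be needed (an assumption on my part, not confirmed by the thread):

```bash
# Compress the table so it matches the .gz path referenced in moses.ini
gzip "${PHRASE_TABLE_PATH::-3}"    # produces $PHRASE_TABLE_PATH
```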
Does the problem lie in the word embeddings? Should I use word embeddings trained on my own training data with fastText for MUSE? I have tried that (using the word embeddings trained on my training data), but got "Adv-NN P@1=0.07, Adv-CSLS P@1=0.07, Adv-Refine-NN P@1=0.00, Adv-Refine-CSLS P@1=0.00". My command is:
```bash
./fasttext skipgram -epoch 10 -minCount 0 -dim 300 -thread 48 -ws 5 -neg 10 \
    -input $SRC_TOK -output $EMB_SRC
```

So I didn't use the word embeddings trained on my own data, because I don't think they were aligned well. So, where is the fault?
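One thing worth checking with a P@1 that low (a guess on my part, not something the thread confirms): whether the words in the evaluation dictionary actually appear in the trained embeddings, since tokenization or casing mismatches can drive coverage, and hence P@1, to zero. A minimal sketch, assuming the paths from the commands above:

```bash
# Source-side words of the MUSE evaluation dictionary (first whitespace-separated column)
awk '{print $1}' /data/experiment/unsupervisedMT/fordict/zh-en.5000-6500.sim.txt | sort -u > dico_words.txt

# Vocabulary of the trained embeddings (skip the "word_count dim" header line)
tail -n +2 $EMB_SRC.vec | awk '{print $1}' | sort -u > emb_words.txt

# How many dictionary words are covered by the embeddings?
comm -12 dico_words.txt emb_words.txt | wc -l
wc -l < dico_words.txt
```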