Open JxuHenry opened 5 years ago
Hi JxuHenry, I'm curious about where to obtain the monolingual corpus for Chinese? Could you share your experience? Thx in advance.
Hi! The UN officially provides a parallel corpus of its conference documents, but I only used the Chinese corpus for training.
Oh, thanks for your reply. Yesterday I found a nice Chinese corpus for multiple tasks, which also contains a monolingual corpus. If you are interested, you can find it here
OK, thank you very much
Hi JxuHenry, I also had the same problem. Have you solved it?
No, I haven't, sorry.
Hi, how do you obtain the shared embeddings ./data/mono/all.zh-en.60000.vec? Were they trained on the concatenated data using fastText?
Have you tried using MUSE to get aligned embeddings? I think it might help.
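For the concatenation approach asked about above, a minimal sketch of the corpus-merging step might look like the following. It is only an illustration and not confirmed to be what the author did: the function name `concat_and_shuffle` and the file names in the usage note are hypothetical, and the sketch loads everything into memory, so it assumes the corpora fit in RAM.

```python
import random


def concat_and_shuffle(paths, out_path, seed=0):
    """Merge several tokenized corpora into one shuffled file.

    Hypothetical helper: reads every non-empty line from each input file,
    shuffles the combined lines with a fixed seed, and writes them out.
    Returns the number of lines written.
    """
    lines = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            # Drop blank lines; keep sentence text without the trailing newline.
            lines.extend(line.rstrip("\n") for line in f if line.strip())

    # Shuffle so that the two languages are interleaved in the output.
    random.Random(seed).shuffle(lines)

    with open(out_path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
    return len(lines)
```

It could be called as `concat_and_shuffle(["all.zh.tok", "all.en.tok"], "all.zh-en")` (file names assumed), and the merged file could then be given to fastText, for example through its `skipgram` command, to train a single embedding space for both languages.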
I only changed the corpus and trained on it. The corpus preprocessing is the same as in the "get_data_enfr.sh" script. The run parameters are as follows:

    python main.py --exp_name zhTest --transformer True \
      --n_enc_layers 4 --n_dec_layers 4 --share_enc 3 --share_dec 3 \
      --share_lang_emb True --share_output_emb True \
      --langs 'en,zh' --n_mono -1 \
      --mono_dataset 'zh:./data/mono/all.zh.tok.60000.pth,,;en:./data/mono/all.en.tok.60000.pth,,' \
      --para_dataset 'en-zh:,./data/para/newdev/newsdev2017-enzh-src.XX.60000.pth,./data/para/newdev/newsdev2017-zhen-ref.XX.60000.pth' \
      --mono_directions 'zh,en' \
      --word_shuffle 3 --word_dropout 0.1 --word_blank 0.2 \
      --pivo_directions 'en-zh-en,zh-en-zh' \
      --pretrained_emb './data/mono/all.zh-en.60000.vec' --pretrained_out True \
      --lambda_xe_mono '0:1,100000:0.1,300000:0' --lambda_xe_otfd 1 \
      --otf_num_processes 30 --otf_sync_params_every 1000 \
      --enc_optimizer adam,lr=0.0001 --epoch_size 500000 \
      --stopping_criterion bleu_zh_en_valid,10

Do I need to modify anything else?
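Before changing anything else, it may help to confirm that every file named in `--mono_dataset` actually exists. The sketch below is a standalone check, not part of the training code. It assumes the value has the form `lang:train,valid,test;lang:train,valid,test` with empty slots allowed, which is inferred from the command above; the function names are hypothetical.

```python
import os


def parse_mono_dataset(spec):
    """Split a --mono_dataset value into {lang: (train, valid, test)}.

    Assumed format (inferred from the command line, not from the code):
    'zh:train,valid,test;en:train,valid,test', where any slot may be empty.
    """
    result = {}
    for entry in spec.split(";"):
        lang, sep, paths = entry.partition(":")
        parts = paths.split(",")
        if not sep or len(parts) != 3:
            raise ValueError(f"malformed entry: {entry!r}")
        result[lang] = tuple(parts)
    return result


def missing_files(spec):
    """Return (lang, path) pairs for non-empty paths that are not files."""
    missing = []
    for lang, paths in parse_mono_dataset(spec).items():
        for path in paths:
            if path and not os.path.isfile(path):
                missing.append((lang, path))
    return missing


if __name__ == "__main__":
    spec = ("zh:./data/mono/all.zh.tok.60000.pth,,;"
            "en:./data/mono/all.en.tok.60000.pth,,")
    for lang, path in missing_files(spec):
        print(f"[{lang}] file not found: {path}")
```

Run from the repository root, it prints one line per missing file and nothing if both `.pth` files are present.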