Open Ir1d opened 4 years ago
@MultiPath
Hi, thanks. Gao Peng emailed me yesterday and shared his commands:
srun --gres gpu:1 fairseq-generate data-bin/wmt16_en_de_bpe32k --path checkpoint.avg20.pt --beam 4 --lenpen 0.6 --gen-subset train > distill_txt/distill_full_0.txt
python examples/backtranslation/extract_bt_data.py --minlen 1 --maxlen 250 --ratio 3 --output extract_txt/distill_full_0 --srclang en --tgtlang de distill_txt/distill_full_0.txt
After doing that, I ran the preprocessing again to generate the binarized dataset. Then the distillation dataset is good to go.
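To make the extraction step above concrete, here is a minimal sketch of what `examples/backtranslation/extract_bt_data.py` does for distillation: it pairs each source line (`S-<id>`) from the `fairseq-generate` log with the teacher's hypothesis (`H-<id>`), so the student trains on the teacher's outputs instead of the references. The field layout is an assumption based on the standard `fairseq-generate` log format; check the script itself for the exact behavior.

```python
def extract_pairs(lines):
    """Return {id: (source, teacher_hypothesis)} from fairseq-generate output lines."""
    sources, hypos = {}, {}
    for line in lines:
        if line.startswith("S-"):
            # e.g. "S-17\tein Haus" -> id 17, source text
            tag, text = line.split("\t", 1)
            sources[int(tag[2:])] = text.strip()
        elif line.startswith("H-"):
            # e.g. "H-17\t-0.31\ta house" -> id 17, score, hypothesis text
            tag, _score, text = line.split("\t", 2)
            hypos[int(tag[2:])] = text.strip()
    # Keep only ids that have both a source and a hypothesis.
    return {i: (sources[i], hypos[i]) for i in hypos if i in sources}

# Toy log in the fairseq-generate style (assumed format):
log = [
    "S-0\tein Haus",
    "H-0\t-0.31\ta house",
    "S-1\tein Hund",
    "H-1\t-0.27\ta dog",
]
pairs = extract_pairs(log)
```

The resulting `(source, hypothesis)` pairs are what get written out as the new parallel corpus and then binarized with `fairseq-preprocess`.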
I'm reopening this issue because I couldn't achieve a reasonable result (not even BLEU > 10) when using my generated distillation dataset, and I'm afraid something went wrong in the generation process.
@MultiPath Could you guide me a little bit on this?
After running the above-mentioned commands, I get a lot of meaningless words in my results.
The teacher model achieves BLEU > 27, but the student model couldn't even reach a BLEU of 10. I tried to run the commands from https://github.com/pytorch/fairseq/tree/master/examples/nonautoregressive_translation#train-a-model.
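When the student collapses like this, one cheap sanity check is the pair filtering itself. The flags in the command above (`--minlen 1 --maxlen 250 --ratio 3`) roughly mean: drop any pair whose source or target token length is out of bounds, or whose source/target length ratio is too large. The exact semantics here are an assumption; the sketch below just shows the idea so you can eyeball what survives filtering.

```python
def keep_pair(src, tgt, minlen=1, maxlen=250, ratio=3.0):
    """Approximate the extract_bt_data.py length filters (assumed semantics)."""
    slen, tlen = len(src.split()), len(tgt.split())
    # Both sides must be within [minlen, maxlen] tokens.
    if not (minlen <= slen <= maxlen and minlen <= tlen <= maxlen):
        return False
    # Length ratio between the longer and shorter side must not exceed `ratio`.
    return max(slen, tlen) / min(slen, tlen) <= ratio

# A balanced pair passes; a wildly length-mismatched pair is dropped.
ok = keep_pair("ein Haus am See", "a house by the lake")
bad = keep_pair("Haus", "a a a a a a a a a a")
```

If many teacher hypotheses are degenerate (very short, or repeated tokens), a lot of training pairs can silently disappear here, which is worth ruling out before blaming the student model.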
Hi, have you solved that? I am facing the same problem.
Is the problem already solved?
Hi, how can I use the srun command? I have tried many methods, to no avail.
🐛 Bug
According to the doc: "The easiest way of performing distillation is to follow the instructions of training a standard transformer model on the same data, and then decode the training set to produce a distillation dataset for NAT".
I want to know what exactly is the process of decoding the training set.
I tried running it; however, no result file was generated anywhere.
I searched the existing issues and it seems no one has solved this yet.