facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

How to generate my own distillation dataset for Levenshtein Transformer #2003

Open Ir1d opened 4 years ago

Ir1d commented 4 years ago

🐛 Bug

According to the docs: "The easiest way of performing distillation is to follow the instructions of training a standard transformer model on the same data, and then decode the training set to produce a distillation dataset for NAT".

I want to know what exactly is the process of decoding the training set.

I tried running

fairseq-generate data-bin/wmt17_en_de_joined --path \
checkpoints/transformer_vaswani_wmt_en_de_big/checkpoint_best.pt \
--batch-size 64 --beam 5 --remove-bpe --gen-subset train \
--results-path data-bin/wmt17_en_de_distill

However, no result file was generated anywhere.
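(Depending on the fairseq version, --results-path writes the hypotheses to a file named generate-<gen-subset>.txt, e.g. generate-train.txt here, inside that directory rather than to the path itself, and the directory may need to exist beforehand. A version-independent way to capture the output is to redirect stdout instead, e.g.:

fairseq-generate data-bin/wmt17_en_de_joined --path \
checkpoints/transformer_vaswani_wmt_en_de_big/checkpoint_best.pt \
--batch-size 64 --beam 5 --remove-bpe --gen-subset train \
> generate-train.txt
)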

I searched the existing issues and it seems no one has solved this yet.

huihuifan commented 4 years ago

@MultiPath

Ir1d commented 4 years ago

Hi, thanks. Gao Peng emailed me yesterday and shared his commands with me.

srun --gres gpu:1 fairseq-generate data-bin/wmt16_en_de_bpe32k --path checkpoint.avg20.pt \
    --beam 4 --lenpen 0.6 --gen-subset train > distill_txt/distill_full_0.txt

python examples/backtranslation/extract_bt_data.py --minlen 1 --maxlen 250 --ratio 3 \
    --output extract_txt/distill_full_0 --srclang en --tgtlang de distill_txt/distill_full_0.txt

After doing that, I ran fairseq-preprocess again to produce the binarized dataset. Then the distillation dataset is good to go.
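(For completeness, a sketch of that binarization step, assuming the output file names produced by the extraction command above (extract_txt/distill_full_0.en and extract_txt/distill_full_0.de) and reusing the teacher's dictionaries so the student shares its vocabulary:

fairseq-preprocess --source-lang en --target-lang de \
    --trainpref extract_txt/distill_full_0 \
    --srcdict data-bin/wmt16_en_de_bpe32k/dict.en.txt \
    --tgtdict data-bin/wmt16_en_de_bpe32k/dict.de.txt \
    --destdir data-bin/wmt16_en_de_distill --workers 8

--validpref/--testpref pointing at the original raw valid/test data can be added in the same way; only the training set is replaced by the distilled pairs.)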

Ir1d commented 4 years ago

I'm reopening this issue because I couldn't achieve a reasonable result (BLEU > 10) when using my generated distillation dataset, and I'm afraid something is wrong in the generation process.

@MultiPath Could you guide me a little bit on this?

After running the above-mentioned commands, I get a lot of meaningless words in my results.

[screenshot of the generated output showing garbled hypotheses]

The teacher model achieves BLEU > 27, yet the student model couldn't even reach a BLEU of 10. I followed the commands from https://github.com/pytorch/fairseq/tree/master/examples/nonautoregressive_translation#train-a-model .
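(One thing worth double-checking here, assuming the stock examples/backtranslation/extract_bt_data.py: that script was written for back-translation, so it writes the hypothesis (H-*) lines to the --srclang file and the original source (S-*) lines to the --tgtlang file. For distillation the teacher's hypotheses should end up on the target side instead, so with an en→de teacher the language arguments would need to be swapped, roughly:

python examples/backtranslation/extract_bt_data.py --minlen 1 --maxlen 250 --ratio 3 \
    --output extract_txt/distill_full_0 --srclang de --tgtlang en distill_txt/distill_full_0.txt

followed by binarizing with --source-lang en --target-lang de. A flipped pairing like this is one plausible explanation for garbage output and single-digit BLEU.)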

speedcell4 commented 3 years ago

Hi, have you solved that? I am facing the same problem.

RamoramaInteractive commented 2 years ago

Is the problem already solved?

kkeleve commented 2 years ago

> Hi, thanks. Gao Peng emailed me yesterday and shared his commands with me.
>
> srun --gres gpu:1 fairseq-generate data-bin/wmt16_en_de_bpe32k --path checkpoint.avg20.pt \
>     --beam 4 --lenpen 0.6 --gen-subset train > distill_txt/distill_full_0.txt
>
> python examples/backtranslation/extract_bt_data.py --minlen 1 --maxlen 250 --ratio 3 \
>     --output extract_txt/distill_full_0 --srclang en --tgtlang de distill_txt/distill_full_0.txt
>
> After doing that, I ran fairseq-preprocess again to produce the binarized dataset. Then the distillation dataset is good to go.

Hi, how can I use the srun command? I have tried many methods, to no avail.
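(srun is the job launcher of the Slurm workload manager, not part of fairseq, and the prefix only applies on a Slurm cluster. On a machine without Slurm, the same generation step can be run directly, e.g. on a single GPU, with paths assumed to match the quoted commands:

CUDA_VISIBLE_DEVICES=0 fairseq-generate data-bin/wmt16_en_de_bpe32k --path checkpoint.avg20.pt \
    --beam 4 --lenpen 0.6 --gen-subset train > distill_txt/distill_full_0.txt
)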