I am running two versions of fairseq (0.6 and 0.9) for neural machine translation, and the data preprocessing results for IWSLT are inconsistent between them. I prepared the dataset following the guide in ./examples/translation/README.md, which includes:
1) ./examples/translation/prepare-iwslt14.sh (for IWSLT-14)
2) data binarization.
However, the two fairseq versions produce binarized data of different sizes from the same IWSLT input:
IWSLT processed by fairseq 0.6: 42MB
IWSLT processed by fairseq 0.9: 21MB
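
To narrow down where the size difference comes from, it may help to compare the two output directories file by file rather than by total size. The following is a minimal sketch, not part of fairseq: the helper names and the directory names `data-bin-0.6` and `data-bin-0.9` are hypothetical and should be replaced with the actual binarization output paths.

```python
import os
import sys


def dir_size_by_file(path):
    """Return {relative file path: size in bytes} for every file under path."""
    sizes = {}
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            sizes[os.path.relpath(full, path)] = os.path.getsize(full)
    return sizes


def compare_dirs(dir_a, dir_b):
    """Return sorted (file name, size in dir_a, size in dir_b) rows.

    A size is None when the file is missing from that directory.
    """
    a = dir_size_by_file(dir_a)
    b = dir_size_by_file(dir_b)
    return [(name, a.get(name), b.get(name)) for name in sorted(set(a) | set(b))]


if __name__ == "__main__":
    # Hypothetical default directory names; pass the real ones as arguments.
    dir_a = sys.argv[1] if len(sys.argv) > 1 else "data-bin-0.6"
    dir_b = sys.argv[2] if len(sys.argv) > 2 else "data-bin-0.9"
    for name, size_a, size_b in compare_dirs(dir_a, dir_b):
        print(f"{name}\t{size_a}\t{size_b}")
```

If every binary data file shrinks by roughly the same factor, that would suggest a change in how each token is stored on disk rather than a change in the tokenized text itself. That is only a hypothesis to check, not something confirmed here.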
fairseq Version: 0.6 and 0.9
PyTorch Version: 1.5.1
How you installed fairseq (pip, source): from source, with pip install . --editable