genbei closed this issue 4 years ago
Based on the error message, this looks like a syntax error triggered when calling fairseq's pre-built command-line function. One quick fix is to call the Python script directly instead of going through that command-line entry point. More specifically, in scripts/preprocess.sh, replace fairseq-preprocess with python $repo/fairseq/preprocess.py, as follows.
python $repo/fairseq/preprocess.py --source-lang ${sl} --target-lang ${tl} \
--trainpref $data_dir/${d}-train.bpe.clean \
--validpref $data_dir/${d}-dev.bpe \
--testpref $data_dir/${d}-test.bpe,$data_dir/emea-test.bpe,$data_dir/koran-test.bpe,$data_dir/subtitles-test.bpe,$data_dir/acquis-test.bpe \
--destdir $out_dir/data-bin-join/${d}/ \
--srcdict $out_dir/data-bin-join/${d}/dict.${sl}.txt \
--tgtdict $out_dir/data-bin-join/${d}/dict.${tl}.txt
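If you'd rather not edit the file by hand, the same substitution can be scripted. This is only a rough sketch: patch_preprocess_call is a hypothetical helper name (not part of fairseq or this repo), and it assumes the script contains the literal text fairseq-preprocess and that $repo is already defined inside that script.

```shell
# Hypothetical helper: make a script call preprocess.py directly
# instead of the fairseq-preprocess command.
# $1 is the path of the script to patch; the original is kept as "$1.bak".
patch_preprocess_call() {
    # Single quotes keep "$repo" literal, so it is expanded later,
    # when the patched script itself runs.
    sed -i.bak 's|fairseq-preprocess|python $repo/fairseq/preprocess.py|' "$1"
}
```

Running patch_preprocess_call scripts/preprocess.sh from the repo root should produce the command shown above; if anything goes wrong, the untouched original is in scripts/preprocess.sh.bak.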
Let me know if that works for you!
Preprocess the data in the source (it) domain bash scripts/preprocess.sh
../preprocess.py