JunjieHu / dali

Domain Adaptation of Neural Machine Translation by Lexicon Induction

I get an error when preprocessing according to your steps #3

Closed: genbei closed this issue 4 years ago

genbei commented 4 years ago

Running the step "Preprocess the data in the source (it) domain" (bash scripts/preprocess.sh) fails with an error:

../preprocess.py

[Screenshot of the error message]

JunjieHu commented 4 years ago

Based on the error message, this is a syntax error raised when calling fairseq's pre-built command-line function. One quick fix is to call the Python script directly instead of going through that command-line entry point. Specifically, in scripts/preprocess.sh, replace fairseq-preprocess with python $repo/fairseq/preprocess.py, as follows:

python $repo/fairseq/preprocess.py --source-lang ${sl} --target-lang $tl \
    --trainpref $data_dir/${d}-train.bpe.clean \
    --validpref $data_dir/${d}-dev.bpe \
    --testpref $data_dir/${d}-test.bpe,$data_dir/emea-test.bpe,$data_dir/koran-test.bpe,$data_dir/subtitles-test.bpe,$data_dir/acquis-test.bpe \
    --destdir $out_dir/data-bin-join/${d}/ \
    --srcdict $out_dir/data-bin-join/${d}/dict.${sl}.txt \
    --tgtdict $out_dir/data-bin-join/${d}/dict.${tl}.txt
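If you would rather not edit scripts/preprocess.sh by hand, the same substitution can be scripted. This is a minimal sketch, not part of the repository: the helper name patch_preprocess_script is hypothetical, and it assumes, as the command above does, that the script itself defines $repo pointing at the checkout containing the fairseq directory. The single quotes keep $repo literal so the patched script expands it at run time.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the dali repo): rewrite every
# fairseq-preprocess call in a script into a direct call to
# $repo/fairseq/preprocess.py, keeping a .bak backup of the original.
set -euo pipefail

patch_preprocess_script() {
  local script="$1"
  # Save the original next to it before modifying anything.
  cp "$script" "$script.bak"
  # Single quotes keep $repo literal; '#' is the sed delimiter so the
  # slashes in the replacement path need no escaping.
  sed 's#fairseq-preprocess#python $repo/fairseq/preprocess.py#g' \
    "$script.bak" > "$script"
}

# Only patch when a script path is given on the command line.
if [[ $# -gt 0 ]]; then
  patch_preprocess_script "$1"
fi
```

Usage, assuming the sketch is saved as patch_preprocess.sh (a hypothetical name): run bash patch_preprocess.sh scripts/preprocess.sh, then rerun bash scripts/preprocess.sh. Writing through a .bak copy instead of sed -i avoids the differing -i syntax between GNU and BSD sed.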

Let me know if that works for you!