Closed youshimanon closed 6 years ago
This happens if the source and target languages can't be inferred automatically. Typically the language direction is inferred from the directory/naming structure. For example, if your data directory contains the files train.de-en.de.bin, train.de-en.de.idx, train.de-en.en.bin, and train.de-en.en.idx, then we assume that the source language is "de" and the target language is "en".
Maybe you're using a different naming/directory structure than the default? You can specify the languages explicitly with the --source-lang and --target-lang options.
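To illustrate the naming convention described above, here is a minimal sketch of how a language pair could be read off the file names. It is not the fairseq code itself: the function name infer_language_pair and the regular expression are assumptions made up for this example, and only the train.de-en.de.idx style of name comes from the explanation above.

```python
import re
from typing import Iterable, Optional, Tuple

# Matches names such as "train.de-en.de.idx":
# <split>.<src>-<dst>.<lang>.idx
# (Hypothetical pattern for illustration, not fairseq's own.)
_PATTERN = re.compile(
    r"^(?P<split>\w+)\.(?P<src>[^.-]+)-(?P<dst>[^.-]+)\.(?P<lang>[^.]+)\.idx$"
)


def infer_language_pair(
    filenames: Iterable[str], split: str = "train"
) -> Tuple[Optional[str], Optional[str]]:
    """Return (source, target) guessed from the file names, or (None, None)."""
    for name in sorted(filenames):
        match = _PATTERN.match(name)
        if match and match.group("split") == split:
            return match.group("src"), match.group("dst")
    # No file follows the expected naming, so nothing can be inferred.
    return None, None


files = [
    "train.de-en.de.bin",
    "train.de-en.de.idx",
    "train.de-en.en.bin",
    "train.de-en.en.idx",
]
print(infer_language_pair(files))  # ('de', 'en')
```

With a directory that only holds raw text files such as train.de and train.en, the same function returns (None, None), which is the situation where the languages have to be given explicitly.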
I'm using the same naming/directory structure as the default, and I have specified the languages with --source-lang de and --target-lang en. The commands are:
cd $PBS_O_WORKDIR
mkdir -p checkpoints/fconv
CUDA_VISIBLE_DEVICES=0 python train.py data/iwslt14.tokenized.de-en --source-lang de --target-lang en --lr 0.25 --clip-norm 0.1 --dropout 0.2 --max-tokens 4000 --arch fconv_iwslt_de_en --save-dir checkpoints/fconv
Is there anything wrong?
Ah, it seems you may not have preprocessed the dataset. Please run preprocess.py using the instructions in the README, and then rerun train.py with the path to the preprocessed directory (it should contain several .bin and .idx files).
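A quick way to check whether a directory looks preprocessed is to list which of the expected .bin and .idx files are missing. The sketch below is only an assumption-based helper: the name missing_binarized_files is hypothetical, and the expected file names follow the train.de-en.de.bin pattern mentioned earlier in this thread rather than anything read from the fairseq source.

```python
import os
from typing import List


def missing_binarized_files(
    data_dir: str, src: str, dst: str, split: str = "train"
) -> List[str]:
    """List the expected .bin/.idx files that are absent from data_dir."""
    # Assumed naming: <split>.<src>-<dst>.<lang>.<bin|idx>
    expected = [
        f"{split}.{src}-{dst}.{lang}.{ext}"
        for lang in (src, dst)
        for ext in ("bin", "idx")
    ]
    present = set(os.listdir(data_dir)) if os.path.isdir(data_dir) else set()
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    # Path taken from the command earlier in this thread.
    missing = missing_binarized_files("data/iwslt14.tokenized.de-en", "de", "en")
    if missing:
        print("Directory does not look preprocessed; missing:", missing)
    else:
        print("All expected .bin/.idx files are present.")
```

If the list is non-empty for the directory passed to train.py, that directory is probably the tokenized text rather than the output of preprocess.py.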
When I was training the model, I got the error:
Traceback (most recent call last):
  File "train.py", line 269, in <module>
    main()
  File "train.py", line 51, in main
    dataset = data.load_raw_text_dataset(args.data, splits, args.source_lang, args.target_lang)
  File "/scratch/jiajie.ding/module/fairseq-py/fairseq/data.py", line 103, in load_raw_text_dataset
    assert src is not None and dst is not None, 'Source and target languages should be provided'
AssertionError: Source and target languages should be provided
What does this mean? And how do I correct this? Thanks