alex-berard / seq2seq

Attention-based sequence to sequence learning
Apache License 2.0

can't find the APE17 data file #14

Open snailcoder opened 6 years ago

snailcoder commented 6 years ago

When I run this command in a terminal,

./seq2seq.sh config/APE17/chained.yaml --train -v

an error is raised:

Traceback (most recent call last):
  File "/home/devops/.conda/envs/tensorflow_py35/lib/python3.5/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/devops/.conda/envs/tensorflow_py35/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/devops/ape/seq2seq/translate/main.py", line 298, in <module>
    main()
  File "/home/devops/ape/seq2seq/translate/main.py", line 230, in main
    model = TranslationModel(config)
  File "/home/devops/ape/seq2seq/translate/translation_model.py", line 67, in __init__
    ref_ext=ref_ext, binary=self.binary, **kwargs)
  File "/home/devops/ape/seq2seq/translate/utils.py", line 217, in get_filenames
    shutil.copy(src, dest)
  File "/home/devops/.conda/envs/tensorflow_py35/lib/python3.5/shutil.py", line 241, in copy
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/home/devops/.conda/envs/tensorflow_py35/lib/python3.5/shutil.py", line 120, in copyfile
    with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: 'data/APE17/vocab.mt'

It seems the script can't find the APE17 data files. Where can I get these files?

alex-berard commented 6 years ago

Hello,

You can find the data on the main page of the WMT17 APE task. I don't think I'm allowed to distribute the files myself, because of the license agreement. Once you have downloaded them, put the extracted files inside seq2seq/raw_data and run config/APE17/prepare-2017.sh.
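Until the data has been prepared, ./seq2seq.sh config/APE17/chained.yaml --train -v fails with the FileNotFoundError shown above. A small guard like the one below can catch this before training starts. It is a sketch with a hypothetical helper, ape17_prepared, which is not part of the repository; it assumes you run it from the seq2seq checkout root, and it only checks data/APE17/vocab.mt (the file named in the traceback), so passing it does not guarantee that every other prepared file exists.

```shell
#!/bin/sh
# Hypothetical pre-training guard; not part of the seq2seq repository.
# Assumes the current directory is the seq2seq checkout root, so the relative
# path data/APE17 from the traceback resolves the same way.

# Succeeds (returns 0) if the vocabulary file named in the traceback exists
# in directory $1 (default: data/APE17); fails (returns 1) otherwise.
# Only this one file is checked.
ape17_prepared() {
    dir=${1:-data/APE17}
    [ -f "$dir/vocab.mt" ]
}

if ape17_prepared; then
    echo "data/APE17/vocab.mt found; training can be started with:"
    echo "  ./seq2seq.sh config/APE17/chained.yaml --train -v"
else
    echo "data/APE17/vocab.mt is missing: put the WMT17 APE files in raw_data" >&2
    echo "and run config/APE17/prepare-2017.sh first." >&2
fi
```

Checking for one known file keeps the guard cheap; a stricter version would need the full list of files the training config reads, which is not given in this thread.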

Alexandre

snailcoder commented 6 years ago

Thanks!

But there may still be a problem in config/APE17/prepare-2017.sh:

cat ${raw_data}/{train,train.2017}.mt > ${data_dir}/train.mt
cat ${raw_data}/{train,train.2017}.pe > ${data_dir}/train.pe
cat ${raw_data}/{train,train.2017}.src > ${data_dir}/train.src
cp ${raw_data}/{dev,test,500K}.{src,mt,pe} ${data_dir}
cp ${raw_data}/test.2017.{src,mt} ${data_dir}

In fact, the data files downloaded from the WMT17 APE task's main page do not include {train,train.2017}.mt or {train,train.2017}.pe.
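To see exactly which inputs the prepare-2017.sh lines above expect, the brace expansions can be unrolled into an explicit file list and compared against the raw data directory. The sketch below does this with two hypothetical helpers, ape17_expected_files and ape17_missing_files, which are not part of the repository. The list is transcribed by hand from the quoted cat and cp commands (not parsed from the script), and the default directory raw_data assumes you run it from the seq2seq root.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the seq2seq repository.
# The file list is transcribed by hand from the cat/cp lines quoted above,
# not parsed from config/APE17/prepare-2017.sh.

# Print, one per line, every file those lines read from the raw data directory.
ape17_expected_files() {
    # The three cat lines: {train,train.2017}.mt, .pe and .src
    for ext in mt pe src; do
        for prefix in train train.2017; do
            echo "$prefix.$ext"
        done
    done
    # cp ${raw_data}/{dev,test,500K}.{src,mt,pe}
    for prefix in dev test 500K; do
        for ext in src mt pe; do
            echo "$prefix.$ext"
        done
    done
    # cp ${raw_data}/test.2017.{src,mt}
    echo test.2017.src
    echo test.2017.mt
}

# Print the expected files that are missing from directory $1
# (default: raw_data, assuming the current directory is the seq2seq root).
# Returns 1 if at least one file is missing, 0 otherwise.
ape17_missing_files() {
    dir=${1:-raw_data}
    status=0
    for f in $(ape17_expected_files); do
        if [ ! -f "$dir/$f" ]; then
            echo "$f"
            status=1
        fi
    done
    return "$status"
}
```

For example, running ape17_missing_files raw_data from the repository root prints each name the script would fail to find, which can then be compared with the file names that actually come with the WMT17 download. Explicit loops are used instead of brace expansion so the sketch also works in a plain POSIX sh.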