alex-berard / seq2seq

Attention-based sequence to sequence learning
Apache License 2.0

About the Dependencies in the README.md #8

Closed zy158 closed 7 years ago

zy158 commented 7 years ago

Hello! Great work! But when I train a baseline model on the WMT14 example, the terminal shows "./seq2seq.sh: line 4: 4353 Killed /usr/bin/env python3 -m translate "$@"". When I try again, the terminal shows "./seq2seq.sh: line 4: 5335 Killed /usr/bin/env python3 -m translate "$@"". So I would like to know whether there are other requirements, such as memory or CPU, besides Python 3, YAML and Matplotlib.

alex-berard commented 7 years ago

Hello! There is also a dependency on TensorFlow 1.2. Could you run: python3 -m translate experiments/WMT14/baseline.yaml --train -v &> error.log and give me the content of the error.log file?

zy158 commented 7 years ago

Thank you! I will try it immediately!

zy158 commented 7 years ago

Hello! Part of the content of the error.log file is as follows, but it is different from the previous two runs.

The first try:
08/02 22:45:11 lines read 10500000
08/02 22:45:12 files: experiments/WMT14/experiments/WMT14/data/train.en experiments/WMT14/experiments/WMT14/data/train.fr
08/02 22:45:12 size: 10559682
08/02 22:45:14 reading development data
08/02 22:45:14 files: experiments/WMT14/experiments/WMT14/data/dev.en experiments/WMT14/experiments/WMT14/data/dev.fr
08/02 22:45:14 size: 6003
08/02 22:45:15 starting training
./seq2seq.sh: line 4: 5335 Killed /usr/bin/env python3 -m translate "$@"

The second try:
08/03 14:32:27 lines read 10500000
08/03 14:32:30 files: experiments/WMT14/experiments/WMT14/data/train.en experiments/WMT14/experiments/WMT14/data/train.fr
08/03 14:32:30 size: 10559682
08/03 14:32:30 reading development data
08/03 14:32:31 files: experiments/WMT14/experiments/WMT14/data/dev.en experiments/WMT14/experiments/WMT14/data/dev.fr
08/03 14:32:31 size: 6003
08/03 14:32:33 starting training
terminate called after throwing an instance of 'terminate called recursively std::bad_alloc'
./seq2seq.sh: line 4: 2853 Aborted /usr/bin/env python3 -m translate "$@"

The third try:
08/03 15:12:51 lines read 10500000
08/03 15:12:55 files: experiments/WMT14/experiments/WMT14/data/train.en experiments/WMT14/experiments/WMT14/data/train.fr
08/03 15:12:55 size: 10559682
08/03 15:12:55 reading development data
08/03 15:12:56 files: experiments/WMT14/experiments/WMT14/data/dev.en experiments/WMT14/experiments/WMT14/data/dev.fr
08/03 15:12:56 size: 6003
08/03 15:12:57 starting training
./seq2seq.sh: line 4: 3492 Killed /usr/bin/env python3 -m translate "$@"

alex-berard commented 7 years ago

Hello, I'm not sure, but this might be a memory problem. The corpus has more than 10M lines, which takes a lot of memory. Did you check how much memory was left while you were running this? (top or htop)

I suspect that your memory is running low, and because of that your OS had to kill the process. Can you also try running the Python command directly, without using the seq2seq.sh script?

If it's really a memory problem, you can use the max_train_size parameter in your configuration files.
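For reference, here is a quick way to check this, assuming a standard Linux setup where free and htop are available (the training command itself is the one given above):

```sh
# Watch memory usage in another terminal while training runs
free -h
htop   # or: top

# Run the trainer directly, bypassing seq2seq.sh, and capture the output
python3 -m translate experiments/WMT14/baseline.yaml --train -v &> error.log
```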

zy158 commented 7 years ago

Indeed, when I run this, there is almost no memory left. When I use "python3 -m translate experiments/WMT14/baseline.yaml --train -v &> error.log", the process is killed with additional information in the terminal, and error.log only records the output up to "starting training". Could you tell me your computer configuration? My computer has 8 GB of memory and a 1 TB hard drive, with 100 GB allocated to the Linux system. Is this configuration too low? And is there a paper your model is based on?

alex-berard commented 7 years ago

The machines that I use have at least 16 GB of memory. As I said, you should use the max_train_size parameter. This allows you to load only a small part of the training set into memory at a time (say 100,000 sentences). Training will still go over the entire training set, but it won't load all of it at once.
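For example, the setting could look like this in your experiment config (the 100,000 value is only the illustrative figure mentioned above, not a tuned recommendation):

```yaml
# Overrides max_train_size: 0 from default.yaml (0 presumably means "no limit")
max_train_size: 100000   # keep at most ~100,000 training sentences in memory at a time
```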

The baseline.yaml configuration file is not based on any specific paper.

The baseline-legacy.yaml configuration replicates the parameters of Bahdanau et al. 2014. I ran experiments with this configuration recently and it gives the same results as reported in the paper. Maybe you should use the latter configuration if you want to replicate results from the state of the art.

Something else that you should know: if you don't have a big enough GPU (a GTX 980 at the very least), this will either run very slowly or not run at all. My baseline-legacy model took an entire week to train on a GTX 1070 (using 4-5 GB of GPU memory).

zy158 commented 7 years ago

For this WMT14 example, which file should I configure the max_train_size parameter in? main.py? baseline.yaml? baseline-legacy.yaml? I only see max_train_size: 0 in default.yaml. Also, my TensorFlow is the CPU version. Should I install the GPU version?

alex-berard commented 7 years ago

You should put this parameter inside baseline.yaml or baseline-legacy.yaml (depending on which one you want to run). Each parameter that you define in your config file overrides the default value in default.yaml.

Does your machine have an Nvidia GPU (graphics card)? Which model? If so, you need to install the proper Nvidia driver, CUDA, and the GPU version of TensorFlow. If you don't have an Nvidia graphics card (and a big one), there is no way you will be able to run this experiment in a reasonable amount of time.
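As a rough sketch of what that setup looks like (the exact CUDA/cuDNN versions depend on your system; the pip package version below just matches the TensorFlow 1.2 dependency mentioned earlier, so check the TensorFlow install docs for your platform):

```sh
nvidia-smi                          # confirm the Nvidia driver sees the GPU
pip3 uninstall tensorflow           # remove the CPU-only build
pip3 install tensorflow-gpu==1.2.0  # GPU build; requires CUDA and cuDNN to be installed
```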

Ola131v commented 7 years ago

Hello! I would like to know whether this code will run with Python 2.7?

alex-berard commented 7 years ago

Hello, no, you need Python 3.4+.