SwordYork / DCNMT

Deep Character-Level Neural Machine Translation
GNU General Public License v3.0

Questions about parameters #5

Closed: kadir-gunel closed this issue 7 years ago

kadir-gunel commented 7 years ago

Hello @SwordYork ,

I find it useful to create new threads for unrelated questions, so I have created a new one.

In the paper, the learning rate is changed from 1e-3 to 1e-4, but it is not mentioned at which iteration, or by what criterion, the change should be made. Could you share that information?

Also, the paper says that char embeddings are set to 64 and word embeddings to 600, but the configuration file has an entry only for word embeddings, and it is set to 64. How can I change the char embeddings?

And lastly, in the config file, you use the test set as the dev set, right?

By the way, forgive the question bombardment :smile: Thank you in advance :+1:

Best regards, Kadir

SwordYork commented 7 years ago

Hi,

You could halve the learning rate when you find that the learning curve has stopped decreasing. You can use plot_curve.py to plot the curve; for example, after 30000 iterations:

[screenshot: learning curve after 30000 iterations]

You could also try AdaDelta, which does not need a learning rate to be set, but AdaDelta is much slower than Adam.
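A minimal sketch of the schedule described above (this is not the repository's code, and `maybe_halve_lr` is a hypothetical helper): halve Adam's learning rate once the training cost stops decreasing between checkpoints, flooring it at the paper's final 1e-4.

```python
def maybe_halve_lr(learning_rate, prev_cost, curr_cost, min_lr=1e-4):
    """Return the learning rate, halved if the cost has plateaued."""
    if curr_cost >= prev_cost and learning_rate > min_lr:
        learning_rate = max(learning_rate / 2.0, min_lr)
    return learning_rate

# Illustrative usage with dummy cost values:
lr = 1e-3
lr = maybe_halve_lr(lr, prev_cost=45.2, curr_cost=45.5)  # plateau -> lr == 5e-4
```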

The char embeddings can be set via config['enc_embed'] and config['dec_embed']. The word embedding size is config['src_dgru_nhids'], which is 512 in my setup.

For simplicity, we don't distinguish between the test set and the dev set in the config file; you can set config['test_set'] accordingly.
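Putting the two answers together, a hedged sketch of the relevant config entries, assuming the names and values given above (the path is a placeholder, not a real file in the repository):

```python
config = {}
config['enc_embed'] = 64          # source character embedding size
config['dec_embed'] = 64          # target character embedding size
config['src_dgru_nhids'] = 512    # DGRU state size, i.e. the "word" embedding
config['test_set'] = 'path/to/dev.set'  # doubles as the dev set
```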

Your questions are really useful. If you encounter other problems during training, please feel free to ask.

Thanks

kadir-gunel commented 7 years ago

Thank you.

kadir-gunel commented 7 years ago

I noticed something about GPU memory usage. In the configuration file, I set batch_size first to 56 and then to 100, but GPU memory usage did not change at all; src|trg_seq_char_len are also set to 450. Am I missing something?

SwordYork commented 7 years ago

After changing batch_size, you should delete dcnmt_*2*/log and dcnmt_*2*/iterations_state.pkl; otherwise training will resume from the checkpoint, which still uses the previous batch_size (56).
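A small sketch of that cleanup step, based only on the two paths named above:

```python
# Remove the training log and iteration state so training restarts instead
# of resuming from the old-batch_size checkpoint. params.npz (the weights)
# is left in place.
import glob
import os

for path in glob.glob('dcnmt_*2*/log') + glob.glob('dcnmt_*2*/iterations_state.pkl'):
    os.remove(path)
```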

kadir-gunel commented 7 years ago

Thank you for the fast response. I deleted those files, but memory usage still barely reaches 2.5GB.

SwordYork commented 7 years ago

How large is the params.npz file? Do you use allow_gc=False? I think 2.5GB is too small; it usually takes more than 8GB of memory.

kadir-gunel commented 7 years ago

These are the Theano flags I am using: THEANO_FLAGS="on_unused_input=ignore, device=gpu, floatX=float32"

SwordYork commented 7 years ago

I think it is better to try THEANO_FLAGS="on_unused_input=ignore, device=gpu, floatX=float32, allow_gc=False". It will speed up training but consume more memory. Or you could try cnmem.
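A sketch of one way to set these flags from Python: Theano reads THEANO_FLAGS from the environment when it is first imported, so it must be set before the import. allow_gc=False trades extra GPU memory for speed; lib.cnmem (the cnmem option mentioned above, on the old gpu backend) preallocates a fraction of GPU memory up front, and the 0.9 here is an illustrative choice, not a recommendation from this thread.

```python
import os

os.environ['THEANO_FLAGS'] = (
    'on_unused_input=ignore,device=gpu,floatX=float32,'
    'allow_gc=False,lib.cnmem=0.9'
)

import theano  # the flags above take effect at this import
```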

kadir-gunel commented 7 years ago

Yep! :+1: Now it seems normal, nearly 8GB of memory.

Thanks.

SwordYork commented 7 years ago

No problem. Just be careful that the memory may overflow.