d2l-ai / d2l-en


Inconsistent learning curves of two NiN trainings #905

Open · cuongvng opened this issue 4 years ago

cuongvng commented 4 years ago

When our team trained the NiN network in Chapter 7 more than once, we got different learning curves.

First time:

[learning curve plot from the first run]

Second time:

[learning curve plot from the second run]

Maybe there is a problem with the initialization? Should a random seed be set, or is there something else we can do to solve the problem?

mli commented 4 years ago

It looks like the training is not very stable; maybe the learning rate is too large?

hnguyentt commented 4 years ago

I think @cuongvng used the same parameters from the book for two training runs but got two different results (one good and one bad). Perhaps he is asking about the reproducibility of the code.

I saw in this discussion that @mli also found some ways to make the results reproducible, but the random seed does not work for dropout and may have no effect on GPU. Is this still the case? Have you found any method to make the results reproducible?
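For what it's worth, here is a minimal seeding sketch (the `seed_everything` helper name is mine; the `ctx='all'` argument to `mx.random.seed` assumes a reasonably recent MXNet, and some GPU kernels can still be nondeterministic even with all seeds fixed):

```python
import random
import numpy as np
import mxnet as mx

def seed_everything(seed=42):
    """Seed every RNG source a Gluon training run typically touches."""
    random.seed(seed)        # Python's built-in RNG
    np.random.seed(seed)     # NumPy (e.g. data shuffling)
    # Seed MXNet's generators; ctx='all' also seeds the per-device GPU RNGs
    # (the ctx argument may be missing in older MXNet releases).
    mx.random.seed(seed, ctx='all')

seed_everything(42)
```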

cuongvng commented 4 years ago

@mli thank you for your advice. I tried reducing lr from 0.1 to 0.01, and the learning curves after running 5 times followed the same pattern, though they were not identical. The MXNet code still cannot reproduce exactly the same result.
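For reference, a sketch of that retry, assuming `net` is the NiN model already built earlier in the section and the chapter's usual Fashion-MNIST pipeline:

```python
from d2l import mxnet as d2l

# Same setup as the NiN section, with lr lowered from 0.1 to 0.01.
lr, num_epochs, batch_size = 0.01, 10, 128
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=224)
d2l.train_ch6(net, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())
```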

goldmermaid commented 4 years ago

Hi @cuongvng, can you try changing force_reinit=True to False in train_ch6 (https://github.com/d2l-ai/d2l-en/blob/4b0ea4bf1821049fa4a044c88fa6e0ec52a0630d/d2l/mxnet.py#L417)? I tried running 7 times and they all worked well.
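For context, the initialization inside train_ch6 at that commit looks roughly like this (paraphrased from the linked source; flipping the flag to False keeps whatever parameters already exist instead of re-drawing them):

```python
# Inside d2l's train_ch6 (MXNet version): with force_reinit=True the
# existing parameters are discarded and re-drawn with Xavier on every call.
net.initialize(force_reinit=True, ctx=device, init=init.Xavier())
```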

cuongvng commented 4 years ago

@goldmermaid thank you. Setting force_reinit=False works better than True: the learning curves had the same pattern, though the loss and accuracy were not exactly identical across runs.

astonzhang commented 4 years ago

We set force_reinit=True because train_ch6 may be invoked multiple times in a notebook, and each invocation should train the model from scratch (i.e., re-initialize the parameters). The re-initialization takes place before the training loop.

Note that train_ch6 uses Xavier initialization. If you set force_reinit=False, the NiN section instead keeps the default initialization from the net.initialize() call made before train_ch6 is invoked. The two initializers are different.
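To make the difference concrete, a small sketch with a toy model (Gluon's no-argument net.initialize() defaults to a uniform initializer, while train_ch6 re-draws with Xavier):

```python
from mxnet import init
from mxnet.gluon import nn

net = nn.Sequential()
net.add(nn.Dense(10))

# What the NiN section does before calling train_ch6:
# no-argument initialize() uses Gluon's default init.Uniform(scale=0.07).
net.initialize()

# What train_ch6 then does when force_reinit=True: discard those
# parameters and re-draw them from a Xavier distribution.
net.initialize(force_reinit=True, init=init.Xavier())
```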