Open cuongvng opened 4 years ago
It looks like the training is not very stable; maybe the learning rate is too large?
I think @cuongvng used the same parameters from the book for two training runs but got two different results (one good and one bad). He is probably raising the issue of the code's reproducibility.
I saw in this discussion that @mli also found some ways to make the result reproducible, but the random seed does not work for dropout and may have no effect on GPU. Is this still the case? Have you found any method to make the result reproducible?
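For reference, a common way to pin down as much randomness as possible is to seed every RNG in play. This is a sketch, not a guarantee: as noted above, seeding may not cover dropout or GPU-side nondeterminism in MXNet. The `mx.random.seed` call is MXNet's actual API; the helper name `seed_everything` is my own.

```python
import random
import numpy as np

def seed_everything(seed=42):
    """Seed the common RNGs (sketch; does not remove all GPU nondeterminism)."""
    random.seed(seed)          # Python's stdlib RNG
    np.random.seed(seed)       # NumPy's global RNG
    try:
        import mxnet as mx
        mx.random.seed(seed)   # MXNet's RNG; GPU ops may still be nondeterministic
    except ImportError:
        pass                   # MXNet not installed; stdlib/NumPy seeding still applies

seed_everything(0)
a = np.random.rand(3)
seed_everything(0)
b = np.random.rand(3)
assert np.allclose(a, b)       # NumPy draws repeat after reseeding
```

Even with all seeds fixed, identical results across runs are only expected on the same hardware and library versions.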
@mli thank you for your advice. I tried reducing lr from 0.1 to 0.01, and the learning curves after 5 runs followed the same pattern, though they were not identical. The MXNet code still cannot reproduce the same result.
Hi @cuongvng, can you try changing force_reinit=True to False in train_ch6 (https://github.com/d2l-ai/d2l-en/blob/4b0ea4bf1821049fa4a044c88fa6e0ec52a0630d/d2l/mxnet.py#L417)? I ran it 7 times and they all worked well.
@goldmermaid thank you. Setting force_reinit=False works better than True: the learning curves had the same pattern, though the loss and accuracy were not exactly identical.
We set force_reinit=True because train_ch6 may be invoked multiple times in a notebook, and each invocation should train the model from scratch (re-initializing the parameters). The re-initialization takes place before the training loop.
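The semantics described above can be sketched in plain Python. This is a toy stand-in for illustration, not MXNet's actual implementation: initialization is skipped when parameters already exist, unless force_reinit=True.

```python
import random

class ToyParams:
    """Toy stand-in for a block's parameters (illustrative only, not Gluon)."""
    def __init__(self):
        self.weights = None

    def initialize(self, force_reinit=False, seed=None):
        # Skip re-init when parameters already exist, unless forced,
        # mirroring the force_reinit behavior discussed above.
        if self.weights is not None and not force_reinit:
            return
        rng = random.Random(seed)
        self.weights = [rng.uniform(-0.07, 0.07) for _ in range(4)]

net = ToyParams()
net.initialize(seed=0)                      # first call: parameters created
first = list(net.weights)
net.initialize(seed=1)                      # no force: parameters untouched
assert net.weights == first
net.initialize(force_reinit=True, seed=1)   # forced: fresh from-scratch init
assert net.weights != first
```

This is why force_reinit=False can hide a problem: a second invocation silently keeps whatever parameters (and initializer) the network already had.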
Note that train_ch6 uses Xavier initialization. If you set force_reinit=False, then in the NiN section the network keeps the default initialization from the net.initialize() call made before train_ch6 is invoked. The two initializers are different.
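To make the difference concrete: Xavier (Glorot) uniform initialization draws from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)), whereas MXNet's default net.initialize() uses, if I recall correctly, a fixed Uniform(scale=0.07). The layer sizes below are made up for illustration.

```python
import math

def xavier_uniform_bound(fan_in, fan_out):
    # Glorot & Bengio (2010) uniform bound: a = sqrt(6 / (fan_in + fan_out))
    return math.sqrt(6.0 / (fan_in + fan_out))

DEFAULT_BOUND = 0.07  # MXNet's default Uniform initializer scale (my assumption)

# Hypothetical 3x3 conv: 3 input channels, 96 output channels
fan_in, fan_out = 3 * 3 * 3, 3 * 3 * 96
a = xavier_uniform_bound(fan_in, fan_out)
print(a, DEFAULT_BOUND)  # Xavier bound here is about 0.082, larger than 0.07
```

So the two schemes sample weights from differently scaled ranges, which is enough to change the early training dynamics between runs.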
When our team trained the NiN network in chapter 7 more than once, we got different learning curves:
First time: [learning-curve plot]
Second time: [learning-curve plot]
Maybe there is a problem with the initialization? Should a random_seed be set, or what else can we do to solve the problem?