Closed: ghost closed this issue 3 years ago
Hi,
I noticed that the comment here says the head uses a 10x learning rate, but the code does not seem to apply it. Since training the entire network is really time-consuming, may I ask whether the comment is simply wrong?
Cheers.
Hi, it is indeed a wrong comment, left over from an old version of the code :) We will polish the code. Thank you for pointing it out!