Closed: YeDeming closed this issue 6 years ago
Using a learning rate of 1.0 is standard practice for the Adadelta optimizer, which dynamically adapts the step size anyway. I suspect using different learning rates will not make a large difference (see the sketch after this reply).
We use many epochs for the sigmoid method because it took a long time to converge in my experiments.
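For reference, here is a minimal NumPy sketch of the Adadelta update rule (Zeiler 2012), illustrating why the global learning rate acts only as a final multiplier on an already-normalized step, so 1.0 is the natural default. The function name, hyperparameters (`rho`, `eps`), and toy example are illustrative assumptions, not code from ablate_triviaqa.py.

```python
import numpy as np

def adadelta_step(x, grad, state, lr=1.0, rho=0.95, eps=1e-6):
    """One Adadelta update (Zeiler 2012). `state` holds running averages
    of squared gradients and squared updates; lr is only a final scale."""
    eg2, edx2 = state                      # running E[g^2], E[dx^2]
    eg2 = rho * eg2 + (1 - rho) * grad**2  # accumulate squared gradients
    # The step is the ratio RMS(previous updates) / RMS(gradients),
    # so it self-normalizes regardless of the global learning rate.
    dx = -np.sqrt(edx2 + eps) / np.sqrt(eg2 + eps) * grad
    edx2 = rho * edx2 + (1 - rho) * dx**2  # accumulate squared updates
    return x + lr * dx, (eg2, edx2)

# Toy usage: minimize f(x) = x^2 starting from x = 5.
x, state = 5.0, (0.0, 0.0)
for _ in range(200):
    x, state = adadelta_step(x, 2 * x, state)
print(x)  # approaches 0; changing lr mostly rescales the trajectory
```

In TensorFlow 1.x the equivalent construction would be `tf.train.AdadeltaOptimizer(learning_rate=1.0)`, assuming that is the optimizer the script builds.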
Thanks a lot for your help!
Hi Christopher,
I am puzzled by the high learning rate in ablate_triviaqa.py: it is set to 1 and does not decay during training.
I am confused about why you use such a high learning rate, and why the sigmoid confidence method needs so many epochs (71) with it.
Does a lower learning rate, like 0.01, work?
Thanks a lot for your time!
YeDeming