allenai / document-qa

Apache License 2.0

About the learning rate #22

Closed · YeDeming closed this issue 6 years ago

YeDeming commented 6 years ago

Hi Christopher,

I'm puzzled by the high learning rate in ablate_triviaqa.py: it is set to 1 and never decays during training.

Why do you use such a high learning rate, and why do you train for so many epochs (71) with the sigmoid confidence method at that rate?

Would a lower learning rate, like 0.01, work?

Thanks a lot for your time! YeDeming

chrisc36 commented 6 years ago

Using a learning rate of 1.0 is standard practice for the Adadelta optimizer, which dynamically adjusts the effective step size anyway. I suspect using different learning rates will not make a large difference.
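To illustrate why the learning rate matters less for Adadelta: each update rescales the raw gradient by a ratio of running RMS accumulators, so the step size is roughly invariant to the scale of the gradients, and the `lr` multiplier (1.0 by default; the original Adadelta paper has no learning rate at all) is just a final scalar on an already-normalized step. Below is a minimal NumPy sketch of the update rule applied to a hypothetical toy objective f(x) = x², not code from this repository:

```python
import numpy as np

def adadelta_step(x, grad, Eg, Edx, lr=1.0, rho=0.95, eps=1e-6):
    """One Adadelta update (Zeiler, 2012).

    The sqrt(Edx) / sqrt(Eg) ratio rescales the raw gradient per
    parameter, which is why lr is conventionally left at 1.0.
    """
    Eg = rho * Eg + (1 - rho) * grad ** 2          # running RMS of gradients
    dx = -lr * np.sqrt(Edx + eps) / np.sqrt(Eg + eps) * grad
    Edx = rho * Edx + (1 - rho) * dx ** 2          # running RMS of updates
    return x + dx, Eg, Edx

# Minimize the toy objective f(x) = x^2 starting from x = 5.0.
x, Eg, Edx = 5.0, 0.0, 0.0
for _ in range(2000):
    x, Eg, Edx = adadelta_step(x, 2 * x, Eg, Edx, lr=1.0)
```

One consequence worth noting: because the gradient appears in both the numerator and (via Eg) the denominator, multiplying all gradients by a constant leaves the first update essentially unchanged, which is the sense in which Adadelta "dynamically adjusts the learning rate."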

We use many epochs for the sigmoid method because it took a long time to converge in my experiments.

YeDeming commented 6 years ago

Thanks a lot for your help!