Learning rate & its scheduling

LeeDoYup commented 2 years ago

I cannot find the specific learning rate value or how the authors schedule the learning rate over epochs. How did you implement this and reproduce the results in the paper?

dome272 commented 2 years ago

To be honest, I was also a bit disappointed that the authors gave no implementation details. So far, I have not done any hyperparameter tuning; I just set the learning rate to the value used for the transformer part in the original VQGAN paper. That also means I didn't experiment with any LR schedules, so feel free to try some. As for the last question, I have not reproduced the results yet; that is still on the TODO list. For now, all I did was implement the Bidirectional Transformer in the VQGAN code. Furthermore, the authors referred to two papers when discussing the Bidirectional Transformer, and following those might lead to a serious performance improvement. I simply used my own standard implementation of BERT. So again, feel free to do some optimization there.
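
For anyone who wants to try a schedule, here is a minimal sketch of one common choice: linear warmup followed by cosine decay. It is not taken from the MaskGIT paper or from this repo. The function name `warmup_cosine_factor`, its parameters, and the `BASE_LR` value are all placeholders for illustration, not tuned or recommended settings.

```python
import math


def warmup_cosine_factor(step, warmup_steps, total_steps, min_factor=0.0):
    """Return the multiplier for the base learning rate at a given step.

    Hypothetical helper: ramps linearly up to 1.0 over `warmup_steps`,
    then follows a half cosine down to `min_factor` at `total_steps`.
    """
    if warmup_steps > 0 and step < warmup_steps:
        # Linear warmup: 1/warmup_steps at step 0, 1.0 at the last warmup step.
        return (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_factor + (1.0 - min_factor) * cosine


# Placeholder base learning rate (an assumption, not the value used in this repo).
BASE_LR = 1e-4

if __name__ == "__main__":
    # Print the effective learning rate at a few steps of a 1000-step run.
    for step in (0, 50, 100, 550, 1000):
        lr = BASE_LR * warmup_cosine_factor(step, warmup_steps=100, total_steps=1000)
        print(f"step {step:4d}: lr = {lr:.2e}")
```

In PyTorch, a function like this can be passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`, which multiplies the optimizer's initial learning rate by the returned factor each time `scheduler.step()` is called. Whether any schedule helps here is untested.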

LeeDoYup commented 2 years ago

Thanks for the reply. I agree with you that the paper does not include the implementation details.

dome272 / MaskGIT-pytorch
