Have you tried training from scratch with a large learning rate?

gaopengcuhk / Stable-Pix2Seq

A full-fledged version of Pix2Seq

Apache License 2.0

Have you tried training from scratch with a large learning rate? #4

Closed. ssssholmes closed this issue 2 years ago.

ssssholmes commented 2 years ago

The authors mention that they use AdamW with a learning rate of 0.003 and no ImageNet pretraining. I tried this configuration and found it really hard to train.

gaopengcuhk commented 2 years ago

The largest learning rate for stable training in this codebase is 0.001.

Sharpiless commented 2 years ago

Thanks for your great work. The paper says: "We use the stochastic depth (Huang et al., 2016) with a rate of 10% to reduce overfitting." Have you tried using stochastic depth in your model?

gaopengcuhk commented 2 years ago

I will try stochastic depth in the future.
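
For readers unfamiliar with the stochastic depth technique asked about above, here is a minimal framework-free sketch of the idea with the 10% rate quoted from the paper. It is not code from this repository. The function name `stochastic_depth`, the `branch` callable, and the use of plain Python lists in place of tensors are all assumptions made for illustration. The sketch rescales the surviving branch during training, a variant common in practice; the original formulation instead scales the branch by the survival probability at test time.

```python
import random


def stochastic_depth(x, branch, drop_rate=0.1, training=True, rng=random):
    """Residual update y = x + branch(x) with stochastic depth.

    Hypothetical helper for illustration only.
    x:         list of floats, standing in for a tensor
    branch:    callable mapping a list to a list of the same length
               (the residual branch of a layer)
    drop_rate: probability of skipping the branch during training
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")

    # At evaluation time the branch is always applied, unscaled.
    if not training or drop_rate == 0.0:
        return [xi + bi for xi, bi in zip(x, branch(x))]

    # With probability drop_rate, skip the branch: the layer is an identity.
    if rng.random() < drop_rate:
        return list(x)

    # Otherwise keep the branch and rescale it by 1 / (1 - drop_rate) so the
    # expected output matches the evaluation-time output.
    keep = 1.0 - drop_rate
    return [xi + bi / keep for xi, bi in zip(x, branch(x))]
```

In a real model the same logic would be applied per sample to the output of each residual branch, for example around the attention and feed-forward sublayers of a Transformer block.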