clovaai / AdamP

AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights (ICLR 2021)
https://clovaai.github.io/AdamP/
MIT License

Runtime: Adam vs AdamP #9

Closed · adityac8 closed this issue 3 years ago

adityac8 commented 3 years ago

Hi, thank you for the code release. I am trying to train MIRNet with AdamP in place of Adam, but the training time per epoch nearly doubles. Is there any way to make it faster?

I tried two environments (Python 3.7, PyTorch 1.1, CUDA 9.0 and Python 3.7, PyTorch 1.4, CUDA 10.0), but both give the same speed.
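For reference, the change on my side is just the optimizer swap below (a minimal sketch; the model and hyperparameters are placeholders rather than MIRNet's actual configuration):

```python
import torch.nn as nn
from adamp import AdamP  # pip install adamp

# Placeholder model standing in for MIRNet
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1),
)

# Before:
# optimizer = torch.optim.Adam(model.parameters(), lr=2e-4, betas=(0.9, 0.999))
# After:
optimizer = AdamP(model.parameters(), lr=2e-4, betas=(0.9, 0.999), weight_decay=1e-2)
```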

Thanks

bhheo commented 3 years ago

Hi

Thank you for your interest in our work.

In our experiments we used a batch size of 256, and AdamP cost about 8% more training time than Adam. MIRNet's batch size is 16, which is much smaller than our setting, and that seems to be the cause of the slowdown. So I recommend increasing the batch size to improve the speed: as the batch size grows, the optimizer's share of each training step shrinks, and the training time per epoch goes down.
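If you want to check this on your side, one rough way is to time `optimizer.step()` separately from the forward/backward pass (a sketch assuming a CUDA setup; `model`, `optimizer`, `loss_fn`, `batch`, and `target` are placeholders for your own objects):

```python
import time
import torch

def time_step(model, optimizer, loss_fn, batch, target):
    """Return (total step time, optimizer.step() time) in seconds for one batch."""
    torch.cuda.synchronize()
    t0 = time.time()

    optimizer.zero_grad()
    loss = loss_fn(model(batch), target)
    loss.backward()

    torch.cuda.synchronize()
    t1 = time.time()

    optimizer.step()

    torch.cuda.synchronize()
    t2 = time.time()
    return t2 - t0, t2 - t1
```

With batch size 16 the forward/backward part is cheap, so the extra cost of AdamP's `step()` is a large fraction of the total; with a larger batch the same optimizer cost is amortized over more samples per step.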

I know that increasing the batch size is often infeasible and not a good solution, but I don't have any other suggestion. AdamP performs additional computation for the projection operation, which is unavoidable. There may be ways to optimize the projection further, but I haven't found one.
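For reference, most of the extra cost comes from the per-parameter projection, which works roughly like the sketch below (layer-wise case only; the actual optimizer also checks a channel-wise view and applies a weight-decay ratio):

```python
import math
import torch
import torch.nn.functional as F

def project_perturbation(p, grad, perturb, delta=0.1, eps=1e-8):
    """Simplified, layer-wise-only sketch of AdamP's projection step.

    If the weight looks scale-invariant (gradient nearly orthogonal to the
    weight), remove the radial component of the update so the weight only
    moves along the tangent direction.
    """
    w = p.data.view(1, -1)
    g = grad.view(1, -1)
    cosine_sim = F.cosine_similarity(w, g, dim=1, eps=eps).abs()

    if cosine_sim.max() < delta / math.sqrt(w.size(1)):
        w_n = p.data / w.norm().add(eps)       # unit-norm weight
        radial = (w_n * perturb).sum()         # component of the update along the weight
        perturb = perturb - radial * w_n       # keep only the tangential part
    return perturb
```

These extra norm, cosine-similarity, and projection operations are a fixed per-step cost that does not depend on the batch size, which is why the relative overhead shrinks as the batch size grows.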

adityac8 commented 3 years ago

Thank you for the quick response.