PRBonn / phenobench-baselines

Baselines of the PhenoBench Dataset
https://www.phenobench.org

learning rate & weight decay #6

Closed chenfh21 closed 2 months ago

chenfh21 commented 2 months ago

Hi, I am puzzled by the learning rate schedule used during optimization in the semantic segmentation benchmark methods: "During optimization, we employ Adam [9] and set the weight decay to $2 \cdot 10^{-4}$. At the initial 16 epochs, we linearly increase the learning rate to $1 \cdot 10^{-4}$ and subsequently apply a polynomial learning rate decay." (PhenoBench Supplementary Material). Could you share the reasoning behind this schedule? [attached: plot of the learning rate schedule]

jbehley commented 2 months ago

I think the learning rate is increased from 0 to $1.0\cdot 10^{-4}$ as a warm start and then decreased. The warmup avoids overly noisy gradients at the beginning of training, and the subsequent decay lets the optimization converge to a minimum.
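
For concreteness, here is a minimal PyTorch sketch of such a schedule (not the exact baseline training code). The warmup length (16 epochs) and peak learning rate ($1 \cdot 10^{-4}$) are from the supplement; the total number of epochs and the polynomial power are placeholders, since the supplement does not fix them here:

```python
import torch

model = torch.nn.Linear(8, 2)  # placeholder model

base_lr = 1e-4        # peak learning rate (from the supplement)
warmup_epochs = 16    # linear warmup length (from the supplement)
max_epochs = 100      # placeholder: total training length
power = 0.9           # placeholder: polynomial decay exponent

optimizer = torch.optim.Adam(model.parameters(), lr=base_lr, weight_decay=2e-4)

def lr_factor(epoch: int) -> float:
    if epoch < warmup_epochs:
        # linear warmup: scale the learning rate from 0 up to base_lr
        return epoch / warmup_epochs
    # polynomial decay: scale from base_lr down towards 0
    progress = (epoch - warmup_epochs) / (max_epochs - warmup_epochs)
    return (1.0 - progress) ** power

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_factor)

for epoch in range(max_epochs):
    # ... one training epoch ...
    optimizer.step()   # stand-in for the per-batch updates
    scheduler.step()   # update the learning rate once per epoch
```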

Does this answer your question? Maybe @JaWeyl can provide further insights, as he was the main developer of the baseline.

chenfh21 commented 2 months ago

> I think the learning rate is increased from 0 to $1.0\cdot 10^{-4}$ as a warm start and then decreased. The warmup avoids overly noisy gradients at the beginning of training, and the subsequent decay lets the optimization converge to a minimum.
>
> Does this answer your question? Maybe @JaWeyl can provide further insights, as he was the main developer of the baseline.

Thanks for your reply. That matches my own understanding. Maybe I should ask @JaWeyl for further clarification.

JaWeyl commented 2 months ago

Hi @chenfh21,

The explanation of @jbehley is correct and provides an intuition about why we use the learning rate schedule.

chenfh21 commented 2 months ago

> Hi @chenfh21,
>
> The explanation of @jbehley is correct and provides an intuition about why we use the learning rate schedule.

Thanks for your reply. I have another question: I'm curious whether the baseline methods in the PhenoBench paper use transfer learning / pre-trained weights.

jbehley commented 2 months ago

Whenever we used pre-trained weights, we mention it in the supplement of the paper (see the PDF of the latest version of the article, which includes the supplement). For example, the panoptic segmentation and leaf instance segmentation baselines use pre-trained ImageNet weights, whereas the semantic segmentation baselines are trained from scratch.
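
As a quick illustration (a sketch only, assuming a torchvision ResNet-50 backbone; the actual baselines may construct their encoders differently), the two initialization modes look like this:

```python
import torchvision

# pre-trained initialization, as used for the panoptic and
# leaf instance segmentation baselines:
pretrained_backbone = torchvision.models.resnet50(
    weights=torchvision.models.ResNet50_Weights.IMAGENET1K_V1
)

# random initialization (training from scratch), as used for
# the semantic segmentation baselines:
scratch_backbone = torchvision.models.resnet50(weights=None)
```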

chenfh21 commented 2 months ago

> Whenever we used pre-trained weights, we mention it in the supplement of the paper (see the PDF of the latest version of the article, which includes the supplement). For example, the panoptic segmentation and leaf instance segmentation baselines use pre-trained ImageNet weights, whereas the semantic segmentation baselines are trained from scratch.

Thank you very much for your reply.