LeapLabTHU / MLLA

Official repository of MLLA

about MESA #11

Closed LQchen1 closed 2 weeks ago

LQchen1 commented 2 weeks ago

Hello, I noticed the MESA hyperparameter in the Augmentation settings. What is the purpose of this hyperparameter?

tian-qing001 commented 2 weeks ago

Hi @LQchen1. As mentioned in our paper, MESA is a strategy to prevent overfitting. For more information, please refer to the paper "Sharpness-Aware Training for Free".

LQchen1 commented 2 weeks ago

Hello @tian-qing001, thanks for your answer. I read the MESA paper and code and found that your implementation differs from the one released by the MESA authors: you replaced the KL loss in the original paper with the classification loss. Could you please explain your implementation a little? It looks much simpler.

Vanilla MESA: [image]

Your MESA: [image]

In addition, if I want to switch to SAF in MLLA, I would just change ema_output in your code to `torch.tensor(train_logits[indices,(epoch-args.minus_epoch) % (args.minus_epoch+1)]).to(target.device)`. As far as I can see, that is the main difference between SAF and MESA in the original paper.

tian-qing001 commented 2 weeks ago

Hi @LQchen1~ In PyTorch's implementation, $\mathrm{KLDiv}(P, Q)=\mathrm{CrossEntropy}(P, Q) - \mathrm{Entropy}(Q)$. Therefore, when $P$ is the model's output and the target $Q$ is held fixed, $\mathrm{Entropy}(Q)$ is a constant with respect to the model, so minimizing $\mathrm{KLDiv}(P, Q)$ is equivalent to minimizing $\mathrm{CrossEntropy}(P, Q)$. As a result, our implementation is the same as the official one with args.temperature=1.
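The identity above can be checked numerically. The following is a minimal sketch in plain Python (no PyTorch, so it runs anywhere). The helper names and the example logits are illustrative assumptions and are not taken from the MLLA or MESA code. It follows the argument order used above: the first argument is the model's output (as logits), the second is the fixed soft target.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, q):
    # Soft-target cross-entropy: -sum_i q_i * log softmax(logits)_i
    p = softmax(logits)
    return -sum(qi * math.log(pi) for qi, pi in zip(q, p))

def entropy(q):
    # Entropy of the target distribution: -sum_i q_i * log q_i
    return -sum(qi * math.log(qi) for qi in q if qi > 0)

def kl_div(logits, q):
    # sum_i q_i * (log q_i - log softmax(logits)_i)
    p = softmax(logits)
    return sum(qi * (math.log(qi) - math.log(pi))
               for qi, pi in zip(q, p) if qi > 0)

def numeric_grad(loss, logits, q, eps=1e-6):
    # Central finite differences of loss(logits, q) with respect to each logit.
    grads = []
    for i in range(len(logits)):
        up = logits[:i] + [logits[i] + eps] + logits[i + 1:]
        down = logits[:i] + [logits[i] - eps] + logits[i + 1:]
        grads.append((loss(up, q) - loss(down, q)) / (2 * eps))
    return grads

# Illustrative values only (assumptions, not from the repository).
student_logits = [2.0, -1.0, 0.5]        # model output P, as logits
ema_target = softmax([1.5, -0.5, 0.2])   # fixed soft target Q

kl = kl_div(student_logits, ema_target)
ce = cross_entropy(student_logits, ema_target)
gap = ce - kl                            # equals Entropy(Q) up to rounding

# Both losses give the same gradient with respect to the model's logits.
grad_kl = numeric_grad(kl_div, student_logits, ema_target)
grad_ce = numeric_grad(cross_entropy, student_logits, ema_target)
```

Because the two losses differ only by a term that does not depend on the model's output, their gradients with respect to the logits agree, which is why swapping one for the other leaves the optimization unchanged.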

LQchen1 commented 2 weeks ago

@tian-qing001 In the original paper, MESA is applied when the epoch is in [5, 300], but in your code the hyperparameter mesa is 1 only when the epoch is greater than 75. This is the same as the authors' code here, except that epoch_start is set differently. However, the strategy in the original paper is simply not to use MESA if the epoch is less than epoch_start, whereas you set mesa to -1 and subtract it. Can you explain this?

tian-qing001 commented 2 weeks ago

See line 264 "if mesa > 0" and line 273 "else:". What do you think is different? I can't understand your problem.
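The branch structure referred to here can be sketched as follows. This is not the repository's code: the function names, the 0.25 start ratio, and the 300-epoch schedule are hypothetical, chosen only so that MESA switches on after epoch 75 as described in the question. The sketch shows that a value of -1 only serves as an "off" marker for the `mesa > 0` check and is never subtracted from the loss.

```python
def mesa_weight(epoch, total_epochs, start_ratio=0.25, weight=1.0):
    # Hypothetical schedule: return the MESA weight once training has passed
    # start_ratio of the total epochs, otherwise -1.0 as an "off" marker.
    return weight if epoch / total_epochs > start_ratio else -1.0

def total_loss(cls_loss, ema_loss, mesa):
    # Mirrors the "if mesa > 0 ... else" structure: the EMA term is added
    # only when mesa is positive; a negative mesa never enters the loss.
    if mesa > 0:
        return cls_loss + mesa * ema_loss
    else:
        return cls_loss

# With an assumed 300-epoch schedule, MESA is off at epoch 75 and on at 76.
off = mesa_weight(75, 300)
on = mesa_weight(76, 300)
loss_before = total_loss(1.0, 0.5, off)  # plain classification loss
loss_after = total_loss(1.0, 0.5, on)    # classification loss + EMA term
```

Under these assumptions, the behavior before the start epoch matches "do not use MESA", since the `else` branch ignores the EMA term entirely.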

LQchen1 commented 2 weeks ago

Sorry, you are right. Thank you for your patience in answering my questions. May kind people be blessed with lifelong peace.