Hi @LQchen1. As mentioned in our paper, MESA is a strategy to prevent overfitting. For more information, please refer to the paper "Sharpness-Aware Training for Free".
Hello @tian-qing001, thanks for your answer. I read the MESA paper and code and found that your implementation differs from the one by the MESA authors: you replaced the KL loss in the original paper with the classification loss. Could you explain your implementation a little? It looks much simpler. Vanilla MESA: (image not preserved) Your MESA: (image not preserved) In addition, if I want to change it to SAF in mlla, can I just change ema_output in your code to torch.tensor(train_logits[indices,(epoch-args.minus_epoch) % (args.minus_epoch+1)]).to(target.device)? As far as I can see, that is the main difference between SAF and MESA in the original paper.
Hi @LQchen1~ In PyTorch's implementation, $\mathrm{KLDiv}(P, Q)=\mathrm{CrossEntropy}(P, Q) - \mathrm{Entropy}(Q)$, where $P$ is the model's output and $Q$ is the target. $\mathrm{Entropy}(Q)$ does not depend on $P$, so minimizing $\mathrm{KLDiv}(P, Q)$ with respect to the model is equivalent to minimizing $\mathrm{CrossEntropy}(P, Q)$. As a result, our implementation is the same as the official one with args.temperature=1.
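For anyone who wants to check the identity numerically, here is a minimal sketch in plain Python (standard library only, not the repository's code). It assumes $Q$ is a fixed soft target such as the softmaxed EMA output, and it uses the same pointwise form that `torch.nn.functional.kl_div` evaluates, `target * (log(target) - input)` with `input` given as log-probabilities. The logits and helper names are made up for illustration.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(log_p, q):
    # CE(P, Q) = -sum_i q_i * log p_i, with P the prediction and Q the soft target
    return -sum(qi * lpi for qi, lpi in zip(q, log_p))

def entropy(q):
    return -sum(qi * math.log(qi) for qi in q if qi > 0)

def kl_div(log_p, q):
    # sum_i q_i * (log q_i - log p_i), summed over classes
    return sum(qi * (math.log(qi) - lpi) for qi, lpi in zip(q, log_p) if qi > 0)

def grad(loss_fn, logits, q, eps=1e-6):
    # Central finite-difference gradient of loss_fn w.r.t. the model logits
    g = []
    for i in range(len(logits)):
        up = list(logits)
        up[i] += eps
        dn = list(logits)
        dn[i] -= eps
        lp_up = [math.log(p) for p in softmax(up)]
        lp_dn = [math.log(p) for p in softmax(dn)]
        g.append((loss_fn(lp_up, q) - loss_fn(lp_dn, q)) / (2 * eps))
    return g

# Hypothetical logits for one sample with three classes
model_logits = [2.0, -1.0, 0.5]
ema_logits = [1.5, -0.5, 0.2]

p = softmax(model_logits)
q = softmax(ema_logits)  # fixed target, no gradient flows through it
log_p = [math.log(pi) for pi in p]

# The two losses differ only by Entropy(Q), a constant w.r.t. the model
identity_gap = abs(kl_div(log_p, q) - (cross_entropy(log_p, q) - entropy(q)))

# So their gradients w.r.t. the model logits coincide
grad_kl = grad(kl_div, model_logits, q)
grad_ce = grad(cross_entropy, model_logits, q)
grad_gap = max(abs(a - b) for a, b in zip(grad_kl, grad_ce))

print(identity_gap, grad_gap)
```

Both gaps come out at floating-point noise level, so either loss drives the model with the same gradient; only the reported loss value shifts by the entropy of the target.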
@tian-qing001 In the original paper, MESA is added when the epoch is in [5-300], but in your code the hyperparameter mesa is 1 when the epoch is greater than 75. This matches the authors' approach, only with a different epoch_start. However, the strategy in the original paper is to not use MESA at all when the epoch is less than epoch_start, whereas you set mesa to -1 and subtract it. Can you explain this?
See line 264 "if mesa > 0" and line 273 "else:". What do you think is different? I don't understand your question.
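To make the point about the two branches concrete, here is a rough sketch of the gating as described in this thread, not a copy of the repository's code. The function names, the default start epoch of 75, and the exact boundary comparison are assumptions taken from the discussion above; the only behavior it illustrates is that a non-positive mesa value selects the `else:` branch, so nothing is subtracted.

```python
def mesa_weight_for_epoch(epoch, mesa=1.0, start_epoch=75):
    # Hypothetical helper: pass the real weight once training is past
    # start_epoch, otherwise pass -1.0 as an "off" sentinel.
    return mesa if epoch > start_epoch else -1.0

def combined_loss(base_loss, mesa_loss, mesa):
    # Hypothetical helper mirroring the "if mesa > 0" / "else:" structure
    if mesa > 0:
        # MESA active: add the weighted EMA-target term
        return base_loss + mesa * mesa_loss
    else:
        # MESA inactive: the sentinel is never multiplied into the loss
        return base_loss

early = combined_loss(2.0, 0.5, mesa_weight_for_epoch(10))
late = combined_loss(2.0, 0.5, mesa_weight_for_epoch(100))
print(early, late)
```

Under these assumptions the -1 acts purely as a flag, which gives the same behavior as skipping MESA before epoch_start.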
Sorry, you are right. Thank you for your patience in answering my questions. May kind people be blessed with life-long peace.
Hello, I noticed the MESA hyperparameter in the Augmentation settings. What is the purpose of this hyperparameter?