LeapLabTHU / MLLA

Official repository of MLLA

Unfair comparison in Table 3 of the MLLA paper #15

Closed: wullia closed this issue 4 days ago

wullia commented 1 week ago

The paper uses MESA during training, which appears to contribute to accuracy gains beyond what AdamW or SGD alone would provide. The other methods listed in Table 3 do not use MESA, and they may also suffer from overfitting. Could you provide MLLA's performance when trained with plain AdamW? And does MLLA still outperform models such as NAT in that setting? I look forward to your response.

Hi @LQchen1. As mentioned in our paper, MESA is a strategy to prevent overfitting. For more information, please refer to the paper *Sharpness-Aware Training for Free*.
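
For context, the core idea of MESA in that paper is a sharpness-aware regularizer that comes "for free": instead of SAM's extra forward/backward pass, the model is distilled toward the softened outputs of an exponential-moving-average (EMA) copy of itself. The sketch below only illustrates that idea; it is not the training code of this repository, and the hyperparameter values (`ema_decay`, `mesa_weight`, `temperature`) are placeholders.

```python
import torch
import torch.nn.functional as F

def mesa_step(model, ema_model, images, targets, optimizer,
              ema_decay=0.999, mesa_weight=0.5, temperature=5.0):
    """One training step with a MESA-style regularizer (illustrative sketch only).

    ema_model is typically initialized as copy.deepcopy(model) before training.
    Matching the EMA teacher's softened outputs discourages sharp minima without
    SAM's second forward/backward pass. All values here are placeholders.
    """
    model.train()
    logits = model(images)
    loss = F.cross_entropy(logits, targets)

    # Self-distillation against the EMA teacher (no gradient through the teacher).
    with torch.no_grad():
        teacher_logits = ema_model(images)
    kd = F.kl_div(
        F.log_softmax(logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    loss = loss + mesa_weight * kd

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Update the EMA teacher after the optimizer step.
    with torch.no_grad():
        for p_ema, p in zip(ema_model.parameters(), model.parameters()):
            p_ema.mul_(ema_decay).add_(p, alpha=1.0 - ema_decay)
    return loss.item()
```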

LQchen1 commented 5 days ago

@wullia I replicated the authors' training results. With MESA, the accuracy was 83.4%, similar to the paper's result, but without MESA the ImageNet-1K accuracy dropped by about 0.4. The configuration I used is mlla_t.yaml.

wullia commented 5 days ago

> @wullia I replicated the authors' training results. With MESA, the accuracy was 83.4%, similar to the paper's result, but without MESA the ImageNet-1K accuracy dropped by about 0.4. The configuration I used is mlla_t.yaml.

Thank you for sharing these valuable results. They help clarify my concerns regarding the performance impact of using MESA.

wullia commented 5 days ago

@LQchen1 By the way, have you encountered this issue: https://github.com/LeapLabTHU/MLLA/issues/18#issue-2382331457?

tian-qing001 commented 4 days ago

Hi @wullia @LQchen1. MESA is a strategy to prevent overfitting. When it is removed, the drop path rate (MODEL.DROP_PATH_RATE) should be increased to achieve comparable results.
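
As background on that knob (not the repository's exact values): MODEL.DROP_PATH_RATE is the maximum stochastic-depth probability, usually ramped up linearly across the blocks of the network, so raising this single number strengthens regularization everywhere and can partially replace MESA's anti-overfitting effect. Below is a generic drop-path sketch in the style popularized by timm; the stage depths and rates are hypothetical placeholders, not recommended settings for MLLA-T.

```python
import torch
import torch.nn as nn

class DropPath(nn.Module):
    """Stochastic depth: randomly drop a residual branch per sample (illustrative)."""
    def __init__(self, drop_prob: float = 0.0):
        super().__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        if self.drop_prob == 0.0 or not self.training:
            return x
        keep_prob = 1.0 - self.drop_prob
        # One Bernoulli mask per sample, broadcast over the remaining dimensions.
        shape = (x.shape[0],) + (1,) * (x.ndim - 1)
        mask = torch.empty(shape, dtype=x.dtype, device=x.device).bernoulli_(keep_prob)
        return x * mask / keep_prob

# Per-block rates are commonly ramped linearly from 0 up to MODEL.DROP_PATH_RATE,
# so enlarging that single config value raises regularization on every block.
# Both the stage depths and the rate below are hypothetical, not MLLA-T settings.
depths = [2, 4, 12, 4]
drop_path_rate = 0.3  # e.g. larger than the with-MESA setting
per_block_rates = torch.linspace(0, drop_path_rate, sum(depths)).tolist()
blocks_droppath = [DropPath(r) for r in per_block_rates]
```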

wullia commented 4 days ago

Could you kindly provide configs without MESA to achieve comparable results? @tian-qing001