Closed wullia closed 4 days ago
@wullia I replicated the authors' training results. With MESA, the accuracy was 83.4%, similar to the paper's result, but without MESA, the ImageNet-1k accuracy dropped by ~0.4. The configuration I used is mlla_t.yaml.
Thank you for sharing these valuable results. They help clarify my concerns regarding the performance impact of using MESA.
@LQchen1 By the way, have you encountered this issue https://github.com/LeapLabTHU/MLLA/issues/18#issue-2382331457 ?
Hi @wullia @LQchen1.
MESA is a strategy to prevent overfitting. When it is removed, the drop path rate (`MODEL.DROP_PATH_RATE`) should be increased to achieve comparable results.
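A minimal sketch of the kind of config change described above. This is a hypothetical fragment, not from the repository: the key layout mirrors the Swin-style YAML configs the project appears to use, and the concrete value is illustrative, since the rate that compensates for removing MESA would need to be tuned.

```yaml
# Hypothetical variant of mlla_t.yaml for training without MESA.
# The DROP_PATH_RATE value below is an assumption, not a verified setting.
MODEL:
  DROP_PATH_RATE: 0.3   # raised above the original mlla_t.yaml value to add
                        # regularization that MESA would otherwise provide
```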
Could you kindly provide configs without MESA that achieve comparable results? @tian-qing001
The paper uses MESA during training, which appears to contribute to accuracy gains compared with training on AdamW or SGD alone. Notably, the other methods listed in Table 3 do not use MESA and may likewise suffer from overfitting. Could you provide MLLA's performance when trained with plain AdamW? Additionally, does MLLA maintain its superiority over models like NAT in that setting? I look forward to your response.