Closed by hsannn 8 months ago
Hi @hsannn, sorry for the late reply. We used EMA only on the IMDB dataset because performance on IMDB varied significantly across runs, and EMA often gives more stable results, so we tried it there. Even with EMA, though, the run-to-run variance on IMDB is still quite high compared with the variance on the other datasets without EMA. We didn't use EMA on the other datasets because their variance is already small without it. You can still try EMA on them, which might further boost performance. Hope this helps~
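For anyone who wants to see why EMA tends to smooth things out, here is a minimal sketch. It assumes EMA here means an exponential moving average of model weights, and it is not the code behind `ema_mode` in this repository. The class `SimpleEMA`, the `decay` value, and the toy parameter `w` are all hypothetical names chosen for illustration.

```python
# Hypothetical sketch: NOT the repository's actual EMA implementation.
from typing import Dict, List, Tuple


class SimpleEMA:
    """Keeps an exponential moving average ("shadow" copy) of named parameters."""

    def __init__(self, params: Dict[str, List[float]], decay: float = 0.999) -> None:
        if not 0.0 <= decay < 1.0:
            raise ValueError("decay must be in [0, 1)")
        self.decay = decay
        # Start the shadow copy from the current parameter values.
        self.shadow = {name: list(values) for name, values in params.items()}

    def update(self, params: Dict[str, List[float]]) -> None:
        # shadow <- decay * shadow + (1 - decay) * current
        for name, values in params.items():
            self.shadow[name] = [
                self.decay * s + (1.0 - self.decay) * v
                for s, v in zip(self.shadow[name], values)
            ]


def demo() -> Tuple[List[float], List[float]]:
    # A toy parameter that oscillates between steps, standing in for noisy training.
    noisy = [1.0, 3.0, 1.0, 3.0, 1.0, 3.0]
    ema = SimpleEMA({"w": [2.0]}, decay=0.9)
    averaged = []
    for value in noisy:
        ema.update({"w": [value]})
        averaged.append(ema.shadow["w"][0])
    return noisy, averaged
```

In this toy run the raw value swings between 1.0 and 3.0, while the averaged copy stays close to 2.0. Evaluating with the averaged weights is what would make results less sensitive to where training happens to stop.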
Thank you for your thoughtful response to my previous question!
I have another question: Can you explain why the IMDB dataset is the only one with ema_mode set to True?