
SMIT: A Simple Modality Integration Tool

Mitigate catastrophic forgetting #16

Open Thytu opened 5 months ago

Thytu commented 5 months ago

The current workflow leads to a certain amount of catastrophic forgetting: the base model, abacaj/phi-2-super, reaches an average of $62.13$ on the open_llm_leaderboard, while the resulting model, Thytu/phi-2-audio-super, falls to $35.79$.

| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
|---|---|---|---|---|---|---|---|
| abacaj/phi-2-super | 62.13 | 61.86 | 76.6 | 58.41 | 48.37 | 73.01 | 54.51 |
| Thytu/phi-2-audio-super | 35.79 | 33.96 | 43.17 | 28.67 | 50.91 | 58.01 | 0 |
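
For reference, the gap can be checked locally with EleutherAI's lm-evaluation-harness. A minimal sketch, assuming the v0.4 `simple_evaluate` API and the default `hf` backend; the exact task names and per-task few-shot counts used by the Open LLM Leaderboard may differ from what is shown here:

```python
# Hypothetical local re-run of the benchmarks above with lm-evaluation-harness.
# Task names and few-shot settings are assumptions, not the leaderboard's exact configs.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Thytu/phi-2-audio-super,dtype=bfloat16",
    tasks=["arc_challenge", "hellaswag", "mmlu", "truthfulqa_mc2", "winogrande", "gsm8k"],
    batch_size=8,
)

# Print the per-task metrics to compare against the base model's scores.
for task, metrics in results["results"].items():
    print(task, metrics)
```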

While some degradation is expected for a 2B-parameter model, the resulting model shouldn't fall to such a low average.

One interesting result is that when training the model on text-only data (i.e., without training it to become multimodal), the average still drops considerably:

| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
|---|---|---|---|---|---|---|---|
| Multimodal | 35.79 | 33.96 | 43.17 | 28.67 | 50.91 | 58.01 | 0 |
| Text only | 35.36 | 35.92 | 45.33 | 24.58 | 46.21 | 59.98 | 0.15 |

This can mean either: