I encountered an issue while fine-tuning the TinyChart-3B-768 model on the TinyChartData dataset. The initial loss is unexpectedly high, around 7.6. Additionally, when using the original DeepSpeed config `zero3_offload_decay.json`, the loss stays constant at 0 for the entire training run.
I pinned the DeepSpeed version according to `pyproject.toml`, building on top of the llava-v1.5 environment, and ran `vit_add_tome.py` against `TinyChart-3B-768-siglip`.
Are there any dependencies or configurations that I might be missing which could cause the initial loss to be so high?
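For calibration on the first question: a randomly initialized (or mis-wired) LM head yields an expected cross-entropy of roughly ln(vocab_size), so a quick sanity check tells you whether 7.6 is "random-init high" or something else. This is a generic sketch; the vocabulary size below is a placeholder, not TinyChart's actual value:

```python
import math

def random_init_loss(vocab_size: int) -> float:
    # Cross-entropy of a uniform (untrained) predictor over the vocabulary:
    # -log(1/V) = ln(V)
    return math.log(vocab_size)

# 32000 is a hypothetical vocab size for illustration only
print(round(random_init_loss(32000), 2))  # ≈ 10.37
```

Since 7.6 is well below ln(V) for typical LLM vocabularies, the checkpoint is probably loading at least partially, which points more toward a mismatch (e.g. projector or vision-tower weights) than a fully random initialization.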
Why does the loss remain at 0 when using the original DeepSpeed script?
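One possible culprit for the constant-zero loss (an assumption on my part, not a diagnosis) is fp16 dynamic loss scaling collapsing under ZeRO-3 with offload: after repeated overflows the scaler keeps shrinking until gradients underflow and the reported loss degenerates. A hedged sketch of the relevant part of the DeepSpeed config, switching to bf16 (assumes Ampere-or-newer GPUs; the keys shown are standard DeepSpeed options, but the rest of `zero3_offload_decay.json` is assumed, not reproduced):

```json
{
  "bf16": { "enabled": true },
  "fp16": { "enabled": false },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true }
  }
}
```

If the loss becomes nonzero with bf16, the fp16 loss-scale settings in the original config are the place to look.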
I would appreciate any guidance or insights into potential dependency issues or misconfigurations that could lead to these problems.
Thank you for your assistance.