I encountered an issue while fine-tuning the TinyChart-3B-768 model on the TinyChartData dataset. The initial loss is unexpectedly high, around 7.6. Additionally, when using the original DeepSpeed config `zero3_offload_decay.json`, the loss stays constant at 0 for the entire training run.
I pinned the DeepSpeed version according to `pyproject.toml`, building on top of the llava-v1.5 environment, and ran `vit_add_tome.py` against `TinyChart-3B-768-siglip`.
Are there any dependencies or configurations that I might be missing which could cause the initial loss to be so high?
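For calibration on the first question: a randomly initialized (or mis-wired) LM head yields an expected cross-entropy of roughly ln(vocab_size), so a quick sanity check tells you whether 7.6 is "random-init high" or something else. This is a generic sketch; the vocabulary size below is a placeholder, not TinyChart's actual value:

```python
import math

def random_init_loss(vocab_size: int) -> float:
    # Cross-entropy of a uniform (untrained) predictor over the vocabulary:
    # -log(1/V) = ln(V)
    return math.log(vocab_size)

# 32000 is a hypothetical vocab size for illustration only
print(round(random_init_loss(32000), 2))  # ≈ 10.37
```

Since 7.6 is well below ln(V) for typical LLM vocabularies, the checkpoint is probably loading at least partially, which points more toward a mismatch (e.g. projector or vision-tower weights) than a fully random initialization.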
Why does the loss remain at 0 when using the original DeepSpeed script?
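One possible culprit for the constant-zero loss (an assumption on my part, not a diagnosis) is fp16 dynamic loss scaling collapsing under ZeRO-3 with offload: after repeated overflows the scaler keeps shrinking until gradients underflow and the reported loss degenerates. A hedged sketch of the relevant part of the DeepSpeed config, switching to bf16 (assumes Ampere-or-newer GPUs; the keys shown are standard DeepSpeed options, but the rest of `zero3_offload_decay.json` is assumed, not reproduced):

```json
{
  "bf16": { "enabled": true },
  "fp16": { "enabled": false },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true }
  }
}
```

If the loss becomes nonzero with bf16, the fp16 loss-scale settings in the original config are the place to look.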
I would appreciate any guidance or insights into potential dependency issues or misconfigurations that could lead to these problems.
Thank you for your assistance.