SHI-Labs / CuMo

CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
Apache License 2.0

Train the model without Deepspeed ZeRO #5

Closed MOSHIIUR closed 1 month ago

MOSHIIUR commented 1 month ago

Hi, you have described how to use DeepSpeed for multi-node training of the model. I was curious whether we can train the model without incorporating DeepSpeed at all?

https://github.com/SHI-Labs/CuMo/blob/main/docs/getting_started.md

chrisjuniorli commented 1 month ago

Hi, do you mean training the model with another multi-node framework, or without DeepSpeed at all, even for single-node training?

MOSHIIUR commented 1 month ago

Without DeepSpeed at all, even for single-node training.

chrisjuniorli commented 1 month ago

For now, we've only tried training the model with DeepSpeed. If you prefer to use other frameworks or vanilla PyTorch, you may explore other LLaVA implementations without DeepSpeed, such as https://github.com/alibaba/Pai-Megatron-Patch, which should be compatible with CuMo.
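For reference, a single-node run without DeepSpeed would typically use PyTorch's built-in DistributedDataParallel instead. The sketch below is illustrative only, not CuMo's actual training code: the tiny linear model and the single-process `gloo` group are stand-ins so the example runs on CPU, whereas a real run would launch the training script with `torchrun` and use the `nccl` backend on GPUs.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Illustrative single-process group; torchrun sets these env vars
    # (and RANK/WORLD_SIZE) automatically in a real multi-GPU launch.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)

    # Stand-in model; a real run would build the CuMo model here.
    model = DDP(torch.nn.Linear(8, 2))
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    # One dummy training step: forward, backward (DDP syncs grads), update.
    x = torch.randn(4, 8)
    loss = model(x).pow(2).mean()
    loss.backward()
    opt.step()
    opt.zero_grad()

    dist.destroy_process_group()
    return "step ok"

if __name__ == "__main__":
    print(main())
```

Note that this only replaces the launcher and gradient synchronization; it does not reproduce DeepSpeed ZeRO's optimizer-state sharding, so memory usage per GPU will be higher.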