Why use DeepSpeed ZeRO-2 for pretraining but ZeRO-3 for finetuning?

BAAI-DCAI / Bunny

A family of lightweight multimodal models.

Apache License 2.0


double-fire-0 closed 1 month ago

double-fire-0 commented 1 month ago

Are there any other considerations besides saving GPU memory?

Isaachhh commented 1 month ago

The DeepSpeed setup is inherited from LLaVA.

BTW, the pretraining stage of Bunny-v1.1-Llama-3-8B-V uses ZeRO-3.
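
For context on the memory side of the question, here is a rough sketch of per-GPU model-state memory under each ZeRO stage. It assumes mixed-precision Adam with the usual 16 bytes per parameter (2 for fp16 weights, 2 for fp16 gradients, 12 for fp32 optimizer states) and ignores activations, buffers, and fragmentation. The function name `zero_model_state_bytes` and the example sizes are illustrative only and are not taken from the Bunny or LLaVA code.

```python
def zero_model_state_bytes(num_params: int, num_gpus: int, stage: int) -> float:
    """Approximate per-GPU bytes for model states (weights, grads, optimizer).

    Assumes mixed-precision Adam and that every parameter is trainable.
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")

    params = 2.0 * num_params   # fp16 weights
    grads = 2.0 * num_params    # fp16 gradients
    optim = 12.0 * num_params   # fp32 master weights, momentum, variance

    if stage >= 1:              # ZeRO-1 shards optimizer states
        optim /= num_gpus
    if stage >= 2:              # ZeRO-2 also shards gradients
        grads /= num_gpus
    if stage >= 3:              # ZeRO-3 also shards the weights
        params /= num_gpus
    return params + grads + optim


if __name__ == "__main__":
    # Hypothetical setup: an 8B-parameter model on 8 GPUs.
    for stage in (0, 1, 2, 3):
        gib = zero_model_state_bytes(8_000_000_000, 8, stage) / 2**30
        print(f"ZeRO-{stage}: {gib:.1f} GiB per GPU")
```

Under these assumptions, ZeRO-2 still keeps a full copy of the weights on every GPU, while ZeRO-3 shards them too, at the cost of extra communication to gather parameters during the forward and backward passes.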

double-fire-0 commented 1 month ago

Thanks.