Is it possible to train a 70B model on 8×A100 80GB with full fine-tuning?

axolotl-ai-cloud / axolotl

Go ahead and axolotl questions

https://axolotl-ai-cloud.github.io/axolotl/

Apache License 2.0


Is it possible to train a 70B model on 8×A100 80GB with full fine-tuning? #1439

Open jaywongs opened 5 months ago

jaywongs commented 5 months ago

What piece of documentation is affected?

I couldn't find any documentation related to this. Can anyone tell me if it's possible?
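A rough way to see why this is hard: with bf16 mixed precision and AdamW, a common rule of thumb is about 16 bytes of model state per parameter (bf16 weights and gradients, fp32 master weights, and two fp32 Adam moments), before activations are counted. The sketch below applies that assumed rule to 70B parameters. The byte counts are the usual textbook figures, not measurements from axolotl, and GB is treated loosely as 1e9 bytes.

```python
# Back-of-the-envelope model-state memory for full fine-tuning with AdamW
# under bf16 mixed precision. Assumed per-parameter costs (rule of thumb;
# real frameworks add buffers, and activations come on top of this):
BYTES_PER_PARAM = {
    "bf16_weights": 2,
    "bf16_grads": 2,
    "fp32_master_weights": 4,
    "fp32_adam_moments": 8,  # 4 bytes for m + 4 bytes for v
}

GB = 1e9  # decimal gigabytes; close enough for a feasibility check


def model_state_gb(num_params: float) -> float:
    """Weights + gradients + optimizer states, excluding activations."""
    return num_params * sum(BYTES_PER_PARAM.values()) / GB


if __name__ == "__main__":
    needed = model_state_gb(70e9)
    available = 8 * 80  # 8 x A100 80GB
    print(f"model states: ~{needed:.0f} GB, total GPU memory: {available} GB")
    print("fits on GPUs alone:", needed <= available)
```

Under these assumptions the model states alone come to roughly 1,120 GB against 640 GB of combined GPU memory, so part of the state has to live off-GPU or be made smaller.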

What part(s) of the article would you like to see updated?

I couldn't find any documentation related to this

Additional Information

No response

Acknowledgements

[X] My issue title is concise, descriptive, and in title casing.
[X] I have searched the existing issues to make sure this feature has not been requested yet.
[X] I have provided enough information for the maintainers to understand and evaluate this request.

NanoCode012 commented 5 months ago

I recall that you may be able to, with DeepSpeed ZeRO stage 3 and CPU offload.
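For reference, ZeRO stage 3 with CPU offload is configured through a DeepSpeed JSON file. The following is a minimal sketch, not the config axolotl ships. It offloads both parameters and optimizer states to CPU, and the `"auto"` values assume the Hugging Face Trainer integration fills them in from the training arguments.

```python
import json

# Minimal ZeRO-3 + CPU offload DeepSpeed config (a sketch; tune for your node).
# "auto" entries are placeholders resolved by the Hugging Face Trainer integration.
ZERO3_CPU_OFFLOAD = {
    "bf16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 3,
        # Keep optimizer states (fp32 master weights + Adam moments) in CPU RAM.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        # Also keep the partitioned parameters in CPU RAM between uses.
        "offload_param": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
        # Consolidate the full 16-bit weights when the model is saved.
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "train_batch_size": "auto",
}

if __name__ == "__main__":
    # Print the JSON; redirect it into a file of your choice.
    print(json.dumps(ZERO3_CPU_OFFLOAD, indent=2))
```

Saved to a file (for example a hypothetical `zero3_cpu_offload.json`), it would be referenced from the axolotl YAML with a line such as `deepspeed: zero3_cpu_offload.json`.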

jaywongs commented 5 months ago


Apologies for not being clear. I tried DeepSpeed ZeRO-3 with CPU offload, but it failed because there was not enough CPU memory. The node with 8×A100 GPUs has 1024GB of CPU memory in total.
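One plausible explanation for running out of 1024GB: DeepSpeed's ZeRO-3 memory-estimation heuristic budgets roughly 18 bytes of CPU RAM per parameter when both parameters and optimizer states are offloaded (roughly 16 when only the optimizer is), multiplied by a 1.5× buffer factor. The sketch below reproduces that arithmetic from memory; treat the constants as assumptions to check against DeepSpeed's memory-requirements documentation, not as measured values.

```python
# Approximate host (CPU) RAM per node for ZeRO-3 with CPU offload.
# Assumed constants, following DeepSpeed's ZeRO-3 estimator heuristic:
#   18 bytes/param with parameter + optimizer offload,
#   16 bytes/param with optimizer offload only,
#   multiplied by a 1.5x buffer factor.
GB = 1e9


def zero3_offload_cpu_gb(num_params: float, num_nodes: int = 1,
                         offload_params: bool = True,
                         buffer_factor: float = 1.5) -> float:
    bytes_per_param = 18 if offload_params else 16
    # Offloaded state is split across nodes, so each node holds its share.
    return num_params * bytes_per_param / num_nodes * buffer_factor / GB


if __name__ == "__main__":
    host_ram_gb = 1024  # reported CPU memory of the 8x A100 node
    for offload_params in (True, False):
        need = zero3_offload_cpu_gb(70e9, offload_params=offload_params)
        print(f"offload_params={offload_params}: ~{need:.0f} GB needed, "
              f"fits in {host_ram_gb} GB: {need <= host_ram_gb}")
```

Even without the 1.5× buffer, 18 bytes per parameter is about 1,260 GB for 70B parameters, which is already above 1024GB.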

NanoCode012 commented 5 months ago

Have you tried reducing the batch size and using an 8-bit optimizer?

jaywongs commented 5 months ago

Setting the batch size to 1 doesn't help. I haven't tried an 8-bit optimizer yet. Will using one affect the quality of the trained model?
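Batch size mainly scales activation memory, while weights, gradients, and optimizer states cost a fixed amount per parameter, which is consistent with batch size 1 not being enough. An 8-bit optimizer such as bitsandbytes' 8-bit AdamW stores each Adam moment in about 1 byte instead of 4. The sketch below counts only the two moment buffers and ignores the small per-block quantization constants. In axolotl an 8-bit optimizer is typically selected with `optimizer: adamw_bnb_8bit`; whether it combines with DeepSpeed's optimizer offload (which normally runs DeepSpeed's own CPU Adam) is worth checking before relying on this saving.

```python
# Adam keeps two moment buffers (m and v) per parameter. This compares their
# size at 32-bit vs 8-bit precision; quantization block constants are ignored.
GB = 1e9


def optimizer_state_gb(num_params: float, bytes_per_moment: int) -> float:
    """Memory for Adam's two moment buffers (m and v)."""
    return num_params * 2 * bytes_per_moment / GB


if __name__ == "__main__":
    params = 70e9
    fp32 = optimizer_state_gb(params, 4)
    int8 = optimizer_state_gb(params, 1)
    print(f"32-bit Adam moments: {fp32:.0f} GB")
    print(f"8-bit Adam moments:  {int8:.0f} GB")
    print(f"saved:               {fp32 - int8:.0f} GB")
```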

winglian commented 5 months ago

Yeah, 8-bit optimizers work well with DeepSpeed for fine-tuning.