jaywongs opened this issue 5 months ago.
I recall that you may be able to do this with DeepSpeed ZeRO-3 and CPU offload.
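For reference, a DeepSpeed configuration for ZeRO stage 3 with both optimizer states and parameters offloaded to CPU might look roughly like the sketch below. The batch-size values and the bf16 setting are assumptions, not taken from this thread, and the helper name `zero3_cpu_offload_config` is hypothetical; only the config keys themselves follow DeepSpeed's documented schema.

```python
import json


def zero3_cpu_offload_config(micro_batch_size: int = 1, grad_accum_steps: int = 1) -> dict:
    """Hypothetical helper: build a DeepSpeed config dict for ZeRO-3 with CPU offload."""
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": grad_accum_steps,
        # Assumption: bf16 mixed precision, which A100 GPUs support.
        "bf16": {"enabled": True},
        "zero_optimization": {
            "stage": 3,
            # Keep optimizer states (and run the optimizer step) in CPU RAM.
            "offload_optimizer": {"device": "cpu", "pin_memory": True},
            # Keep the partitioned model parameters in CPU RAM as well.
            "offload_param": {"device": "cpu", "pin_memory": True},
        },
    }


if __name__ == "__main__":
    # Print the config as JSON so it can be saved to a file such as ds_config.json.
    print(json.dumps(zero3_cpu_offload_config(), indent=2))
```

The printed JSON can be saved to a file and passed to your training setup, for example through the `deepspeed` argument of Hugging Face `TrainingArguments`.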
Apologies for the confusion. I tried DeepSpeed ZeRO-3 with CPU offload, but the job ran out of CPU memory. The node has 8× A100 GPUs and a total of 1024 GB of CPU memory.
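A rough way to see why full CPU offload can exhaust 1024 GB: the ZeRO paper's rule of thumb for mixed-precision Adam is about 16 bytes of model state per parameter. The sketch below assumes that nearly all of those states end up in host RAM under full offload (an approximation that ignores activations, buffers, and framework overhead), and the model sizes are hypothetical, since the thread does not state which model is being trained.

```python
# ZeRO-paper rule of thumb for mixed-precision Adam, per parameter:
# 2 bytes (fp16/bf16 weights) + 2 bytes (fp16/bf16 gradients)
# + 12 bytes (fp32 master weights, momentum, and variance).
BYTES_PER_PARAM_MIXED_ADAM = 2 + 2 + 12

CPU_RAM_GB = 1024  # total CPU memory reported for the 8x A100 node


def model_state_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_MIXED_ADAM) -> float:
    """Approximate model-state memory in GB (1 GB = 1e9 bytes), excluding activations."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for billions in (7, 13, 34, 70):  # hypothetical model sizes, in billions of parameters
        need = model_state_gb(billions * 1e9)
        verdict = "fits within" if need < CPU_RAM_GB else "exceeds"
        print(f"{billions}B params: ~{need:.0f} GB of model states ({verdict} {CPU_RAM_GB} GB)")
```

Under these assumptions a 70B-parameter model would need about 1120 GB for model states alone, more than the node's CPU memory.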
Have you already tried reducing the batch size and using an 8-bit optimizer?
Setting the batch size to 1 did not help. I haven't tried an 8-bit optimizer yet. Will using one affect the quality of the trained model?
Yeah, 8-bit optimizers work well with DeepSpeed for fine-tuning.
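As a back-of-the-envelope illustration of what an 8-bit optimizer saves, and why batch size 1 did not help, the sketch below compares the memory for Adam's two moment tensors at 32 and 8 bits per value. It assumes a bitsandbytes-style 8-bit Adam that stores both moments in 8 bits, ignores the small per-block scaling constants such optimizers also keep, and leaves any fp32 master weights unchanged. The 70B parameter count and the helper name `adam_moment_gb` are hypothetical.

```python
def adam_moment_gb(num_params: float, bits_per_state: int) -> float:
    """Hypothetical helper: memory for Adam's momentum and variance tensors, in GB (1e9 bytes).

    Ignores per-block quantization metadata that 8-bit optimizers also store.
    """
    bytes_per_param = 2 * bits_per_state / 8  # two moment values per parameter
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    num_params = 70e9  # hypothetical model size, not taken from this thread
    fp32_states = adam_moment_gb(num_params, 32)  # standard Adam
    int8_states = adam_moment_gb(num_params, 8)   # 8-bit Adam
    print(f"32-bit moments: {fp32_states:.0f} GB")
    print(f" 8-bit moments: {int8_states:.0f} GB (saves {fp32_states - int8_states:.0f} GB)")
    # Batch size appears nowhere above: optimizer-state memory depends only on the
    # parameter count, so lowering the batch size cannot shrink it.
```

Under these assumptions the moments drop from about 560 GB to about 140 GB. Activation memory does scale with batch size, but once model states dominate, reducing the batch size has little effect.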
What piece of documentation is affected?
I couldn't find any documentation related to this. Can anyone tell me if it's possible?
What part(s) of the article would you like to see updated?
I couldn't find any documentation related to this
Additional Information
No response