universewill opened 1 year ago
How much VRAM is needed to finetune the 3B model? Is 12GB enough?

Unfortunately, 12GB is not enough to finetune the 3B model in the standard way (tuning all parameters). That is because, beyond the weights themselves, you also need memory for the gradients and the optimizer states. This Hugging Face blog post briefly describes how much each of those parts contributes to VRAM usage. For our model, we used a single A100 80GB GPU, and usage metrics show that more than 70GB of GPU memory was allocated.
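As a rough back-of-envelope sketch of why 12GB falls short: with mixed-precision Adam, a commonly cited figure is about 16 bytes per parameter (2-byte weights, 2-byte gradients, plus roughly 12 bytes of fp32 master weights and optimizer states), before activations. The function below is a hypothetical illustration of that arithmetic, not the repo's actual measurement; the exact byte counts and overhead vary by framework and config.

```python
def full_finetune_vram_gb(n_params_billion,
                          bytes_weights=2,   # bf16/fp16 model weights
                          bytes_grads=2,     # bf16/fp16 gradients
                          bytes_optim=12,    # fp32 master copy + Adam momentum + variance
                          overhead_gb=4.0):  # placeholder for activations, buffers, etc.
    """Rough VRAM estimate (GB) for full finetuning with mixed-precision Adam.

    All byte counts are assumptions for illustration; real usage depends on
    batch size, sequence length, and framework overhead.
    """
    n = n_params_billion * 1e9
    total_bytes = n * (bytes_weights + bytes_grads + bytes_optim)
    return total_bytes / 1024**3 + overhead_gb

# A 3B model needs on the order of ~45-50GB by this estimate,
# well above 12GB even before activation memory is counted.
print(round(full_finetune_vram_gb(3)))
```

Even this optimistic estimate exceeds 12GB several times over, which is consistent with the >70GB observed on the A100 once activation memory and real batch sizes are included.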