databrickslabs / dolly

Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform
https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html
Apache License 2.0

General Rule of Thumb when Estimating GPU Memory Requirements #176

Closed Hegelim closed 1 year ago

Hegelim commented 1 year ago

What's the general rule of thumb when it comes to estimating how much GPU memory I would need?

  1. In terms of training?
  2. In terms of fine-tuning?
  3. In terms of inferencing?

And how would the rough estimate of 2x number of parameters fit in here? Does it refer to simply loading the model?

srowen commented 1 year ago

For this model? What size? That of course matters a lot.

You would not pre-train. But it'd be the same answer for fine-tuning. And there really isn't a single answer, because you can trade off memory for speed in many ways. GPUs like the A100 are ideal, as they require little tradeoff. The A10 and V100 are viable (see the README) for the smaller 3B/7B models, but significantly slower. I don't think anything with 16GB is viable.

For generation, the rule of thumb is that the model needs about 2 bytes per parameter, because you will load it in 16-bit. You can load in 8-bit for half the memory, at some cost to accuracy. And you need enough room for your input, which depends on the input. So again, no one answer. An A10 is possible for 12B but not ideal; an A100 is ideal. For 7B/3B, an A10 is fine and T4/V100 are possible, though less ideal.
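To make that rule of thumb concrete, here is a minimal sketch (not from this thread; the parameter counts are rounded approximations for the 3B/7B/12B checkpoints and are for illustration only):

```python
# Sketch of the "2 bytes per parameter in 16-bit" rule for loading a model.
# Ignores activations and the KV cache, which add more on top.
def weights_memory_gb(num_params: float, bits: int = 16) -> float:
    """Memory needed just to hold the weights at the given precision."""
    return num_params * (bits / 8) / 1e9

for name, params in [("3B", 2.8e9), ("7B", 6.9e9), ("12B", 12e9)]:
    print(f"{name}: ~{weights_memory_gb(params, 16):.0f} GB in 16-bit, "
          f"~{weights_memory_gb(params, 8):.0f} GB in 8-bit")
```

With 12B landing around 24 GB in 16-bit before any activations, it's clear why a 24 GB A10 is tight for generation and a 40/80 GB A100 is comfortable.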

Hegelim commented 1 year ago

Thanks @srowen for your detailed answer. I have some follow-up questions:

  1. Why is it recommended not to pre-train/fine-tune? Is it because it would require a lot of memory?
  2. Is it generally true that fine-tuning requires a bit more memory than generation?
  3. It totally makes sense that there is not a single answer, since this heavily depends on the model, input, etc. I am just wondering whether there is some way to get a rough idea of how much GPU memory fine-tuning would require before I even try the model. That would be really helpful: there are so many LLMs out there, and knowing which GPU/server a given model needs would help me estimate the cost beforehand.

srowen commented 1 year ago

Pre-training is not recommended. This is the kind of thing that can cost a million bucks. Fine-tuning could be fine. I'm saying the training task is the same, so the hardware considerations are the same.

Fine-tuning is just fairly different from generation. It will in general need more memory. It also offers more ways to trade off memory for speed, since you can afford latency. See: DeepSpeed.
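As an illustration of the kind of memory-for-speed tradeoff DeepSpeed enables, a ZeRO stage 3 setup with CPU offload looks roughly like the sketch below. This is a generic example, not the exact configuration this repo ships; values would need tuning for your hardware.

```python
# Sketch of a DeepSpeed ZeRO-3 configuration (normally stored as JSON and passed
# to the training script). Offloading optimizer state and parameters to CPU RAM
# cuts GPU memory at the cost of training speed. Values are illustrative.
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                              # partition optimizer state, gradients, and params
        "offload_optimizer": {"device": "cpu"},  # keep optimizer state in CPU RAM
        "offload_param": {"device": "cpu"},      # keep params in CPU RAM when not in use
    },
    "train_micro_batch_size_per_gpu": 1,         # small micro-batches also reduce activation memory
    "gradient_accumulation_steps": 8,            # preserve effective batch size despite small micro-batches
}
```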

The answer depends on the model, your input, and how you train. The answer is quite different even for 3B vs. 12B here. I would tell you, as above and in the README, that A100s (40GB) are ideal for training. Anything else will require tradeoffs to work, which will cost more and take a lot longer. This is for fine-tuning. For 12B, you could expect to spend thousands of dollars, probably, if that helps. Not $100 and not $10,000, probably.
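To make the "rough idea before you try" part concrete: a commonly cited back-of-envelope rule (not specific to this repo) is that full fine-tuning with Adam in mixed precision needs roughly 16 bytes per parameter for weights, gradients, and optimizer state combined, before activations, and before ZeRO sharding or offload spreads that total across GPUs and CPU RAM:

```python
# Hedged back-of-envelope estimate: ~2 bytes fp16 weights + ~2 bytes gradients
# + ~12 bytes fp32 optimizer state (Adam) per parameter. Activations come on top,
# and ZeRO/offload can divide this across devices rather than one GPU.
def finetune_state_gb(num_params: float, bytes_per_param: float = 16.0) -> float:
    return num_params * bytes_per_param / 1e9

for name, params in [("3B", 2.8e9), ("7B", 6.9e9), ("12B", 12e9)]:
    print(f"{name}: ~{finetune_state_gb(params):.0f} GB of weight/gradient/optimizer state")
```

Even as a rough lower bound, this shows why a single 40 GB A100 is not enough for full 12B fine-tuning without sharding or offloading, and why the smaller models are far cheaper to work with.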

Hegelim commented 1 year ago

Thank you so much for your reply. I will close this thread now.