OptimalScale / LMFlow

An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
https://optimalscale.github.io/LMFlow/
Apache License 2.0
8.26k stars 826 forks

Minimum Requirement of GPU for Fine Tuning #687

Open vashiegaran opened 10 months ago

vashiegaran commented 10 months ago

What is the minimum requirement to fine-tune a small model like openlm-research/open_llama_3b and a big model like llama2-7b?

research4pan commented 10 months ago

Thanks for your interest in LMFlow! For full fine-tuning of llama2-7b, at least a single RTX 3090 GPU (24 GB of GPU memory) is required. In addition, roughly model-size × 16 GB of RAM is needed for offloading, e.g. 112 GB of RAM for a 7B model. The RAM consumption is roughly halved when tuning a 3B model instead. Hope that answers your question 😄
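As a rough back-of-envelope check of the numbers above (this is only the ~16 GB per billion parameters rule of thumb from the comment, not an exact measurement):

```python
# Rough RAM estimate for CPU offloading during full fine-tuning,
# based on the ~16 GB per billion parameters rule of thumb above.
def estimated_offload_ram_gb(model_size_in_billions: float) -> float:
    return model_size_in_billions * 16

print(estimated_offload_ram_gb(7))  # ~112 GB for a 7B model
print(estimated_offload_ram_gb(3))  # ~48 GB for a 3B model
```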

xigua314 commented 6 months ago

@research4pan Thank you for your work. I would like to ask: if I use Text2Text data for full fine-tuning according to the script, will the training only learn to generate the output from the input, or will it also learn the internal grammatical knowledge of the input itself? If not, what dataset or parameter settings should I use to achieve that?

research4pan commented 6 months ago

Thanks for your interest in LMFlow! If you are using text2text, the input context is not counted towards the loss, i.e. the model only learns to generate the output and will not learn to generate the input.
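A minimal sketch of how this kind of loss masking is typically implemented in HuggingFace-style training code (a generic illustration, not necessarily LMFlow's exact implementation): the labels of the input tokens are set to -100 so the cross-entropy loss ignores them.

```python
# Generic sketch of text2text-style loss masking (illustrative only):
# input tokens get label -100, so only output tokens contribute to the loss.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openlm-research/open_llama_3b")

def build_example(input_text: str, output_text: str) -> dict:
    input_ids = tokenizer(input_text, add_special_tokens=False)["input_ids"]
    output_ids = tokenizer(output_text, add_special_tokens=False)["input_ids"]
    ids = input_ids + output_ids + [tokenizer.eos_token_id]
    # -100 is the ignore_index of torch.nn.CrossEntropyLoss, so the input
    # positions are excluded from the training loss.
    labels = [-100] * len(input_ids) + output_ids + [tokenizer.eos_token_id]
    return {"input_ids": ids, "labels": labels}
```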

You may use the "text2text" (https://optimalscale.github.io/LMFlow/examples/DATASETS.html#text2text) or "conversation" (https://optimalscale.github.io/LMFlow/examples/DATASETS.html#conversation) formats supported in LMFlow to achieve this. Thanks 😄
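For reference, a text2text dataset file following the format documented at the link above might look like this (the instance content and file path here are just illustrative):

```python
import json

# Minimal text2text dataset in the documented format: only "output" is used
# as the training target; "input" serves as context.
dataset = {
    "type": "text2text",
    "instances": [
        {
            "input": "Question: What is LMFlow?\n",
            "output": "LMFlow is an extensible toolkit for finetuning and "
                      "inference of large foundation models.",
        }
    ],
}

# Hypothetical output path for the dataset file.
with open("data/my_text2text/train.json", "w", encoding="utf-8") as f:
    json.dump(dataset, f, ensure_ascii=False, indent=2)
```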

xigua314 commented 6 months ago

@research4pan Thank you for the response. I may not have expressed myself clearly: I would like the model to learn all of the grammatical knowledge in the text, not just the input-to-output mapping. How should I set up full fine-tuning for that? Additionally, should tokenization use the original model's tokenizer, and are there corresponding parameters that can be modified? For example, if I follow the GPT-2 full fine-tuning example, can I use the BERT-Chinese tokenizer instead? Lastly, could you please share the latest QR code for the WeChat group? Every QR code I've found is from last October and has expired. Thank you very much!