vashiegaran opened this issue 10 months ago
Thanks for your interest in LMFlow! For full fine-tuning of llama2-7b, at least a single 3090 GPU (24 GB of GPU memory) is required. In addition, RAM of roughly model-size * 16 GB is needed for offloading, e.g. 112 GB of RAM for a 7B model. The RAM consumption is roughly halved (about 48 GB by the same rule of thumb) when tuning a 3B model instead. Hope that answers your question 😄
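For reference, CPU offloading of this kind is usually configured through DeepSpeed ZeRO. Below is a minimal, generic sketch of a ZeRO-3 config with optimizer and parameter offload to CPU; it is not necessarily the exact config shipped with LMFlow, and the output file name is just an illustrative choice.

```python
import json

# Generic DeepSpeed ZeRO-3 config with CPU offloading (illustrative sketch,
# not necessarily identical to the config bundled with LMFlow).
ds_config = {
    "bf16": {"enabled": "auto"},  # "auto" values are resolved by the HF Trainer integration
    "zero_optimization": {
        "stage": 3,  # shard parameters, gradients, and optimizer states
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}

# The large host-RAM requirement quoted above (~16 GB per billion parameters)
# comes from holding the offloaded fp32 parameters and optimizer states in CPU memory.
with open("ds_config_zero3_offload.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```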
@research4pan Thank you for your work. I would like to ask: if I use Text2Text data for full fine-tuning following the script, will the fine-tuning only learn to generate the output from the input, or will it also learn the linguistic knowledge contained in the input itself? If not, which dataset format or parameter settings should I use to achieve that?
Thanks for your interest in LMFlow! If you are using text2text, then the input context will not be counted towards the loss, i.e. it only focuses on generating the output, and will not learn how to generate input.
You may use "text2text"
(https://optimalscale.github.io/LMFlow/examples/DATASETS.html#text2text) or "conversation"
(https://optimalscale.github.io/LMFlow/examples/DATASETS.html#conversation) formats supported in LMFlow to achieve this. Thanks 😄
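To make the distinction concrete, here is a small sketch of a "text2text" dataset file. The field names follow the DATASETS page linked above, but please double-check them against the docs for your LMFlow version; the example content is purely illustrative.

```python
import json

# Sketch of a "text2text" dataset file for LMFlow (verify the field names
# against the DATASETS documentation for your version).
dataset = {
    "type": "text2text",
    "instances": [
        {
            "input": "Translate to French: Good morning.",
            "output": "Bonjour.",
        }
    ],
}

with open("train.json", "w") as f:
    json.dump(dataset, f, ensure_ascii=False, indent=2)

# Conceptually, the "input" tokens receive an ignored label (e.g. -100 for
# cross-entropy), so only the "output" tokens contribute to the training loss.
```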
@research4pan Thank you for the response. I may not have expressed myself clearly: I want the model to learn the linguistic knowledge of the whole text, not just the input-to-output mapping. How should I set up full fine-tuning for that? Additionally, should tokenization use the original model's tokenizer, and are there parameters that can be changed? For example, if I follow the full fine-tuning of GPT-2 as in the example, can I use the BERT-Chinese tokenizer instead? Lastly, could you please share the latest QR code for the WeChat group? Every QR code I have tried is from last October and has expired. Thank you very much!
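On the tokenizer question, a general note (not LMFlow-specific): pairing a model with a tokenizer that has a different vocabulary is technically possible, but the embedding matrix must be resized to the new vocabulary size, and those embeddings no longer correspond to meaningful tokens, so they effectively have to be relearned during full fine-tuning. A minimal Hugging Face sketch, with the model and tokenizer names chosen only for illustration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative sketch: pairing a GPT-2 model with the bert-base-chinese tokenizer.
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Resize the input/output embeddings to the new tokenizer's vocabulary size.
# The surviving rows still belong to GPT-2's original tokens, not BERT's, so
# they carry no useful meaning for the new vocabulary and must be retrained.
model.resize_token_embeddings(len(tokenizer))

# GPT-2 has no pad token by default; reuse BERT's [PAD] id for batching.
model.config.pad_token_id = tokenizer.pad_token_id

enc = tokenizer("你好，世界", return_tensors="pt")
out = model(**enc, labels=enc["input_ids"])
print(out.loss)
```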
What are the minimum requirements for fine-tuning a small model like openlm-research/open_llama_3b and a large model like llama2-7b?