artidoro / qlora

QLoRA: Efficient Finetuning of Quantized LLMs
https://arxiv.org/abs/2305.14314
MIT License

Failed to load bloomz 7Bmt #8

Open ouwei2013 opened 1 year ago

ouwei2013 commented 1 year ago

Hi thank you very much for this great library! I am really excited about it!

I tried to fine-tune bloomz 7B with 4-bit LoRA on Alpaca by running: `python qlora.py --model_name_or_path my_bloomz_path`. However, the job was killed partway through loading the base model. My server has 40+ GB of RAM and a 3090 GPU, which should be large enough to load a 7B model. It seems that some step during the 4-bit quantization drains all the memory. Can you please give me some advice on how to reduce the memory footprint so I can load the base model? Thank you very much!
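For anyone hitting the same issue, here is a minimal sketch (not the exact qlora.py code; `my_bloomz_path` is a placeholder) of how a base model is typically loaded in 4-bit with bitsandbytes. `device_map="auto"` and `low_cpu_mem_usage=True` help keep the host-RAM peak down while the weights are being quantized:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Placeholder path; substitute your local bloomz checkpoint.
model_path = "my_bloomz_path"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",        # place/stream weights onto the GPU during loading
    low_cpu_mem_usage=True,   # avoid holding a second full copy of the weights in RAM
)
```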

ouwei2013 commented 1 year ago

Hello, I already solved the problem by setting a lower OOM kill score for the process: `echo -17 > /proc/<my_process_id>/oom_score_adj`. CPU usage is around 400% while loading the model, and the process was getting killed.
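The same workaround can also be applied from inside the Python process, as a rough sketch (Linux only; writing a negative value needs root or CAP_SYS_RESOURCE):

```python
import os

# Lower the OOM-killer score of the current process so the kernel prefers
# to kill other processes first when RAM runs out during model loading.
with open(f"/proc/{os.getpid()}/oom_score_adj", "w") as f:
    f.write("-17")
```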

Now it works like a charm!

KimJaehee0725 commented 1 year ago

Hello! I want to fine-tune bloom 7B, which has the exact same architecture as bloomz. But in my case, the model takes about 11 GB with 4-bit quantization. I don't think it is being quantized properly, because according to the paper a 7B model should take only about 5 GB.

Could you share your setup? How much VRAM does bloomz 7B take for you?

ouwei2013 commented 1 year ago

Hi, did you set torch_dtype = torch.float16 or bfloat16 when loading the model? If yes, then I think there is no problem with your code. Bloom may take up more VRAM because it has a large token embedding matrix to support its multilingual capability, compared with the English-focused LLaMA model.
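To give a rough sense of scale, here is a back-of-the-envelope sketch; the shapes are assumed values for bloom-7b1, and bitsandbytes 4-bit quantization normally only replaces nn.Linear layers, so the embedding usually stays in 16-bit:

```python
# Assumed shapes for bloom-7b1: vocab ~250880, hidden size 4096.
vocab_size, hidden_size = 250_880, 4_096
embedding_gib = vocab_size * hidden_size * 2 / 2**30  # bfloat16 = 2 bytes/param
print(f"token embedding alone: ~{embedding_gib:.1f} GiB")  # ~1.9 GiB, unquantized
```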


KimJaehee0725 commented 1 year ago

Thanks for your reply! I used `dtype=bfloat16` and double quantization, as QLoRA proposed.

I think the memory usage reported in the QLoRA paper (5 GB for a 7B model) covers only loading the quantized model. Below are my observations.

All experiments use bloom-7b1 with 4-bit double quantization (NF4) and bfloat16 dtype.

(1) Quantized Model Only

(2) Quantized Model and Copied Weight for Mixed Precision

So, the minimum memory requirement for training a 7B model is more than 8 GB of VRAM (the typical VRAM of consumer GPUs).

+) Training the 7B model with QLoRA at batch size 4 consumes about 24 GB.
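As a rough, assumed-numbers sanity check of why the paper's ~5 GB figure covers only the frozen weights while training needs considerably more:

```python
# Back-of-the-envelope estimate (assumed parameter count for bloom-7b1).
params = 7.1e9
weights_gib = params * 0.5 / 2**30   # 4-bit weights = 0.5 bytes per parameter
print(f"4-bit base weights: ~{weights_gib:.1f} GiB")  # ~3.3 GiB before quantization constants
# Training adds, on top of this: bf16 LoRA adapters and their optimizer state,
# adapter gradients, and activations that grow with batch size and sequence
# length, which is how batch size 4 ends up around 24 GB here.
```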