dorsa-zeinali opened this issue 4 months ago
We used 80GB A100s. I don't remember the exact requirements, but you should be able to fit more than 2048 tokens with 48GB. You may want to avoid manifesting the entire model (turn off the --train_mode flag) and also try using activation checkpointing.
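For reference, here is a minimal sketch of what activation checkpointing could look like with torch.utils.checkpoint. The wrapper class and the layer list it expects are assumptions for illustration, not the repository's actual fine-tuning code:

```python
import torch
from torch.utils.checkpoint import checkpoint


class CheckpointedBlock(torch.nn.Module):
    """Wraps a transformer block so its activations are recomputed in backward."""

    def __init__(self, block: torch.nn.Module):
        super().__init__()
        self.block = block

    def forward(self, *args, **kwargs):
        # use_reentrant=False supports keyword args and is the recommended
        # mode in recent PyTorch versions.
        return checkpoint(self.block, *args, use_reentrant=False, **kwargs)


def wrap_with_checkpointing(layers: torch.nn.ModuleList) -> torch.nn.ModuleList:
    # `layers` is assumed to be the model's list of decoder blocks
    # (e.g. model.model.layers for a Llama-style model) -- an assumption,
    # not the repository's actual structure.
    return torch.nn.ModuleList(CheckpointedBlock(b) for b in layers)
```

With this, only each block's inputs are kept for backward and the intermediate activations are recomputed, trading extra compute for memory.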
Thank you.
I used an 80GB A100 to run finetune_e2e on llama2-7b-chat-4bit with ctx_size 4096, but I also hit an OOM issue caused by the code below:
W_decompressed = quiptools_cuda.decompress_packed_e8p(
    Qidxs_list[0].view(m // 16, n // 64, 8, 4), self.codebook.grid_packed_abs
) + quiptools_cuda.decompress_packed_e8p(
    Qidxs_list[1].view(m // 16, n // 64, 8, 4), self.codebook.grid_packed_abs
) / resid_scale
x = (x.to(torch.float16) @ W_decompressed.T).to(torch.float32)
I was wondering why this runs out of memory?
Hi, what context size and devset size do you think are reasonable for the e2e fine-tuning step, given that I have one GPU with 48GB?
The e2e fine-tuning script is pretty poorly written and is not very memory efficient. All it does is "train" the quantized model by only updating the unquantized parameters, such as the LM head and layernorms. This means it has to backprop through the entire model (even the quantized parts) and dequantize the weights during the forward and backward passes. I suspect torch autograd is storing W_decompressed for the backward pass, since the actual operation is x @ W_decompressed.T. You can write a custom backward pass that decompresses the weights again, which should save a lot of memory.
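A rough sketch of that suggestion, assuming a `decompress_fn` helper that rebuilds the dense weight from the packed indices (the actual quiptools_cuda call and tensor shapes would come from the repo):

```python
import torch


class DecompressedLinearFn(torch.autograd.Function):
    """y = x @ W.T where W is re-decompressed in backward instead of being saved."""

    @staticmethod
    def forward(ctx, x, packed_state, decompress_fn):
        # packed_state: the small packed/quantized indices (cheap to keep around).
        # decompress_fn: stand-in for the quiptools_cuda-based dequantization.
        ctx.packed_state = packed_state
        ctx.decompress_fn = decompress_fn
        ctx.save_for_backward(x)                 # keep the input, not W
        W = decompress_fn(packed_state)          # fp16, shape (out_features, in_features)
        return (x.to(torch.float16) @ W.T).to(torch.float32)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Decompress again here, so the large dense W never has to live across
        # the whole forward/backward pass of the model.
        W = ctx.decompress_fn(ctx.packed_state)
        grad_x = (grad_out.to(torch.float16) @ W).to(x.dtype)
        # No gradients for the quantized weights or the helper function.
        return grad_x, None, None
```

In the linear layer's forward you would then call DecompressedLinearFn.apply(x, packed_state, decompress_fn) in place of the dequantize-then-matmul snippet above, at the cost of dequantizing once more in the backward pass.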
Hi, I hope you're doing well. I am a researcher at Northeastern University trying to replicate your quantization results for Llama-2-7B, and I can do so without fine-tuning and without running out of memory. I was wondering: for fine-tuning during quantization and fine-tuning post quantization, if one were to do each separately, what is the memory requirement for each step? I cannot use any context size larger than 2048 without running out of memory in the post-quantization fine-tuning step. Your paper says you used NVIDIA A100 GPUs, but it does not specify how much memory they had (40GB or 80GB). I have access to 4 GPUs with 48GB of memory each. I would appreciate any insights you have. Thank you.