artidoro / qlora

QLoRA: Efficient Finetuning of Quantized LLMs
https://arxiv.org/abs/2305.14314

extra memory usage for loading the model #269

Open XintianHan opened 1 year ago

XintianHan commented 1 year ago

In Figure 6 of the paper, why is the memory usage for the 7B, 13B, 33B, and 65B models 5046 MB, 8476 MB, 19302 MB, and 37074 MB, instead of roughly 3.5 GB, 6.5 GB, 16.5 GB, and 32.5 GB?

I understand that some memory is needed for the quantization constants, but I don't think the gap should be this large; for the 7B model, for example, there is a gap of about 1.5 GB.
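For reference, here is a rough back-of-the-envelope sketch of the arithmetic behind the question. It assumes every parameter is stored in 4-bit NF4 and that double quantization adds roughly 0.127 bits per parameter for the quantization constants (the figure the paper quotes); it ignores anything kept in higher precision (embeddings, norms) and the CUDA context, so it is only an estimate, not a reproduction of Figure 6.

```python
# Naive 4-bit size estimate vs. the numbers reported in Figure 6.
# Assumptions: all parameters in 4-bit NF4, ~0.127 extra bits/param for
# quantization constants with double quantization (from the paper);
# the figure's "M" is read as MB. Layers kept in higher precision and
# runtime overhead (CUDA context, buffers) are not modeled here.

PARAMS = {"7B": 7e9, "13B": 13e9, "33B": 33e9, "65B": 65e9}
REPORTED_MB = {"7B": 5046, "13B": 8476, "33B": 19302, "65B": 37074}  # Figure 6

for name, n in PARAMS.items():
    naive_gb = n * 4 / 8 / 1e9                      # weights only: 4 bits/param
    with_constants_gb = n * (4 + 0.127) / 8 / 1e9   # + quantization constants
    reported_gb = REPORTED_MB[name] / 1e3
    print(f"{name}: naive {naive_gb:.2f} GB, "
          f"+constants {with_constants_gb:.2f} GB, reported {reported_gb:.2f} GB")
```

Even with the constants included, the estimate for 7B lands around 3.6 GB versus the ~5.0 GB reported, which is the gap the question is asking about.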