Closed: akshayiyer2610 closed this issue 9 months ago.
@akshayiyer2610 In the source code, when 4-bit is selected, the parameter count is divided by 2, so this is intended. (I don't know why, though; I just remember it behaving that way.)
@OneCodeToRuleThemAll Can you point me to the source where it's divided by 2? Does that imply the other ~3.5B parameters are frozen during the quantization process?
@akshayiyer2610 Okay, so I had to do a little digging to find the code snippet again; it's from the qlora repo.
And here is an issue from that repo with the same question (it remains unanswered): https://github.com/artidoro/qlora/issues/260
Thanks for the link to the source code, @OneCodeToRuleThemAll. Much appreciated.
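For what it's worth, the halving most likely comes from how bitsandbytes stores 4-bit weights rather than from any freezing: two 4-bit values are packed into each uint8 element, so `numel()` on a quantized tensor reports half the logical weight count. A minimal sketch of the effect (requires a CUDA GPU; the 4096x4096 layer size is arbitrary):

```python
import bitsandbytes as bnb

# A 4096x4096 linear layer holds 16,777,216 logical weights.
layer = bnb.nn.Linear4bit(4096, 4096, bias=False)
print(layer.weight.numel())  # 16777216: not yet quantized while on CPU

# Quantization happens when the layer is moved to the GPU.
layer = layer.cuda()
print(layer.weight.dtype)    # torch.uint8
print(layer.weight.numel())  # 8388608: two 4-bit weights packed per byte
```

So no parameters are frozen or dropped; the "missing" ~3.5B weights are still there, just packed two to a byte.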
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
The bot closed this issue even though it hasn't been resolved. Quantization reduces memory requirements but has no effect on the parameter count. I see this as a bug.
After loading the llama2-7b-text model with 4-bit quantization, the total parameter count is reduced to ~3.5B. Is this a bug or the expected behavior?
Packages:
- bitsandbytes => 0.41.1
- transformers => 4.33.2
- torch => 2.0.1
Code:
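A minimal reproduction of the setup described (the exact snippet from the report is not preserved; the model id and loading path below are assumptions):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load a Llama-2 7B checkpoint with 4-bit quantization via bitsandbytes
bnb_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # assumed model id
    quantization_config=bnb_config,
    device_map="auto",
)

# Sum numel() over every parameter tensor in the model
print(f"{sum(p.numel() for p in model.parameters()):,}")
```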
The output of the print statement is 3,540,389,888.
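If the goal is the logical weight count rather than the stored element count, one workaround is to undo the 4-bit packing while summing. A sketch, assuming the quantized tensors are bitsandbytes `Params4bit` instances:

```python
import bitsandbytes as bnb

def logical_param_count(model):
    """Count parameters, undoing bitsandbytes' 4-bit packing."""
    total = 0
    for p in model.parameters():
        if isinstance(p, bnb.nn.Params4bit):
            total += p.numel() * 2  # two 4-bit weights are stored per uint8
        else:
            total += p.numel()
    return total

print(f"{logical_param_count(model):,}")  # should be close to the full ~7B
```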