https://huggingface.co/docs/transformers/main_classes/quantization
By default, your model is loaded with bitsandbytes (bnb) when you pass load_in_4bit=True:
https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L3609
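For illustration, here is a minimal sketch of how to verify this after loading (the model id is just an example, and attribute names such as is_loaded_in_4bit may differ across transformers versions):

from transformers import AutoModelForCausalLM

# Loading with only load_in_4bit=True still routes through bitsandbytes.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",   # example model id; any causal LM works for this check
    load_in_4bit=True,
    device_map="auto",
)

# A BitsAndBytesConfig is attached to the model config even though
# none was passed explicitly.
print(model.config.quantization_config)
print(getattr(model, "is_loaded_in_4bit", None))  # True in recent versions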
@puyuanOT Okay, so with the following:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    model_path,          # path or Hub id of the model to load
    load_in_4bit=True,
    device_map="auto",
)

In the above case, the model uses the bitsandbytes backend for the load_in_4bit quantization by default, even though we didn't specify it explicitly in the code, right?
Thanks @puyuanOT for chipping in. In my understanding this is also correct: load_in_4bit can be understood as shorthand for when you don't need the extra settings of BitsAndBytesConfig.
From what I understand, so far it has been possible to specify both load_in_4bit and a BitsAndBytesConfig at the same time, which is confusing, but this is being fixed to be mutually exclusive (XOR) with the HF Transformers modeling_utils refactor that should be merged and released any day now.
@younesbelkada Correct me if I'm wrong?
Can we close this issue then?
Hi @pradeepdev-1995 @Titus-von-Koeller, yes, that's correct: if you don't specify a BitsAndBytesConfig and simply pass load_in_4bit=True, the model will be loaded in 4-bit with the default bnb config values for 4-bit. In the future, load_in_4bit and load_in_8bit will be deprecated in favor of quantization configs.
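As a concrete illustration, here is a sketch of the two forms side by side (model_path is a placeholder, and the 4-bit settings shown are example overrides rather than guaranteed defaults):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Shorthand: bitsandbytes is used under the hood with its default 4-bit settings.
model_a = AutoModelForCausalLM.from_pretrained(
    model_path,            # placeholder: your model path or Hub id
    load_in_4bit=True,
    device_map="auto",
)

# Explicit config: same mechanism, but the 4-bit settings can be customized.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # example override
    bnb_4bit_compute_dtype=torch.bfloat16,  # example override
)
model_b = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    device_map="auto",
)

Both paths quantize the weights with bitsandbytes; the config form simply exposes the knobs that the shorthand leaves at their defaults.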
System Info
We can load models in 4-bit with and without bitsandbytes support, so what is the major difference between the two?
1)
2) Using bitsandbytes