bitsandbytes-foundation / bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.
https://huggingface.co/docs/bitsandbytes/main/en/index

We can load models in 4-bit with and without bitsandbytes support, so what is the major difference between the two? #977

Closed pradeepdev-1995 closed 8 months ago

pradeepdev-1995 commented 8 months ago

System Info

We can load models in 4-bit with and without explicit bitsandbytes configuration, so what is the major difference between the two?

1)

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    model_path,          # path or hub id of the checkpoint
    load_in_4bit=True,
    device_map="auto",
)

2) Using bitsandbytes

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",        # NF4 instead of the default fp4
    bnb_4bit_use_double_quant=False,  # boolean, not the string "False"
)

FineTunedModel = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    device_map="auto"
)

puyuanOT commented 8 months ago

https://huggingface.co/docs/transformers/main_classes/quantization

By default, your model is loaded with bnb.

https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L3609
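
As a quick way to check that, here is a minimal sketch (model_path is a placeholder as in the snippets above; model.config.quantization_config is how transformers exposes the settings applied to a quantized model):

from transformers import AutoModelForCausalLM

# Load with only the shorthand flag ...
model = AutoModelForCausalLM.from_pretrained(
    model_path,          # placeholder: path or hub id of a causal LM checkpoint
    load_in_4bit=True,
    device_map="auto",
)

# ... then inspect the quantization config transformers attached to the model
# config; it should be a BitsAndBytesConfig filled with the 4-bit defaults.
print(model.config.quantization_config)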

pradeepdev-1995 commented 8 months ago

@puyuanOT Okay so

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_4bit=True,
    device_map="auto",
)

in the above case, the model by default uses bitsandbytes for the load_in_4bit quantization, even though we didn't specify it explicitly in the code, right?

Titus-von-Koeller commented 8 months ago

Thanks @puyuanOT for chipping in. In my understanding this is correct as well: load_in_4bit can be understood as a shorthand for when you don't need the extra settings of BitsAndBytesConfig.

From what I understand, it has so far been possible to specify both load_in_4bit and a BitsAndBytesConfig, which is confusing; this is being fixed to be mutually exclusive (XOR) in the HF Transformers modeling_utils refactor that should be merged and released any day now.
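
For context, a rough sketch (not taken from the thread) of what that mutually-exclusive behaviour looks like from the caller's side; the exact exception type and message depend on the installed transformers version:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(load_in_4bit=True)

# Passing both the shorthand flag and an explicit config is ambiguous; after
# the refactor described above, recent transformers releases are expected to
# reject this combination with an error instead of silently picking one.
model = AutoModelForCausalLM.from_pretrained(
    model_path,  # placeholder checkpoint path, as in the snippets above
    load_in_4bit=True,
    quantization_config=bnb_config,
    device_map="auto",
)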

@younesbelkada Correct me if I'm wrong?

Can we close this issue then?

younesbelkada commented 8 months ago

Hi @pradeepdev-1995 @Titus-von-Koeller, yes, that's true: if you don't specify a BitsAndBytesConfig and simply pass load_in_4bit=True, the model will be loaded in 4-bit with the default bnb config values for 4-bit. In the future, load_in_4bit and load_in_8bit will be deprecated in favor of quantization configs.
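
To make that equivalence concrete, a minimal sketch assuming the default 4-bit values in BitsAndBytesConfig at the time of writing (fp4 quant type, no double quantization, float32 compute dtype; verify against your installed transformers version):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Spelling out what load_in_4bit=True uses implicitly; the defaults below are
# assumptions and should be checked against the installed transformers version.
default_4bit_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="fp4",             # default quant type (not "nf4")
    bnb_4bit_use_double_quant=False,       # double quantization off by default
    bnb_4bit_compute_dtype=torch.float32,  # default compute dtype
)

model = AutoModelForCausalLM.from_pretrained(
    model_path,  # placeholder checkpoint path, as in the snippets above
    quantization_config=default_4bit_config,
    device_map="auto",
)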