axolotl-ai-cloud / axolotl

[jamba] Quantizing should exclude `mamba` layers #1498

Closed: creatorrr closed this issue 6 months ago

creatorrr commented 7 months ago

Please check that this issue hasn't been reported before.

Expected Behavior

As mentioned in the Jamba docs, it is recommended to exclude the Mamba blocks when quantizing the model:

You can easily quantize the model to 8-bit using bitsandbytes. In order to not degrade model quality, we recommend to exclude the Mamba blocks from the quantization:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_skip_modules=["mamba"],
)
model = AutoModelForCausalLM.from_pretrained(
    "ai21labs/Jamba-v0.1",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    quantization_config=quantization_config,
)
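
For illustration, a minimal sketch of how to confirm the exclusion took effect (assuming `model` was loaded with the quantization_config above): no sub-module under a `mamba` path should have been replaced by a bitsandbytes 8-bit linear layer.

import bitsandbytes as bnb

# Assumes `model` was loaded with the quantization_config above.
for name, module in model.named_modules():
    if isinstance(module, bnb.nn.Linear8bitLt):
        # None of the Mamba sub-modules should have been converted to 8-bit linears.
        assert "mamba" not in name, f"unexpectedly quantized: {name}"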

Current behaviour

The entire model, including the Mamba blocks, is quantized automatically; there is no option to exclude them.
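
For illustration, a minimal sketch of one way to observe this (assuming the same ai21labs/Jamba-v0.1 checkpoint as above): load the model in 8-bit without a skip list and list which sub-modules were converted; the Mamba mixer projections appear alongside the attention and MLP ones.

import torch
import bitsandbytes as bnb
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Plain 8-bit load with no skip list.
model = AutoModelForCausalLM.from_pretrained(
    "ai21labs/Jamba-v0.1",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)

# The Mamba mixer projections show up among the converted 8-bit modules.
quantized = [n for n, m in model.named_modules() if isinstance(m, bnb.nn.Linear8bitLt)]
print([n for n in quantized if "mamba" in n])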

Steps to reproduce

...

Config yaml

No response

Possible solution

No response

Which Operating Systems are you using?

Python Version

3.10

axolotl branch-commit

main

Acknowledgements

NanoCode012 commented 7 months ago

Hey, thanks for reporting. Would you be interested in making a PR for this?

creatorrr commented 6 months ago

For sure :)

creatorrr commented 6 months ago

thanks for getting to it before me @NanoCode012 :pray: