bitsandbytes-foundation / bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.
https://huggingface.co/docs/bitsandbytes/main/en/index

[FSDP] Enable loading prequantized weights with bf16/fp16/fp32 quant_storage #1295

Closed by matthewdouglas 2 months ago

matthewdouglas commented 2 months ago

This is a companion PR to https://github.com/huggingface/transformers/pull/32276 that allows loading prequantized weights with an alternate quant_storage dtype. We keep track of the metadata we need in the same way we would with Params4bit.__new__ after PR #970.
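For context, here is a minimal sketch of how a checkpoint with a non-default storage dtype can be produced in the first place; it assumes a CUDA device, and the tensor shape is a placeholder:

```python
import torch
import bitsandbytes as bnb

# Sketch: quantize a weight to NF4, but pack the 4-bit payload into a
# bf16-typed storage tensor instead of the default uint8.
w = torch.randn(4096, 4096, dtype=torch.bfloat16, device="cuda")
packed, quant_state = bnb.functional.quantize_4bit(
    w, quant_type="nf4", quant_storage=torch.bfloat16
)

# torch.bfloat16: the storage dtype this PR lets us recover when the
# prequantized weight is loaded back.
print(packed.dtype)
```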

This works with models exported with a non-default quant_storage, such as this one, which is NF4 with BF16 storage.
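For illustration, a hedged sketch of loading such a prequantized checkpoint through transformers; the model id below is a placeholder, not the actual export referenced above, and a checkpoint saved with an embedded quantization_config would pick these settings up automatically:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Placeholder repo id for a hypothetical NF4 checkpoint with bf16 storage.
model_id = "someuser/llama-2-7b-nf4-bf16-storage"

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,  # the alternate storage dtype
)

model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant_config
)
```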

@Titus-von-Koeller @winglian