UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

Support for Quantization Aware Training #3031

Open lejinvarghese opened 3 weeks ago

lejinvarghese commented 3 weeks ago

Is it possible to perform Quantization Aware Training on Sentence Transformers, beyond the fp16 and bf16 options that are directly supported in the transformers training_args? Are there options for doing binary quantization during training other than the Intel/Neural Compressor INCTrainer or the OpenVINO OVTrainer?
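
For context, this is roughly what I mean by the directly supported mixed-precision options (a minimal sketch; the output directory is just a placeholder):

```python
from sentence_transformers import SentenceTransformerTrainingArguments

# Mixed precision is controlled through the training arguments,
# which mirror the transformers TrainingArguments.
args = SentenceTransformerTrainingArguments(
    output_dir="models/my-qat-experiment",  # placeholder
    bf16=True,  # or fp16=True, depending on hardware support
)
```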

tomaarsen commented 3 weeks ago

Hello!

I'm afraid the INCTrainer/OVTrainer/ORTTrainer aren't directly compatible with Sentence Transformers. Beyond that, you can load models with a specific quantization configuration via bitsandbytes through model_kwargs, and you can use PEFT as well (though that doesn't do quantization per se).
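
For example, loading with a bitsandbytes 4-bit configuration could look roughly like this (a sketch; the model name is just an example, and bitsandbytes must be installed):

```python
import torch
from transformers import BitsAndBytesConfig
from sentence_transformers import SentenceTransformer

# Quantization settings are forwarded to the underlying transformers
# model through model_kwargs.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = SentenceTransformer(
    "mixedbread-ai/mxbai-embed-large-v1",  # example model
    model_kwargs={"quantization_config": bnb_config},
)
```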

I do want to preface this by noting that there's a difference between quantization for Speeding up Inference and Embedding Quantization. The former makes inference itself faster, while the latter is a post-processing of the output embeddings so that downstream tasks (e.g. retrieval) are faster.

So there's a difference between:

- quantizing the model itself (its weights/activations) so that computing embeddings is faster, and
- quantizing the output embeddings so that storing and searching them (e.g. for retrieval) is cheaper and faster.

For the first one, there are no great options out of the box to my knowledge, and for the latter you can consider the Binary Passage Retrieval Loss (bpr_loss), which is compatible with Sentence Transformers.
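
For the embedding quantization route, the post-processing itself can be as simple as the following sketch using quantize_embeddings (the model name is just an example):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")  # example model
embeddings = model.encode([
    "How do I quantize embeddings?",
    "Binary vectors are compact and fast to compare.",
])

# Post-process the float32 embeddings into binary vectors for cheaper
# storage and faster retrieval.
binary_embeddings = quantize_embeddings(embeddings, precision="binary")
```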