Open · lejinvarghese opened 5 days ago

Is it possible to perform Quantization Aware Training on Sentence Transformers, beyond the fp16 and bf16 that are directly supported in the transformers training_args? Are there other options for doing binary quantization during training, other than using the Intel Neural Compressor INCTrainer or OpenVINO OVTrainer?
Hello!
I'm afraid the INCTrainer/OVTrainer/ORTTrainer aren't directly compatible with Sentence Transformers. Beyond that, you can load models in a specific quantization configuration with bitsandbytes via model_kwargs, and you can use PEFT as well (though that doesn't do quantization per se).
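In case it helps, here is a minimal sketch of that model_kwargs route (not from this thread; the model name and options are just examples, and the exact behavior depends on your sentence-transformers, transformers, and bitsandbytes versions):

```python
# Illustrative sketch: pass loading options to the underlying transformers model
# through model_kwargs when constructing the SentenceTransformer.
import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "sentence-transformers/all-MiniLM-L6-v2",  # example model, swap in your own
    model_kwargs={
        # Half-precision loading (bf16), analogous to the bf16 training flag.
        "torch_dtype": torch.bfloat16,
        # For bitsandbytes quantization instead, you could swap in something like:
        #   from transformers import BitsAndBytesConfig
        #   "quantization_config": BitsAndBytesConfig(load_in_8bit=True),
        # which requires a CUDA GPU and the bitsandbytes package installed.
    },
)

embeddings = model.encode(["An example sentence to embed."])
print(embeddings.shape)
```

Note that this only changes how the model is loaded for inference; it isn't quantization aware training.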
I do want to preface that there's a difference between model quantization as covered in Speeding up Inference and Embedding Quantization: the former allows for faster inference, while the latter is a post-processing of the output embeddings so that downstream tasks (e.g. retrieval) are faster.
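To make that second notion concrete, here is a small post-processing sketch (assuming a sentence-transformers version that ships quantize_embeddings; the model and shapes are illustrative):

```python
# Illustrative sketch: quantize float32 embeddings to packed binary vectors
# after encoding, so retrieval can use compact vectors and Hamming distance.
from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
embeddings = model.encode(["The weather is lovely today.", "It is sunny outside."])

# "binary" packs each dimension into a single bit (~32x smaller than float32).
binary_embeddings = quantize_embeddings(embeddings, precision="binary")

print(embeddings.shape, embeddings.dtype)                # e.g. (2, 384) float32
print(binary_embeddings.shape, binary_embeddings.dtype)  # e.g. (2, 48) int8
```

The model itself is unchanged here; only the returned embeddings are quantized.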
So there's a difference between your two questions: quantization aware training of the model itself (for faster inference) versus binary quantization of the output embeddings (for faster retrieval). For the first one, there aren't great options out of the box to my knowledge, and for the latter you can consider the Binary Passage Retrieval Loss (bpr_loss), which is compatible with Sentence Transformers.