Closed: denwong47 closed this issue 3 months ago
So there is certainly an argument that batching naturally does not go well with quantized models, and we may just disable it when dynamic quantization is in use.
I think we can do that if it makes for a simpler, more straightforward solution.
I also see that the individual values are not way off. Still, you're right: they are different.
I'll do a quick PR after work.
Because the quantized values depend on the data range of the batch, one could conceivably craft two polar-opposite strings that give completely different results when embedded individually versus batched together.
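As a rough, self-contained illustration (plain Rust, not this crate's API, and only a simplified stand-in for what ONNX Runtime actually does): quantize the same toy activations once on their own and once alongside an extreme-valued batch-mate, and the round-tripped values shift even though the input itself never changed.

```rust
// Simplified per-batch asymmetric uint8 quantization, purely for illustration.
// Real ONNX Runtime kernels differ in detail; the point is only that the
// round-tripped value of a fixed input depends on what shares its batch.
fn roundtrip(batch: &[f32], value: f32) -> f32 {
    // The data range is observed over the whole batch (zero kept representable).
    let min = batch.iter().copied().fold(0.0_f32, f32::min);
    let max = batch.iter().copied().fold(0.0_f32, f32::max);
    let scale = (max - min) / 255.0;
    let zero_point = (-min / scale).round();
    let q = (value / scale + zero_point).round().clamp(0.0, 255.0);
    (q - zero_point) * scale
}

fn main() {
    let doc_a = [0.10_f32, -0.20, 0.30]; // toy activations standing in for "string A"
    let doc_b = [12.0_f32, -12.0];       // a "polar opposite", extreme-valued string B

    let alone: Vec<f32> = doc_a.to_vec();
    let batched: Vec<f32> = doc_a.iter().chain(doc_b.iter()).copied().collect();

    for &v in &doc_a {
        println!(
            "{v:+.2} -> alone {:+.4}, batched with B {:+.4}",
            roundtrip(&alone, v),
            roundtrip(&batched, v)
        );
    }
}
```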
:tada: This issue has been resolved in version 4.0.0 :tada:
The release is available on: v4.0.0
Your semantic-release bot :package::rocket:
Symptoms
The 6 quantized models in this crate (basically any ModelInfo { model_file: "model_quantized.onnx" }) will produce different embeddings as the batch size changes.
To Reproduce
Run the unit test. The test should pass.
Change the batch size in the unit test to Some(3). Assertions should fail on the above 6 models.
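The failing assertion boils down to a batch-invariance check along these lines. This is only a sketch of the invariant, not the crate's actual test; `embed` here is a hypothetical stand-in closure for the model call.

```rust
// Sketch of the invariant being tested: embedding all documents at once and
// embedding them in chunks of `batch_size` should yield (approximately) the
// same vectors, in the same order. `embed` is a hypothetical stand-in for the
// model call, not this crate's API.
fn assert_batch_invariant<F>(embed: F, docs: &[&str], batch_size: usize, tol: f32)
where
    F: Fn(&[&str]) -> Vec<Vec<f32>>,
{
    let all_at_once = embed(docs);
    let chunked: Vec<Vec<f32>> = docs
        .chunks(batch_size)
        .flat_map(|chunk| embed(chunk))
        .collect();

    for (full, part) in all_at_once.iter().zip(&chunked) {
        for (x, y) in full.iter().zip(part) {
            assert!(
                (x - y).abs() <= tol,
                "embedding differs across batch sizes: {x} vs {y}"
            );
        }
    }
}
```

With the 6 quantized models and a batch size of 3, a check of this shape trips because each chunk is quantized against its own data range.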
Cause
This is a known behaviour of quantized models due to dynamic quantization: the data range is observed within each batch, which makes the generated embeddings incomparable across batches.
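Concretely, for an asymmetric uint8 scheme (a sketch of the idea, not the exact ONNX Runtime implementation), the scale and zero-point are recomputed from each batch's own minimum and maximum, so a single outlier in a batch coarsens the quantization grid for everything else in it:

```rust
// Per-batch ("dynamic") quantization parameters for an asymmetric uint8 scheme.
// Sketch only: ONNX Runtime's kernels differ in detail, but the dependence of
// scale and zero-point on the batch's own data range is the behaviour at issue.
fn quant_params(batch: &[f32]) -> (f32, u8) {
    let min = batch.iter().copied().fold(0.0_f32, f32::min); // keep 0 representable
    let max = batch.iter().copied().fold(0.0_f32, f32::max);
    let scale = (max - min) / 255.0;
    let zero_point = (-min / scale).round() as u8;
    (scale, zero_point)
}

fn main() {
    // The same three values land on a different quantization grid depending on
    // what else happens to be in the batch.
    println!("{:?}", quant_params(&[0.1, -0.2, 0.3]));      // fine-grained grid
    println!("{:?}", quant_params(&[0.1, -0.2, 0.3, 9.0])); // coarse grid after one outlier
}
```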
Proposed Solution
I am currently researching whether there is any way to define the data range independently of the input data. I will also need to look deeper into how other packages deal with this.
However, it is worth noting that even if we solve this issue, embeddings generated for one set of documents will fundamentally not be comparable to embeddings generated for another set of documents; embeddings from a quantized model are only meaningful when compared among themselves. So there is certainly an argument that batching naturally does not go well with quantized models, and we may just disable it when dynamic quantization is in use.
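For reference, defining the data range independently of the input data amounts to something like static quantization: fix the range up front (e.g. from a calibration pass) instead of observing it per batch, and a given input then quantizes the same way regardless of its batch-mates. The sketch below illustrates that idea only; the type and method names are hypothetical, and this is not how ONNX Runtime would actually be configured.

```rust
// Sketch: quantization parameters fixed from a pre-chosen ("calibrated") range
// rather than from the batch at hand. With a fixed range, a value round-trips
// to the same number no matter what else is embedded alongside it.
// `FixedRangeQuantizer` and its methods are hypothetical names for illustration.
struct FixedRangeQuantizer {
    scale: f32,
    zero_point: f32,
}

impl FixedRangeQuantizer {
    fn from_calibrated_range(min: f32, max: f32) -> Self {
        let scale = (max - min) / 255.0;
        Self { scale, zero_point: (-min / scale).round() }
    }

    fn roundtrip(&self, value: f32) -> f32 {
        let q = (value / self.scale + self.zero_point).round().clamp(0.0, 255.0);
        (q - self.zero_point) * self.scale
    }
}

fn main() {
    // Range chosen once, independently of any particular batch.
    let quant = FixedRangeQuantizer::from_calibrated_range(-1.0, 1.0);

    // Batch composition no longer enters the calculation at all, so the same
    // input always round-trips to the same value.
    for v in [0.10_f32, -0.20, 0.30] {
        println!("{v:+.2} -> {:+.4}", quant.roundtrip(v));
    }
}
```

Where the fixed range would come from, and how much accuracy it would cost, is exactly the part that still needs researching.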