EmbeddingModel now has a get_quantization_mode method that returns a static value describing the model's quantization behaviour. Not all quantized models are affected, so we need to distinguish between non-quantized, statically quantized, and dynamically quantized models.
The batch size is then checked on transform; if it is not appropriate for the model's quantization mode, transform returns an Err stating the reason.
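A minimal sketch of how such a check could look. The enum variants, the function name effective_batch_size, the default of 256, and the exact rule (a dynamically quantized model must process all inputs in a single batch) are assumptions made for illustration, not necessarily what this PR implements.

```rust
/// Hypothetical mirror of the three cases described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QuantizationMode {
    None,
    Static,
    Dynamic,
}

/// Assumed default batch size, used when the caller passes `None`.
pub const ASSUMED_DEFAULT_BATCH_SIZE: usize = 256;

/// Returns the batch size to use, or an `Err` explaining why the
/// requested one is rejected.
///
/// Assumed rule: a dynamically quantized model must see all inputs in a
/// single batch, so an explicit batch size smaller than the number of
/// inputs is refused. Other modes accept any requested size.
pub fn effective_batch_size(
    mode: QuantizationMode,
    batch_size: Option<usize>,
    n_texts: usize,
) -> Result<usize, String> {
    match mode {
        QuantizationMode::Dynamic => match batch_size {
            Some(b) if b < n_texts => Err(format!(
                "batch size {b} is smaller than the {n_texts} inputs; \
                 a dynamically quantized model needs a single batch"
            )),
            // No batch size, or one large enough to hold everything.
            _ => Ok(n_texts),
        },
        QuantizationMode::None | QuantizationMode::Static => {
            Ok(batch_size.unwrap_or(ASSUMED_DEFAULT_BATCH_SIZE))
        }
    }
}
```

Under these assumptions, effective_batch_size(QuantizationMode::Dynamic, Some(3), 10) is an Err, while the same call with QuantizationMode::Static returns Ok(3).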
This PR also contains a minor refactor of text_embedding.rs that moves models_list into static scope, so the list is no longer re-instantiated every time models_list is called. This is done via std::sync::OnceLock. It also adds a convenience function, get_model_info, an O(1) lookup with no memory cost that returns the matching model if it exists.
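The pattern can be sketched as follows. This is an illustration only: the enum variants, the ModelInfo fields, their values, and the use of a HashMap behind the OnceLock are assumptions, and the real text_embedding.rs may be organised differently.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Hypothetical model identifiers; the real enum has different variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum EmbeddingModel {
    ExampleSmall,
    ExampleSmallQ,
    Unlisted,
}

/// Hypothetical metadata record with made-up values.
#[derive(Debug, Clone, PartialEq)]
pub struct ModelInfo {
    pub model: EmbeddingModel,
    pub dim: usize,
    pub model_code: &'static str,
}

/// Builds the table once, on first access, and then hands out a shared
/// `'static` reference on every later call.
fn model_map() -> &'static HashMap<EmbeddingModel, ModelInfo> {
    static MAP: OnceLock<HashMap<EmbeddingModel, ModelInfo>> = OnceLock::new();
    MAP.get_or_init(|| {
        [
            ModelInfo {
                model: EmbeddingModel::ExampleSmall,
                dim: 384,
                model_code: "example/small",
            },
            ModelInfo {
                model: EmbeddingModel::ExampleSmallQ,
                dim: 384,
                model_code: "example/small-quantized",
            },
        ]
        .into_iter()
        .map(|info| (info.model, info))
        .collect()
    })
}

/// Average O(1) hash lookup; returns a borrowed entry, so nothing is
/// cloned or allocated per call.
pub fn get_model_info(model: EmbeddingModel) -> Option<&'static ModelInfo> {
    model_map().get(&model)
}
```

Because the map lives in a function-local static, repeated calls return references to the same entry, and a model that was never registered simply yields None.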
Test Plan
cargo test --features=optimum-cli
The test_embeddings test has been split into two via macro_rules, one for a None batch size and the other for Some(3). Internally, test_embeddings checks whether the batch size is appropriate for the model and, if it is not, expects an Err instead. For non-quantized and statically quantized models, the pre-calculated embedding sums must still match with or without a batch size.
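The shape of that split can be sketched like this. Everything here is a stand-in: embed_stub replaces the real model call, and the macro name, test names, and the rule for when an Err is expected are assumptions rather than the code in this PR.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum QuantizationMode {
    None,
    Static,
    Dynamic,
}

/// Hypothetical stand-in for embedding `n_texts` inputs. It fails only
/// when a dynamically quantized model is given an explicit batch size
/// smaller than the input, and otherwise returns the embedding count.
fn embed_stub(
    mode: QuantizationMode,
    n_texts: usize,
    batch_size: Option<usize>,
) -> Result<usize, String> {
    match (mode, batch_size) {
        (QuantizationMode::Dynamic, Some(b)) if b < n_texts => {
            Err(format!("batch size {b} not allowed for dynamic quantization"))
        }
        _ => Ok(n_texts),
    }
}

/// Shared test body: runs every mode with the given batch size and
/// checks that an `Err` appears exactly where one is expected.
fn check_all_modes(batch_size: Option<usize>) {
    const N_TEXTS: usize = 4;
    for mode in [
        QuantizationMode::None,
        QuantizationMode::Static,
        QuantizationMode::Dynamic,
    ] {
        let result = embed_stub(mode, N_TEXTS, batch_size);
        let expect_err = mode == QuantizationMode::Dynamic
            && matches!(batch_size, Some(b) if b < N_TEXTS);
        assert_eq!(result.is_err(), expect_err, "mode {mode:?}, batch {batch_size:?}");
    }
}

/// Generates one `#[test]` function per `name => batch_size` pair.
macro_rules! batch_size_tests {
    ($($name:ident => $batch_size:expr),+ $(,)?) => {
        $(
            #[test]
            fn $name() {
                check_all_modes($batch_size);
            }
        )+
    };
}

batch_size_tests! {
    test_embeddings_no_batch_size => None,
    test_embeddings_batch_size_3 => Some(3),
}
```

Keeping the body in one function and generating only the thin test wrappers means both batch-size cases exercise identical assertions.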
Breaking changes
A quantization parameter has been added to UserDefinedEmbeddingModel, because the quantization mode cannot otherwise be inferred.
Motivation
Closes #107, at least as a short-term solution.