fix!: Throw an informative error on an inappropriate batch size with dynamically quantized models

Motivation

Closes #107, at least as a short-term solution.

EmbeddingModel now has a get_quantization_mode method that returns a fixed value describing the model's quantization. Not all quantized models are affected, so we need to tell apart non-quantized, statically quantized and dynamically quantized models.

The batch size is then checked on transform; if it is not appropriate for the model, transform returns an Err stating the reason.
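
A minimal sketch of how such a check could look. The enum variant names, the resolve_batch_size helper, the DEFAULT_BATCH_SIZE value and the exact rule (a dynamically quantized model must receive all inputs in a single batch) are assumptions made for illustration, not a copy of the code in this PR.

```rust
/// Quantization kinds a model can report (variant names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QuantizationMode {
    None,
    Static,
    Dynamic,
}

/// Hypothetical default used when the caller passes no batch size.
pub const DEFAULT_BATCH_SIZE: usize = 256;

/// Hypothetical helper: decides the effective batch size, or explains why the
/// requested one is not usable. Assumed rule: a dynamically quantized model
/// must see every input in one batch.
pub fn resolve_batch_size(
    mode: QuantizationMode,
    batch_size: Option<usize>,
    num_texts: usize,
) -> Result<usize, String> {
    match (mode, batch_size) {
        // The requested size would split the inputs across several batches.
        (QuantizationMode::Dynamic, Some(size)) if size < num_texts => Err(format!(
            "batch size {size} would split {num_texts} inputs, which a dynamically \
             quantized model cannot handle; pass None instead"
        )),
        // Dynamic quantization otherwise processes everything in one batch.
        (QuantizationMode::Dynamic, _) => Ok(num_texts),
        // Non-quantized and statically quantized models accept any batch size.
        (_, Some(size)) => Ok(size),
        (_, None) => Ok(DEFAULT_BATCH_SIZE),
    }
}
```

Under these assumptions, resolve_batch_size(QuantizationMode::Dynamic, Some(3), 10) yields an Err, while the same call with None as the batch size, or with a non-quantized model, succeeds.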

Also contains a minor refactor of text_embedding.rs: the models list now lives in static scope, so it is no longer re-instantiated every time models_list is called. This is done via std::sync::OnceLock. It also adds a convenience function, get_model_info, an O(1) lookup with no additional memory cost that returns the matching model if it exists.
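
The pattern can be sketched roughly as follows. The enum variants, the ModelInfo fields, the model codes and the use of a HashMap are placeholders chosen for illustration; only OnceLock and the names models_list and get_model_info come from the description above.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Placeholder model identifiers (hypothetical, not the crate's real variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum EmbeddingModel {
    ModelA,
    ModelB,
}

/// Placeholder metadata record (fields are assumptions).
#[derive(Debug, Clone, PartialEq)]
pub struct ModelInfo {
    pub model: EmbeddingModel,
    pub dim: usize,
    pub model_code: &'static str,
}

/// Builds the map on first use only; later calls return the same instance.
fn models_map() -> &'static HashMap<EmbeddingModel, ModelInfo> {
    static MODELS: OnceLock<HashMap<EmbeddingModel, ModelInfo>> = OnceLock::new();
    MODELS.get_or_init(|| {
        HashMap::from([
            (
                EmbeddingModel::ModelA,
                ModelInfo { model: EmbeddingModel::ModelA, dim: 384, model_code: "example/model-a" },
            ),
            (
                EmbeddingModel::ModelB,
                ModelInfo { model: EmbeddingModel::ModelB, dim: 768, model_code: "example/model-b" },
            ),
        ])
    })
}

/// Lists references to the shared entries instead of rebuilding them.
pub fn models_list() -> Vec<&'static ModelInfo> {
    models_map().values().collect()
}

/// Average O(1) hash lookup that hands back a reference to the shared entry,
/// so no ModelInfo is cloned or allocated per call.
pub fn get_model_info(model: &EmbeddingModel) -> Option<&'static ModelInfo> {
    models_map().get(model)
}
```

Because the returned reference has a 'static lifetime, repeated lookups for the same model point at the same entry rather than at fresh copies.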

Test Plan

cargo test --features=optimum-cli.

The test_embeddings test has been split into two via a macro_rules! macro, one for a batch size of None and the other for Some(3). Internally, test_embeddings checks whether the batch size is appropriate for the model and, if it is not, expects an Err instead. For non-quantized and statically quantized models, the pre-calculated embedding sums must still match with or without a batch size.
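
As a rough illustration of the macro approach, the sketch below generates one test function per batch size. The embed_count stand-in, the macro name and the generated function names are hypothetical, and the stand-in only returns the number of inputs rather than comparing real embedding sums.

```rust
/// Hypothetical stand-in for the embedding call: fails when a dynamically
/// quantized model is asked to split its inputs into smaller batches.
fn embed_count(dynamic: bool, num_texts: usize, batch_size: Option<usize>) -> Result<usize, String> {
    match batch_size {
        Some(size) if dynamic && size < num_texts => {
            Err("batch size not supported with dynamic quantization".to_string())
        }
        _ => Ok(num_texts),
    }
}

/// Generates one test function per batch size (macro name is an assumption).
macro_rules! embeddings_test {
    ($name:ident, $batch_size:expr) => {
        // Registered as a test under `cargo test`, a plain function otherwise.
        #[cfg_attr(test, test)]
        fn $name() {
            let batch_size: Option<usize> = $batch_size;
            for dynamic in [false, true] {
                let result = embed_count(dynamic, 4, batch_size);
                // An Err is expected only for dynamic quantization with a
                // batch size smaller than the number of inputs.
                let expect_err = dynamic && matches!(batch_size, Some(size) if size < 4);
                assert_eq!(result.is_err(), expect_err);
                if !expect_err {
                    assert_eq!(result, Ok(4));
                }
            }
        }
    };
}

embeddings_test!(test_embeddings_no_batch, None);
embeddings_test!(test_embeddings_batch_3, Some(3));
```

Each macro invocation expands to an independent function, so a failure for one batch size is reported separately from the other.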

Breaking changes

A quantization parameter has been added to UserDefinedEmbeddingModel, since the quantization mode of a user-defined model cannot otherwise be inferred.

Anush008 / fastembed-rs