Anush008 / fastembed-rs

Library for generating vector embeddings, reranking in Rust
https://docs.rs/fastembed
Apache License 2.0
264 stars, 36 forks

fix!: Throws an informative error upon inappropriate batch size with Dynamic Quantized models #109

Closed: denwong47 closed this 2 months ago

denwong47 commented 2 months ago

Motivation

Closes #107, at least as a short-term solution.

EmbeddingModel now has a get_quantization_mode method, which returns a static value describing each model's quantization behaviour. Not all quantized models are affected, so we need to distinguish between non-quantized, static quantized and dynamic quantized models.
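To illustrate the idea, here is a minimal sketch of a per-model quantization lookup. The `DemoModel` enum and its variants are hypothetical stand-ins, not the crate's actual model list, and the `QuantizationMode` variant names are an assumption based on the three categories described above.

```rust
/// The three categories described above (variant names are assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QuantizationMode {
    None,
    Static,
    Dynamic,
}

/// Hypothetical stand-in for the crate's model enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DemoModel {
    Plain,
    StaticQuantized,
    DynamicQuantized,
}

impl DemoModel {
    /// Returns a fixed value per variant; nothing is inspected at runtime.
    pub fn get_quantization_mode(&self) -> QuantizationMode {
        match self {
            DemoModel::Plain => QuantizationMode::None,
            DemoModel::StaticQuantized => QuantizationMode::Static,
            DemoModel::DynamicQuantized => QuantizationMode::Dynamic,
        }
    }
}
```

Because the `match` is exhaustive, adding a new variant to the model enum without classifying it becomes a compile error, which keeps the mapping from silently going stale.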

The batch size is then checked upon transform; if it is inappropriate for the model's quantization mode, an Err stating the reason is returned.
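The check could look roughly like the sketch below. It assumes the rule is that a dynamically quantized model must process all inputs in a single batch, so any requested batch size smaller than the input count is rejected. The function name `resolve_batch_size`, the `String` error type, and the default of 256 are illustrative assumptions, not the crate's actual code.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QuantizationMode {
    None,
    Static,
    Dynamic,
}

/// Placeholder default used when no batch size is requested (assumed value).
pub const DEFAULT_BATCH_SIZE: usize = 256;

/// Hypothetical helper: decides the effective batch size or explains why the
/// requested one cannot be used.
pub fn resolve_batch_size(
    mode: QuantizationMode,
    n_texts: usize,
    requested: Option<usize>,
) -> Result<usize, String> {
    match (mode, requested) {
        // Assumed rule: dynamic quantization cannot be split across batches.
        (QuantizationMode::Dynamic, Some(size)) if size < n_texts => Err(format!(
            "batch size {size} is smaller than the {n_texts} inputs; \
             dynamic quantization requires a single batch, try `None`"
        )),
        // Otherwise run everything as one batch (at least 1 to avoid a zero chunk size).
        (QuantizationMode::Dynamic, _) => Ok(n_texts.max(1)),
        // Non-quantized and static quantized models accept any batch size.
        (_, other) => Ok(other.unwrap_or(DEFAULT_BATCH_SIZE)),
    }
}
```

Returning the reason in the `Err` rather than silently overriding the caller's batch size keeps the behaviour explicit, which matches the "informative error" goal in the PR title.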

Also contains a minor refactor of text_embedding.rs that brings models_list into static scope, so the list is no longer instantiated every time models_list is called. This is done via std::sync::OnceLock. It also provides a convenience function, get_model_info, which is an O(1) lookup with no memory cost that returns the matching model if it exists.
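A minimal sketch of the `OnceLock` pattern follows, assuming the lookup is backed by a `HashMap` that is built once on first access. The `DemoModel` and `ModelInfo` types, their fields, and the sample entries are hypothetical; only `std::sync::OnceLock` and the `get_model_info` name come from the description above.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Hypothetical stand-in for the crate's model enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum DemoModel {
    Small,
    Large,
}

/// Hypothetical, trimmed-down model metadata.
#[derive(Debug)]
pub struct ModelInfo {
    pub model_code: &'static str,
    pub dim: usize,
}

/// Built lazily on first use, then shared for the lifetime of the program.
static MODEL_MAP: OnceLock<HashMap<DemoModel, ModelInfo>> = OnceLock::new();

pub fn model_map() -> &'static HashMap<DemoModel, ModelInfo> {
    MODEL_MAP.get_or_init(|| {
        HashMap::from([
            (DemoModel::Small, ModelInfo { model_code: "demo/small", dim: 384 }),
            (DemoModel::Large, ModelInfo { model_code: "demo/large", dim: 1024 }),
        ])
    })
}

/// Average O(1) lookup; returns a reference, so nothing is cloned or rebuilt.
pub fn get_model_info(model: &DemoModel) -> Option<&'static ModelInfo> {
    model_map().get(model)
}
```

Every call after the first returns a reference to the same map, so repeated lookups allocate nothing.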

Test Plan

cargo test --features=optimum-cli.

The test_embeddings test has been split into two via a macro_rules! macro, one for a None batch size and the other for Some(3). Internally, test_embeddings checks whether the batch size is appropriate for each model and, if it is not, expects an Err instead. For non-quantized and static quantized models, the pre-calculated embedding sums still need to be satisfied with or without a batch size.
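The shape of such a test split can be sketched as follows. Everything here is a hypothetical stand-in: `fake_embed` replaces the real embedding call with a stub that returns each text's length, the batch-size rule (dynamic quantization requires a single batch) is an assumption, and the macro and test names are illustrative.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QuantizationMode {
    None,
    Static,
    Dynamic,
}

/// Stub embedder: yields one fake value per text (its byte length), or an Err
/// when a dynamically quantized model is given a batch size that is too small.
pub fn fake_embed(
    mode: QuantizationMode,
    texts: &[&str],
    batch_size: Option<usize>,
) -> Result<Vec<f32>, String> {
    if mode == QuantizationMode::Dynamic {
        if let Some(size) = batch_size {
            if size < texts.len() {
                return Err("dynamic quantization requires a single batch".to_string());
            }
        }
    }
    Ok(texts.iter().map(|t| t.len() as f32).collect())
}

/// Shared test body: expects an Err for the inappropriate combination and the
/// pre-calculated sum everywhere else.
pub fn check_embeddings(batch_size: Option<usize>) {
    let texts = ["hello world", "fastembed", "rust", "batching"];
    let expected_sum = 32.0_f32; // 11 + 9 + 4 + 8
    for mode in [QuantizationMode::None, QuantizationMode::Static, QuantizationMode::Dynamic] {
        let result = fake_embed(mode, &texts, batch_size);
        let too_small = matches!(batch_size, Some(size) if size < texts.len());
        if mode == QuantizationMode::Dynamic && too_small {
            assert!(result.is_err(), "expected an Err for {mode:?}");
        } else {
            let sum: f32 = result.expect("embedding should succeed").iter().sum();
            assert!((sum - expected_sum).abs() < 1e-6);
        }
    }
}

/// Generates one #[test] function per batch size.
macro_rules! create_embeddings_test {
    (name: $name:ident, batch_size: $batch_size:expr $(,)?) => {
        #[test]
        fn $name() {
            check_embeddings($batch_size);
        }
    };
}

create_embeddings_test!(name: test_batch_size_none, batch_size: None);
create_embeddings_test!(name: test_batch_size_three, batch_size: Some(3));
```

Generating the tests from a macro keeps the two cases in lockstep: any change to the shared body applies to both batch sizes, and each still shows up as a separate entry in the `cargo test` output.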

Breaking changes

A quantization parameter has been added to UserDefinedEmbeddingModel, since the quantization mode of a user-defined model cannot otherwise be inferred.
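For illustration only, a user-supplied model that carries its own quantization mode might be modelled like this. `DemoUserDefinedModel`, its fields, and the `with_quantization` builder method are hypothetical and are not claimed to match the actual `UserDefinedEmbeddingModel` definition.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QuantizationMode {
    None,
    Static,
    Dynamic,
}

/// Hypothetical stand-in for a user-defined model description.
#[derive(Debug, Clone)]
pub struct DemoUserDefinedModel {
    /// Raw ONNX bytes supplied by the caller.
    pub onnx_file: Vec<u8>,
    /// Must be stated by the caller; it cannot be derived from the bytes here.
    pub quantization: QuantizationMode,
}

impl DemoUserDefinedModel {
    /// Starts from the non-quantized assumption.
    pub fn new(onnx_file: Vec<u8>) -> Self {
        Self { onnx_file, quantization: QuantizationMode::None }
    }

    /// Builder-style override for quantized models.
    pub fn with_quantization(mut self, quantization: QuantizationMode) -> Self {
        self.quantization = quantization;
        self
    }
}
```

Callers supplying a dynamically quantized model would opt in explicitly, for example `DemoUserDefinedModel::new(bytes).with_quantization(QuantizationMode::Dynamic)`, so that the batch size check can apply to their model as well.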

github-actions[bot] commented 1 month ago

:tada: This issue has been resolved in version 4.0.0 :tada:

The release is available on:

Your semantic-release bot :package::rocket: