codefuse-ai / ModelCache

An LLM semantic caching system that aims to enhance user experience by reducing response time via cached query-result pairs.

feat: support huggingface/text-embeddings-inference for faster embedding inference #39

Closed — liwenshipro closed this 1 week ago

liwenshipro commented 4 months ago

Text Embeddings Inference (TEI) is a toolkit for deploying and serving open-source text embedding and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE, and E5.

This PR adds support for TEI-based embedding inference in ModelCache; the resulting speedup is shown in the attached image.
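Since TEI exposes embeddings over plain HTTP, the integration boils down to a request against the server's `/embed` endpoint. Below is a minimal sketch, assuming a TEI server already running at `http://127.0.0.1:8080`; the helper names (`build_payload`, `tei_embed`) are illustrative and not taken from the actual PR, though the `{"inputs": [...]}` payload shape follows TEI's documented `/embed` API:

```python
import json
from urllib import request

# Hypothetical local TEI endpoint; adjust host/port to your deployment.
TEI_URL = "http://127.0.0.1:8080/embed"


def build_payload(texts):
    """Serialize texts into the JSON body TEI's /embed endpoint expects."""
    return json.dumps({"inputs": texts}).encode("utf-8")


def tei_embed(texts, url=TEI_URL):
    """POST the texts to a running TEI server and return the embedding vectors.

    TEI responds with a JSON array of float vectors, one per input text.
    """
    req = request.Request(
        url,
        data=build_payload(texts),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())
```

In ModelCache, vectors returned this way would then feed the same similarity lookup as any other embedding backend; offloading the model to a dedicated TEI server is what yields the speedup reported above.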

peng3307165 commented 4 months ago

Thank you for participating in the ModelCache open-source project; we welcome your involvement, and the addition of huggingface/text-embeddings-inference is a good idea. We have two suggestions regarding your submission:

1. Using `TextEmbeddingsInference` as a class name and `text_embeddings_inference` as a variable name for LazyImport is somewhat generic, and users may confuse these with the broader concept. We recommend more distinctive names, such as `HuggingfaceTEI` or `Huggingface_TEI`, to improve recognizability.
2. Given the use of URL requests, we recommend adding an example to the `examples/embedding` directory. I have already added the relevant directory; you can pull the latest main branch to obtain it.

peng3307165 commented 1 week ago

We have merged your commit into the main branch. Thank you for your contributions to the ModelCache project. Best wishes!