aurelio-labs / semantic-router

Superfast AI decision making and intelligent processing of multi-modal data.
https://www.aurelio.ai/semantic-router
MIT License

Support for Infinity as Encoder #215

Open michaelfeil opened 5 months ago

michaelfeil commented 5 months ago

Running a community project under https://github.com/michaelfeil/infinity - this should help with encoding.

Motivation:

Questions:

import asyncio
from infinity_emb import AsyncEmbeddingEngine, EngineArgs

query = "What is the python package infinity_emb?"
docs = [
    "This is a document not related to the python package infinity_emb, hence...",
    "Paris is in France!",
    "infinity_emb is a package for sentence embeddings and rerankings using transformer models in Python!",
]

# Build the engine from a cross-encoder reranker model
engine_args = EngineArgs(model_name_or_path="BAAI/bge-reranker-base", engine="torch")
engine = AsyncEmbeddingEngine.from_args(engine_args)

async def main():
    async with engine:  # the context manager starts and stops the model
        ranking, usage = await engine.rerank(query=query, docs=docs)
        print(list(zip(ranking, docs)))

asyncio.run(main())
therahulparmar commented 4 months ago

@jamescalam please add the support of the Infinity Embeddings.

jamescalam commented 4 months ago

Hi @therahulparmar and @michaelfeil — we're able to accept PRs for the infinity encoder. The library itself should be added as an optional dependency (see the huggingface, voyage, etc. encoders as examples here).
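For reference, the optional-dependency gating could look like the sketch below. The class name `InfinityEncoder` and the extras name `semantic-router[infinity]` are assumptions for illustration, not existing semantic-router API; the pattern of importing lazily and raising an actionable `ImportError` mirrors how optional encoders are typically gated.

```python
class InfinityEncoder:
    """Hypothetical encoder that treats infinity_emb as an optional dependency."""

    def __init__(self, name: str = "BAAI/bge-small-en-v1.5"):
        try:
            # Import lazily so the base package works without the extra installed.
            from infinity_emb import AsyncEmbeddingEngine, EngineArgs
        except ImportError as e:
            raise ImportError(
                "infinity_emb is required to use InfinityEncoder. "
                'Install it with: pip install "semantic-router[infinity]"'
            ) from e
        self.engine = AsyncEmbeddingEngine.from_args(
            EngineArgs(model_name_or_path=name, engine="torch")
        )
```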

michaelfeil commented 4 months ago

@jamescalam Does the encoder call support async calls?

jamescalam commented 4 months ago

@michaelfeil not right now, we can add support though - is it required for infinity?
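One way to add that support is an encoder exposing both a native async path and a sync bridge. This is a minimal sketch, not semantic-router's actual interface: the method name `acall` and the placeholder vectors are assumptions, and a real implementation would `await engine.embed(...)` inside `acall`.

```python
import asyncio
from typing import List

class AsyncCapableEncoder:
    """Hypothetical encoder shape with both sync and async entry points."""

    async def acall(self, docs: List[str]) -> List[List[float]]:
        # Placeholder vectors; a real encoder would await the model here,
        # e.g. infinity's engine.embed(sentences=docs).
        return [[float(len(d))] for d in docs]

    def __call__(self, docs: List[str]) -> List[List[float]]:
        # Bridge for callers that have no running event loop.
        return asyncio.run(self.acall(docs))

encoder = AsyncCapableEncoder()
print(encoder(["a", "bb"]))  # → [[1.0], [2.0]]
```

Async callers would `await encoder.acall(docs)` directly; only sync callers pay the `asyncio.run` cost.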

michaelfeil commented 4 months ago

Yeah, the batching happens with multiple async requests at once. This is also used when the batch size is larger than what can fit at once.

If there is no async loop running, this is challenging to control from inside infinity.
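A toy illustration of why a running event loop matters: concurrent awaits let the engine collect several requests into one batch, whereas plain sync calls would each be a batch of one. The micro-batcher below is a simplification of the idea, not infinity's actual implementation.

```python
import asyncio

class MicroBatcher:
    """Toy batcher: requests arriving concurrently are processed together."""

    def __init__(self):
        self.queue: asyncio.Queue = asyncio.Queue()

    async def embed(self, doc: str) -> str:
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((doc, fut))
        return await fut  # resolved by the worker once the batch runs

    async def worker(self):
        while True:
            batch = [await self.queue.get()]
            await asyncio.sleep(0)  # yield so concurrent requests can enqueue
            while not self.queue.empty():
                batch.append(self.queue.get_nowait())
            for doc, fut in batch:  # one "model call" for the whole batch
                fut.set_result(f"emb({doc})/batch={len(batch)}")

async def main():
    b = MicroBatcher()
    task = asyncio.create_task(b.worker())
    # Three concurrent requests end up in one batch.
    results = await asyncio.gather(*(b.embed(d) for d in ["a", "b", "c"]))
    task.cancel()
    return results

print(asyncio.run(main()))
```

Without an event loop there is no point at which several pending requests coexist, so the batcher degenerates to batches of one; that is the control problem described above.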