Anush008 / fastembed-rs

Library for generating vector embeddings, reranking in Rust
https://docs.rs/fastembed
Apache License 2.0

Small reranking models #75

Closed by jawj 3 months ago

jawj commented 3 months ago

@Anush008 and @GrisaiaEvy, many thanks for your work on this library.

At present, multiple embedding/bi-encoding models are supported, and users can also bring their own. But for reranking/cross-encoding, only bge-reranker-base is available.

I'd like to use a smaller reranking model, ideally sized on par with the default embedding model, bge-small-en-v1.5 (something like jina-reranker-v1-turbo-en seems promising).

How hard do you think it would be to add support for a small reranking model like this, or for bring-your-own reranking models? Is it on your roadmap?

Anush008 commented 3 months ago

Hey @jawj. There's no roadmap as such. Everyone just makes up their own mind and contributes (me included). Almost all new proposals are accepted.

jawj commented 3 months ago

Thanks for the quick reply.

OK. I guess my two follow-ups are:

Anush008 commented 3 months ago

Adding a new reranker model shouldn't be much work. Just listing it at https://github.com/Anush008/fastembed-rs/blob/ac715a5ad1d4f7483dd86e976e83b2e101505ad2/src/models/reranking.rs#L2-L15 should do if it's the same architecture.
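
To illustrate what "just listing it" might look like, here is a minimal, self-contained sketch. It is not copied from the linked file: the names `RerankerModel`, `RerankerModelInfo`, `reranker_model_list`, and `find_model` are hypothetical, and the `onnx/model.onnx` file path is an assumed repository layout. The idea is that a model with the same architecture needs roughly one new enum variant plus one metadata entry.

```rust
// Hypothetical sketch: names and fields are illustrative assumptions,
// not the actual contents of src/models/reranking.rs.

/// Reranker (cross-encoder) models known to the library.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RerankerModel {
    /// The model that is already supported.
    BgeRerankerBase,
    /// A new, smaller model: step one is adding a variant here.
    JinaRerankerV1TurboEn,
}

/// Metadata used to locate a model's files.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RerankerModelInfo {
    pub model: RerankerModel,
    pub description: &'static str,
    /// Hugging Face repository id.
    pub model_code: &'static str,
    /// Path of the ONNX file inside the repository (assumed layout).
    pub model_file: &'static str,
}

/// Lists every supported reranker; step two is adding an entry here.
pub fn reranker_model_list() -> Vec<RerankerModelInfo> {
    vec![
        RerankerModelInfo {
            model: RerankerModel::BgeRerankerBase,
            description: "Reranker model",
            model_code: "BAAI/bge-reranker-base",
            model_file: "onnx/model.onnx", // assumed path
        },
        RerankerModelInfo {
            model: RerankerModel::JinaRerankerV1TurboEn,
            description: "Smaller reranker model",
            model_code: "jinaai/jina-reranker-v1-turbo-en",
            model_file: "onnx/model.onnx", // assumed path
        },
    ]
}

/// Looks up the metadata for one model, if it is listed.
pub fn find_model(model: RerankerModel) -> Option<RerankerModelInfo> {
    reranker_model_list()
        .into_iter()
        .find(|info| info.model == model)
}
```

With a layout like this, loading code that calls `find_model(RerankerModel::JinaRerankerV1TurboEn)` would not need to change when a model is added, provided the new model's inputs and outputs match the existing architecture.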

Anush008 commented 3 months ago

An example of a dense model addition is https://github.com/Anush008/fastembed-rs/pull/69

jawj commented 3 months ago

Great — I think I'll have a go at this, then.