AnswerDotAI / rerankers

A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models.
Apache License 2.0

Load docs and create index with instantiation of reranker instead of with each rerank. #19

Open will-fairsupply opened 2 months ago

will-fairsupply commented 2 months ago

Mostly thinking about this in the context of making the ColBERT reranker more useful... it would be great to be able to load the docs and create the index when you instantiate the reranker. That would avoid repeating the same computation every time we want to produce a result.

I am happy to take this up, but wanted to raise it as an issue and see if it fits within the spirit of the project.

I do think that extending the ColBERT reranker (and others) in the way I describe would open up a broader range of use cases.
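To make the proposal concrete, here is a minimal sketch of the pattern. It does not use the rerankers API: `PreloadedReranker`, `toy_encode`, and the word-overlap score are all hypothetical stand-ins, with `toy_encode` playing the role of an expensive model forward pass. The only point is where the document-side work happens.

```python
from typing import Callable, List, Set, Tuple


def toy_encode(text: str) -> Set[str]:
    """Hypothetical stand-in for an expensive document/query encoder."""
    return set(text.lower().split())


class PreloadedReranker:
    """Encodes the documents once, at construction, and reuses them per query."""

    def __init__(self, docs: List[str],
                 encode: Callable[[str], Set[str]] = toy_encode):
        self.docs = docs
        self.encode = encode
        self.encode_calls = 0  # counts encoder calls, for illustration only
        # The costly step happens here, once, instead of inside every rank() call.
        self.doc_reprs = [self._encode(d) for d in docs]

    def _encode(self, text: str) -> Set[str]:
        self.encode_calls += 1
        return self.encode(text)

    def rank(self, query: str) -> List[Tuple[str, float]]:
        q = self._encode(query)  # only the query is encoded per call
        scored = [
            (doc, len(q & rep) / max(len(q), 1))  # toy score: share of query words found
            for doc, rep in zip(self.docs, self.doc_reprs)
        ]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)


docs = [
    "ColBERT uses late interaction",
    "Cross-encoders score query and document jointly",
    "BM25 is a lexical baseline",
]
reranker = PreloadedReranker(docs)
first = reranker.rank("late interaction with ColBERT")
second = reranker.rank("lexical baseline")
```

With three documents and two queries, the encoder runs five times in total (three at construction, one per query) instead of eight if the documents were re-encoded on every call.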

bclavie commented 1 month ago

Hey! Thank you for the feedback & suggestion.

I agree with this; it'd be pretty useful for the ColBERT reranker. So far I've tried to keep the lib very lightweight and agnostic to any specific reranker, mostly due to limited development time on my end, but I have nothing against such extensions. This would be a more than welcome addition.

My only requirements would be that:

Depending on the number of documents, we might not need any sort of indexing -- keeping them in memory could be fine, though it'd balloon up quite fast.
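As a rough illustration of how quickly in-memory storage can grow, here is a back-of-the-envelope estimate for token-level (ColBERT-style) embeddings. The figures used (200 tokens per document, 128-dimensional vectors, 2 bytes per value) are assumptions chosen for the example, not measurements from this library, and `estimate_bytes` is a hypothetical helper.

```python
def estimate_bytes(n_docs: int, tokens_per_doc: int, dim: int,
                   bytes_per_value: int) -> int:
    """Raw size of one stored vector per token, ignoring any overhead."""
    return n_docs * tokens_per_doc * dim * bytes_per_value


# Assumed figures: 200 tokens/doc, 128-dim vectors, 2 bytes per value (fp16).
small = estimate_bytes(10_000, 200, 128, 2)
large = estimate_bytes(1_000_000, 200, 128, 2)

print(f"10k docs: {small / 1e9:.3f} GB")
print(f"1M docs:  {large / 1e9:.1f} GB")
```

Under these assumptions, ten thousand documents take about half a gigabyte, which is comfortable to hold in memory, while a million documents take about 51 GB, which is where some form of indexing or compression starts to matter.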

w-v-r commented 1 month ago

Switching from my work GitHub profile to my personal GitHub profile. I will submit from this account.

Great, I've got some familiarity with some common indexing mechanisms. I will put something together and submit a pull request for review.

If I can make something that is reranker-agnostic and works with a number of options, I will. Failing that, I will do something ColBERT-specific, and I am happy to contribute more if it seems like a good idea.

stevoslates commented 3 weeks ago

Did this ever happen, I was going to ask the same thing!

bclavie commented 3 weeks ago

> Did this ever happen, I was going to ask the same thing!

Not yet! In case it doesn't get picked up as a PR, it is on my to-do list, but RAGatouille is taking the first-class citizen spot for a while in terms of open source projects (it badly needs an overhaul!), so it might be a while.

w-v-r commented 3 weeks ago

Yes, not yet, but I will pick this up over the weekend. I'll have a branch for review and would appreciate feedback following that.

Thanks!