Load docs and create index with instantiation of reranker instead of with each rerank.

AnswerDotAI / rerankers

A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models.

Apache License 2.0


Load docs and create index with instantiation of reranker instead of with each rerank. #19

Open will-fairsupply opened 4 months ago

will-fairsupply commented 4 months ago

Mostly thinking about this in the context of making the ColBERT reranker more useful... it would be great to be able to load the docs and create the index when you instantiate the reranker. That would avoid repeating the same computation each time we want to produce a result.

I am happy to take this up, but wanted to raise it as an issue and see if it fits within the spirit of the project.

I do think that extending the colbert ranker (and others) in the way I describe will allow a broader use case.
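To make the idea concrete, here is a minimal sketch of the pattern, not the library's actual API. `PrecomputedReranker`, `toy_encode`, and `maxsim` are hypothetical names, and the toy encoder only stands in for a real ColBERT model. The point is that documents are encoded once in `__init__` and reused by every `rank` call, so only the query is encoded per request.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


def toy_encode(text: str) -> List[Vector]:
    """Hypothetical stand-in for a ColBERT encoder: one tiny vector per token."""
    return [(float(len(tok)), float(ord(tok[0]) % 7)) for tok in text.lower().split()]


def maxsim(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: each query token keeps its best-matching doc token."""
    def dot(a: Vector, b: Vector) -> float:
        return sum(x * y for x, y in zip(a, b))

    # default=0.0 keeps empty documents from raising on max() of an empty sequence.
    return sum(max((dot(q, d) for d in doc_vecs), default=0.0) for q in query_vecs)


class PrecomputedReranker:
    """Hypothetical reranker that encodes its documents once, at instantiation."""

    def __init__(self, docs: Sequence[str],
                 encode: Callable[[str], List[Vector]] = toy_encode) -> None:
        self.docs = list(docs)
        self.encode = encode
        self.encode_calls = 0  # counter, only here to show how much work is done
        self.doc_vecs = [self._encode(d) for d in self.docs]

    def _encode(self, text: str) -> List[Vector]:
        self.encode_calls += 1
        return self.encode(text)

    def rank(self, query: str, k: Optional[int] = None) -> List[Tuple[str, float]]:
        """Encode only the query, then score it against the cached doc vectors."""
        q = self._encode(query)
        scored = sorted(
            ((maxsim(q, dv), i) for i, dv in enumerate(self.doc_vecs)),
            key=lambda pair: -pair[0],
        )
        return [(self.docs[i], score) for score, i in scored[:k]]


ranker = PrecomputedReranker(["late interaction models", "sparse retrieval", "reranking"])
first = ranker.rank("late interaction")
second = ranker.rank("sparse models", k=2)
```

With three documents and two queries, the encoder runs five times in total instead of eight, and the saving grows with the number of queries.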

bclavie commented 4 months ago

Hey! Thank you for the feedback & suggestion.

I agree with this, it'd be pretty useful for the ColBERT reranker. I've tried to keep the lib very lightweight and agnostic to any specific reranker for now, mostly due to limited development time on my end, but I have nothing against such extensions -- this would be a more than welcome addition.

My only requirements would be that:

- We don't add any unnecessary dependencies to the existing install options, so any external indexing mechanism should be its own specific additional install.
- The code itself stays very contained, so that someone could still use the "raw" ColBERT reranker without needing to import whatever the index code needs. It could be in a different ColBERTreranker file, which the user could choose to use with an extra kwarg (colbert_keep_index=True?)

Depending on the number of documents, we might not need any sort of indexing -- keeping them in memory could be fine, though it'd balloon up quite fast.
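One way to satisfy both requirements is to import the indexing code lazily, only when the user opts in. The following is a rough sketch under assumptions: `RawColBERTRanker`, `load_colbert_ranker`, the module name `hypothetical_colbert_index`, and its `IndexedColBERTRanker` class are all made up for illustration and do not exist in the library. Only the `colbert_keep_index` kwarg name comes from the suggestion above.

```python
import importlib


class RawColBERTRanker:
    """Placeholder for the existing, index-free reranker (hypothetical)."""

    keeps_index = False


def load_colbert_ranker(colbert_keep_index: bool = False,
                        index_module: str = "hypothetical_colbert_index"):
    """Return the raw ranker by default; import the index code only on request."""
    if not colbert_keep_index:
        # Default path: nothing from the optional extra is imported.
        return RawColBERTRanker()
    try:
        # Deferred import, so the extra dependency is needed only when opted in.
        module = importlib.import_module(index_module)
    except ImportError as exc:
        raise ImportError(
            f"colbert_keep_index=True needs the optional '{index_module}' extra, "
            "which is not installed."
        ) from exc
    # Assumed class name inside the hypothetical optional module.
    return module.IndexedColBERTRanker()


ranker = load_colbert_ranker()  # works without the optional extra installed
```

Users who never pass `colbert_keep_index=True` never touch the optional module, so the base install stays as light as it is today.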

w-v-r commented 4 months ago

Switching from work github profile to personal github profile. Will submit from this.

Great, I've got some familiarity with some common indexing mechanisms. I will put something together and submit a pull request for review.

If I can make something that can be reranker agnostic and be used with a number of options I will. Failing that, I will do something that is ColBERT specific and am happy to contribute more if it seems like a good idea.

stevoslates commented 3 months ago

Did this ever happen, I was going to ask the same thing!

bclavie commented 3 months ago

> Did this ever happen, I was going to ask the same thing!

Not yet! In case it doesn't get picked up as a PR, it is on my to-do list, but RAGatouille is taking the first-class citizen spot for a while in terms of open source projects (it badly needs an overhaul!), so it might be a while.

w-v-r commented 3 months ago

Yes, not yet, but I will pick this up over the weekend. I'll have a branch for review and would appreciate feedback following that.

Thanks!

bclavie commented 4 days ago

Hey @w-v-r, any updates on this? No worries if you're no longer able to dedicate time to this!