Open will-fairsupply opened 4 months ago
Hey! Thank you for the feedback & suggestion.
I agree with this; it'd be pretty useful for the ColBERT reranker. I've tried to keep the lib very lightweight and agnostic to any specific reranker for now, mostly due to limited development time on my end, but I have nothing against such extensions -- this would be a more than welcome addition.
My only requirements would be that:
colbert_keep_index=True?)
Depending on the number of documents, we might not need any sort of indexing -- keeping them in memory could be fine, though it'd balloon up quite fast.
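To put rough numbers on "balloon up quite fast": a late-interaction model stores one vector per token, so memory grows with documents × tokens × embedding size. Below is a back-of-the-envelope sketch. The function name and the defaults (128-dimensional embeddings, 2 bytes per value for fp16) are assumptions for illustration, not settings taken from this library.

```python
def estimate_embedding_memory_bytes(
    num_docs: int,
    tokens_per_doc: int,
    dim: int = 128,            # assumed embedding size per token
    bytes_per_value: int = 2,  # assumed fp16 storage
) -> int:
    """Rough size of uncompressed per-token document embeddings held in memory."""
    return num_docs * tokens_per_doc * dim * bytes_per_value


# Compare a few collection sizes, assuming ~200 tokens per document.
for n in (1_000, 100_000, 10_000_000):
    size_gib = estimate_embedding_memory_bytes(n, tokens_per_doc=200) / 2**30
    print(f"{n:>10,} docs -> {size_gib:,.2f} GiB")
```

The estimate ignores padding, Python object overhead, and any compression an actual index would apply, so treat it as an order-of-magnitude guide only.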
Switching from my work GitHub profile to my personal GitHub profile. I will submit from this one.
Great, I've got some familiarity with some common indexing mechanisms. I will put something together and submit a pull request for review.
If I can make something that is reranker-agnostic and can be used with a number of options, I will. Failing that, I will do something that is ColBERT-specific, and I am happy to contribute more if it seems like a good idea.
Did this ever happen? I was going to ask the same thing!
Not yet! In case it doesn't get picked up as a PR, it is on my to-do list, but RAGatouille is taking the first-class citizen spot for a while in terms of open source projects (it badly needs an overhaul!), so it might be a while.
Yes, not yet, but I will pick this up over the weekend. I'll have a branch for review and would appreciate feedback following that.
Thanks!
Hey @w-v-r, any updates on this? No worries if you're no longer able to dedicate time to this!
I'm mostly thinking about this in the context of making the ColBERT reranker more useful: it would be great to be able to load the docs and create the index when you instantiate the reranker. This would avoid repeating that computation each time we wish to produce a result.
I am happy to take this up, but wanted to raise it as an issue and see if it fits within the spirit of the project.
I do think that extending the ColBERT ranker (and others) in the way I describe would support a broader range of use cases.
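To make the proposal concrete, here is a minimal sketch of the idea: encode the documents once when the ranker is instantiated, then reuse those embeddings for every query. Everything here is hypothetical and for illustration only. `PrecomputedRanker`, `toy_encoder`, and `maxsim` are not part of this library's API, and the toy encoder merely stands in for a real ColBERT model so the example runs without dependencies.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Encoder = Callable[[str], List[Vector]]


def toy_encoder(text: str) -> List[Vector]:
    """Hypothetical stand-in for a ColBERT encoder: one unit vector per token.

    Each token becomes a normalized letter-count vector, so identical tokens
    have similarity 1.0. A real model would return learned embeddings.
    """
    vectors: List[Vector] = []
    for token in text.lower().split():
        counts = [0.0] * 26
        for ch in token:
            if "a" <= ch <= "z":
                counts[ord(ch) - ord("a")] += 1.0
        norm = math.sqrt(sum(c * c for c in counts))
        if norm > 0:  # skip tokens with no letters
            vectors.append([c / norm for c in counts])
    return vectors


def maxsim(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Late-interaction score: best-matching doc token per query token, summed."""
    if not doc_vecs:
        return 0.0
    return sum(
        max(sum(q * d for q, d in zip(qv, dv)) for dv in doc_vecs)
        for qv in query_vecs
    )


class PrecomputedRanker:
    """Encodes documents once at construction and reuses them for each query."""

    def __init__(self, docs: Sequence[str], encoder: Encoder = toy_encoder):
        self.docs = list(docs)
        self.encoder = encoder
        # The expensive step: done a single time here, not on every rank() call.
        self.doc_vecs = [encoder(doc) for doc in self.docs]

    def rank(self, query: str, k: int = 3) -> List[Tuple[int, float]]:
        """Return up to k (doc_index, score) pairs, highest score first."""
        query_vecs = self.encoder(query)  # only the query is encoded per call
        scores = [(i, maxsim(query_vecs, dv)) for i, dv in enumerate(self.doc_vecs)]
        scores.sort(key=lambda pair: pair[1], reverse=True)
        return scores[:k]


docs = [
    "colbert late interaction reranker",
    "bananas are yellow",
    "index documents once",
]
ranker = PrecomputedRanker(docs)
for doc_index, score in ranker.rank("late interaction", k=2):
    print(doc_index, round(score, 3), docs[doc_index])
```

The trade-off is memory: the ranker holds every document's token embeddings for its whole lifetime, which is fine for small collections but would call for a proper index at larger scale.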