Open · agori80 opened 6 months ago
The official ColBERT implementation has a built-in query server (using Flask) that you can query via API requests, and it supports indexes generated with RAGatouille. This should be enough for most small applications, as long as you can persist the index on disk.
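For illustration, here is a minimal client sketch for querying such a server over HTTP, including sending a few requests in parallel as a handful of users might. The host, port, route (`/api/search`), and parameter names (`query`, `k`) are assumptions, not confirmed in this thread; check `server.py` in your ColBERT checkout for the actual values. The helper names (`build_search_url`, `fetch_json`, `search_many`) are hypothetical.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: where the Flask server listens; adjust to your deployment.
BASE_URL = "http://localhost:8893"
# Assumption: the search route; verify against server.py in your ColBERT checkout.
SEARCH_PATH = "/api/search"


def build_search_url(query, k=10, base_url=BASE_URL):
    """Build the GET URL for one search request (parameter names are assumed)."""
    return f"{base_url}{SEARCH_PATH}?{urlencode({'query': query, 'k': k})}"


def fetch_json(url, timeout=10.0):
    """Send a GET request and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def search_many(queries, k=10, fetch=fetch_json, max_workers=4):
    """Run several searches in parallel threads; results keep the input order."""
    urls = [build_search_url(q, k) for q in queries]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```

With the server running, `search_many(["first question", "second question"], k=5)` would return one decoded JSON response per query. This only shows the client side; whether the server itself handles simultaneous requests well depends on how the Flask app is deployed, which this sketch does not address.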
Given the superb performance of this ColBERT implementation, I am considering integrating it into a search pipeline. A server would run and accept queries from a few users (probably not a very high load; a typical enterprise setting), and queries would be executed concurrently.
Does it make sense to use it in such a situation, or is it meant only for offline usage?