Open · dharhas opened 1 year ago
+1
In addition, a basic annotation workflow would help. Anecdotal evidence that a RAG setup works well is obviously nice, but a more systematic approach would make it easier to compare different configurations.
I often create a set of questions and annotated chunks (labels could be "relevant", "inconclusive", or "irrelevant" with respect to their ability to answer the question). I then build a summary table that shows, per question and model configuration, how well the embedding models rank at retrieving the relevant (and inconclusive) chunks. This also helps later on, when new embedding models are released and I want to test them.
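Roughly what I mean, as a minimal sketch (the `Annotation` record and `hit_rate_at_k` helper here are made up for illustration, not part of any existing API):

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    """One labeled (question, chunk) pair; label is "relevant", "inconclusive" or "irrelevant"."""
    question: str
    chunk_id: str
    label: str

def hit_rate_at_k(
    annotations: list[Annotation],
    retrieved: dict[str, list[str]],  # question -> chunk ids, ranked, for one configuration
    k: int = 5,
) -> float:
    """Fraction of questions with at least one "relevant" chunk in the top-k results."""
    relevant: dict[str, set[str]] = {}
    for ann in annotations:
        if ann.label == "relevant":
            relevant.setdefault(ann.question, set()).add(ann.chunk_id)
    hits = sum(
        1 for question, chunks in relevant.items()
        if chunks & set(retrieved.get(question, [])[:k])
    )
    return hits / len(relevant) if relevant else 0.0
```

Running a metric like this per configuration gives one column of the summary table; testing a newly released embedding model later is then just one more column.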
Add a comparison workflow / app
We need to clarify what we want here. Since we have a fully featured Python API, the "workflow" part is already covered. However, if you haven't worked with async programming before, how to use it for this might not be obvious. We should have an example for this in the documentation.
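Something along these lines could work as the documentation example. This is just a sketch against the `Rag` / chat interface of the Python API; `Chroma`, `LanceDB`, and `Gpt4` are stand-ins for whatever components are actually configured, and exact names may differ between versions:

```python
import asyncio

from ragna import Rag
from ragna.assistants import Gpt4
from ragna.source_storages import Chroma, LanceDB

async def compare(documents, questions):
    rag = Rag()
    answers = {}
    # Same documents and questions, different source storages; the same loop
    # works for swapping assistants or tuning chat parameters.
    for source_storage in [Chroma, LanceDB]:
        chat = rag.chat(
            documents=documents,
            source_storage=source_storage,
            assistant=Gpt4,
        )
        await chat.prepare()
        answers[source_storage.__name__] = [
            await chat.answer(question) for question in questions
        ]
    return answers

answers = asyncio.run(compare(["document.pdf"], ["What is Ragna?"]))
```

Since everything is async, the per-configuration chats could also be run concurrently with `asyncio.gather` rather than sequentially.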
As for the "app" part, I'm not super enthusiastic about it. This whole use case screams experimentation. And for that you need all kinds of knobs, which is very hard to get consistent in a general UI. This is why we built the Python API (note that the issue was created before the Python API was a thing). IMO, if someone really wants / needs an UI for that, it should be a third-party app that builds on top of the Ragna Python / REST API.
Bumping impact to medium. For the 0.2.0 release, we are going to add an example to the documentation that describes the comparison workflow with the Python API. Although I'm still not enthusiastic about a possible UI for this, let's open a separate issue for that as soon as the documentation is updated.
A common use case during the research/exploration phase is to compare/assess the differences in responses across embedding models, LLMs, etc. It would be useful to make this workflow easier.
i.e., this could be a fairly straightforward Panel app.
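For illustration, a bare-bones sketch of such a Panel app (the `answer` function below is a placeholder for a call into the Ragna Python or REST API, and the configuration names are made up):

```python
import panel as pn

pn.extension()

CONFIGS = ["Chroma + Gpt4", "LanceDB + Gpt4"]  # hypothetical configurations

def answer(question: str, config: str) -> str:
    # Placeholder: call the Ragna Python or REST API for the given configuration here.
    return f"[{config}] answer to: {question}"

question = pn.widgets.TextInput(name="Question", placeholder="Ask something...")

def compare(q):
    # Show the responses from each configuration side by side.
    if not q:
        return pn.pane.Markdown("Enter a question to compare configurations.")
    return pn.Row(
        *[
            pn.Column(f"### {config}", pn.pane.Markdown(answer(q, config)))
            for config in CONFIGS
        ]
    )

app = pn.Column(question, pn.bind(compare, question))
app.servable()  # run with `panel serve app.py`
```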