Add a comparison workflow / app

Quansight / ragna

RAG orchestration framework ⛵️

https://ragna.chat

BSD 3-Clause "New" or "Revised" License

179 stars 22 forks

Add a comparison workflow / app #18

Open dharhas opened 1 year ago

dharhas commented 1 year ago

A common use case during the research/exploration phase is to compare the responses produced by different embedding models, LLMs, and other configuration options. It would be useful to make this workflow easier.

For example:

Set up a matrix of configuration options.
Set up the list of documents to use.
Set up a list of questions to ask.
Summarize the responses from each configuration along with any relevant metrics (response time, etc.).
Potentially calculate similarity scores between responses.

This could be a fairly straightforward Panel app.
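The steps listed above can be sketched in plain Python. This is only an illustration under stated assumptions: `config_matrix`, `run_comparison`, `similarity`, and `fake_answer` are hypothetical names and not part of Ragna, the option values are made up, and the lexical similarity from `difflib` is a crude stand-in for whatever similarity score one would actually want (for example, one based on embeddings).

```python
import itertools
import time
from difflib import SequenceMatcher


def config_matrix(options):
    """Expand {"name": [value, ...]} into one dict per combination."""
    keys = list(options)
    return [
        dict(zip(keys, values))
        for values in itertools.product(*options.values())
    ]


def run_comparison(options, documents, questions, answer_fn):
    """Ask every question under every configuration and time each call."""
    rows = []
    for config in config_matrix(options):
        for question in questions:
            start = time.perf_counter()
            response = answer_fn(config, documents, question)
            rows.append(
                {
                    "config": config,
                    "question": question,
                    "response": response,
                    "seconds": time.perf_counter() - start,
                }
            )
    return rows


def similarity(a, b):
    """Crude lexical similarity in [0, 1] between two responses."""
    return SequenceMatcher(None, a, b).ratio()


def fake_answer(config, documents, question):
    # Hypothetical stand-in for a real RAG call; replace with actual code.
    return f"{config['assistant']} using {config['embedding']}: {question}"


# Made-up option values, purely for illustration.
options = {
    "embedding": ["model-a", "model-b"],
    "assistant": ["llm-x", "llm-y"],
}
rows = run_comparison(options, ["report.pdf"], ["What is RAG?"], fake_answer)

for row in rows:
    print(row["config"], f"{row['seconds']:.6f}s", row["response"])
```

Passing the answering function in as `answer_fn` keeps the matrix and timing logic independent of any particular backend, so the same loop could later drive a real chat call.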

iiLaurens commented 1 year ago

In addition, a basic annotation workflow would help. Anecdotal evidence that a RAG setup works well is nice, but a more systematic approach would make it easier to compare different configurations.

I often create a set of questions and annotated chunks, where each chunk is labeled "relevant", "inconclusive", or "irrelevant" according to how well it can answer the question. I then build a summary table that shows, per question and model configuration, how highly the embedding model ranks the relevant (and inconclusive) chunks during retrieval. This also helps later on, when new embedding models are released and I want to test them.
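One way to turn such annotations into a summary table is sketched below. It is an assumption, not the commenter's actual method: the reciprocal rank of the first positively labeled chunk is used as the score, and `reciprocal_rank`, `summarize`, the chunk IDs, and the configuration names are all hypothetical.

```python
def reciprocal_rank(ranked_chunk_ids, labels, positive=("relevant",)):
    """Return 1 / rank of the first chunk labeled as positive, else 0.0."""
    for rank, chunk_id in enumerate(ranked_chunk_ids, start=1):
        # Chunks without an annotation are treated as irrelevant.
        if labels.get(chunk_id, "irrelevant") in positive:
            return 1.0 / rank
    return 0.0


def summarize(retrievals, annotations, positive=("relevant",)):
    """Build {question: {config_name: score}} from ranked retrieval results."""
    table = {}
    for (question, config_name), ranked in retrievals.items():
        score = reciprocal_rank(ranked, annotations[question], positive)
        table.setdefault(question, {})[config_name] = score
    return table


# Made-up annotations: one label per chunk, per question.
annotations = {
    "What is RAG?": {
        "c1": "relevant",
        "c2": "inconclusive",
        "c3": "irrelevant",
    },
}

# Made-up retrieval results: chunk IDs in the order each model ranked them.
retrievals = {
    ("What is RAG?", "model-a"): ["c1", "c3", "c2"],
    ("What is RAG?", "model-b"): ["c3", "c2", "c1"],
}

strict = summarize(retrievals, annotations)
lenient = summarize(
    retrievals, annotations, positive=("relevant", "inconclusive")
)
```

Keeping the annotations separate from the retrieval results means a newly released embedding model only needs a fresh `retrievals` entry; the labeled questions and chunks can be reused unchanged.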

pmeier commented 1 year ago

We need to clarify what we want here. Since we have a fully featured Python API, the "workflow" part is already covered. However, if you haven't worked with async programming before, it might be non-obvious. We should have an example in the documentation for this.
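As a rough illustration of the async pattern such a documentation example might rely on, the sketch below sends the same question to several configurations concurrently with `asyncio.gather`. It deliberately does not use Ragna's API: `ask` and `compare` are hypothetical stand-ins, and the configuration names are made up.

```python
import asyncio


async def ask(config, question):
    # Hypothetical stand-in for an awaitable chat call; not Ragna's API.
    await asyncio.sleep(0)
    return f"[{config}] answer to: {question}"


async def compare(configs, question):
    """Send the same question to every configuration concurrently."""
    answers = await asyncio.gather(
        *(ask(config, question) for config in configs)
    )
    # gather() returns results in the order the awaitables were passed in.
    return dict(zip(configs, answers))


results = asyncio.run(compare(["config-a", "config-b"], "What is RAG?"))

for config, answer in results.items():
    print(config, "->", answer)
```

In a notebook or any other environment that already runs an event loop, `asyncio.run` raises a `RuntimeError`; there one would write `await compare(...)` directly instead.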

As for the "app" part, I'm not super enthusiastic about it. This whole use case screams experimentation. And for that you need all kinds of knobs, which are very hard to expose consistently in a general UI. This is why we built the Python API (note that the issue was created before the Python API was a thing). IMO, if someone really wants or needs a UI for that, it should be a third-party app that builds on top of the Ragna Python / REST API.

pmeier commented 11 months ago

Bumping impact to medium. For the 0.2.0 release, we are going to add an example to the documentation that describes the comparison workflow using the Python API. Although I'm still not enthusiastic about a possible UI for this, let's open a separate issue for that as soon as the documentation is updated.