Also send source content as part of the message object of the REST API / Web UI

pmeier commented 11 months ago

Feature description

While our ragna.core.Source object stores the source content

https://github.com/Quansight/ragna/blob/7b176528c7524edc8bcf8754695caa5a363a8e23/ragna/core/_components.py#L79-L88

the source object of the REST API does not:

https://github.com/Quansight/ragna/blob/7b176528c7524edc8bcf8754695caa5a363a8e23/ragna/deploy/_api/schemas.py#L35-L39

This is an intentional choice. When creating messages "live" we have access to the sources

https://github.com/Quansight/ragna/blob/7b176528c7524edc8bcf8754695caa5a363a8e23/ragna/deploy/_api/core.py#L238

However, when chats and, in turn, messages and sources are recreated from our state database, we would only have access to them if we stored the source content as part of the state database. We don't want to do that for two reasons:

We would be duplicating the storage. As the name implies, the ragna.core.SourceStorage already stores the source content.
There are potential security issues since the state database might have different access control than the source storage, e.g. a vector database.

Instead what we could do is to add a new protocol method ragna.core.SourceStorage.get (name TBD):

def get(self, chat_id: uuid.UUID, source_id: str) -> ragna.core.Source:
    ...

In contrast to ragna.core.SourceStorage.retrieve, .get() would not engage in source selection based on the embedding of a prompt, but rather get one source by its ID. With that we could fill in the source content when constructing the chat object from the state database.
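
To make the difference between the two lookups concrete, here is a minimal, self-contained sketch. It does not use Ragna itself: the `Source` dataclass and the `InMemorySourceStorage` class are hypothetical stand-ins, the `get` signature follows the proposal above (name TBD), and `retrieve` is faked with a keyword match instead of an embedding search.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    # Hypothetical stand-in for ragna.core.Source, reduced to three fields.
    id: str
    location: str
    content: str


class InMemorySourceStorage:
    """Toy storage keeping sources per chat; not the real Ragna protocol."""

    def __init__(self) -> None:
        self._sources: dict[uuid.UUID, dict[str, Source]] = {}

    def store(self, chat_id: uuid.UUID, sources: list[Source]) -> None:
        self._sources.setdefault(chat_id, {}).update({s.id: s for s in sources})

    def retrieve(self, chat_id: uuid.UUID, prompt: str) -> list[Source]:
        # Selection step: a real backend would rank by embedding similarity.
        # Here a naive keyword overlap stands in for that.
        words = set(prompt.lower().split())
        return [
            s
            for s in self._sources.get(chat_id, {}).values()
            if words & set(s.content.lower().split())
        ]

    def get(self, chat_id: uuid.UUID, source_id: str) -> Source:
        # No selection: a plain lookup by ID, as proposed above.
        return self._sources[chat_id][source_id]


chat_id = uuid.uuid4()
storage = InMemorySourceStorage()
storage.store(
    chat_id,
    [
        Source(id="a", location="page 1", content="Ragna is a RAG orchestration framework"),
        Source(id="b", location="page 2", content="Unrelated text about cooking"),
    ],
)

selected = storage.retrieve(chat_id, "what is ragna")  # prompt-driven selection
restored = storage.get(chat_id, "b")  # direct lookup, independent of any prompt
```

The point of the sketch is that `get` can return a source that `retrieve` would never select for a given prompt, which is what rebuilding a chat from the state database would need.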

Value and/or benefit

When viewing the sources in the web UI, we currently only display the location of the source in the document, e.g. the page number. However, for documents like .txt there is no page number available. This means the only information a user gets is that the corresponding document was used, but not which section of it. If we had the source content available, we could display that instead.

For the web UI we need to think carefully about what this would look like, since we don't want to overwhelm the user with information.

cc @peachkeel for other use cases / scenarios that I missed.

Anything else?

Originally reported by @peachkeel in https://github.com/Quansight/ragna/issues/242#issuecomment-1848669705

peachkeel commented 11 months ago

This proposal is interesting, but it might not work for all possible source storage backings as sometimes the actual chunks can be remote, opaque, and unreferenceable (e.g., Vectara). Thus, I don't think it is clear that storage is going to be largely duplicated locally. Further, the chat messages about a corpus will likely be far fewer than the chunks in the corpus anyway.

I'm also skeptical about different access controls between the state database and the vector store really being that relevant. If you're storing a summarized answer off of a context in the state database, why shouldn't you be allowed to store the source texts underpinning that answer?

pmeier commented 11 months ago

it might not work for all possible source storage backings as sometimes the actual chunks can be remote, opaque, and unreferenceable

Ugh, this alone is a show stopper for the proposal. Thanks for bringing that up. I was under the impression that all DBs should have this feature, but I guess I was wrong.

I'm also skeptical about different access controls between the state database and the vector store really being that relevant. If you're storing a summarized answer off of a context in the state database, why shouldn't you be allowed to store the source texts underpinning that answer?

That is a fair point. I honestly don't remember all details around this as this discussion happened before we even had a repository. Let me see if I can get some context here.

In the meantime: do you have other use cases in mind where this feature would be beneficial other than what I have laid out in https://github.com/Quansight/ragna/issues/248#issue-2038658078?

peachkeel commented 11 months ago

As far as additional use cases are concerned, I would point first and foremost to Tonic Validate Metrics. As you can see, five out of the six suggested metrics involve the retrieved context:

Answer similarity score: How well does the RAG answer match what the answer should be?

Retrieval precision: Is the retrieved context relevant to the question?

Augmentation precision: Is the relevant retrieved context in the answer?

Augmentation accuracy: How much of the retrieved context is in the answer?

Answer consistency (binary): Does the answer contain any information that does not come from the retrieved context?

Retrieval k-recall: For the top k context vectors, where the retrieved context is a subset of the top k context vectors, is the retrieved context all of the relevant context among the top k context vectors for answering the question?

Anyone wanting to utilize Ragna in an automatic evaluation loop is going to require Source.content be exposed, ideally in response to calling the /chats/{id}/answer endpoint.
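
As a rough illustration of why an evaluation loop needs the content, the sketch below scores an answer payload against its sources. The JSON shape (a `content` field on each source) is an assumption about what the endpoint could return once this feature exists, and `context_overlap` is a hypothetical toy metric, not one of the Tonic Validate metrics.

```python
import json

# Hypothetical response body; the "content" key on each source is the
# field this issue asks for and is assumed here, not confirmed.
response_body = json.dumps(
    {
        "content": "Ragna is an orchestration framework",
        "sources": [
            {"id": "a", "location": "page 1", "content": "Ragna is a RAG orchestration framework"},
            {"id": "b", "location": "page 2", "content": "Unrelated text about cooking"},
        ],
    }
)


def context_overlap(answer: str, source_contents: list[str]) -> float:
    """Fraction of answer words that appear in any retrieved source (toy metric)."""
    answer_words = answer.lower().split()
    if not answer_words:
        return 0.0
    context_words = {w for content in source_contents for w in content.lower().split()}
    return sum(w in context_words for w in answer_words) / len(answer_words)


message = json.loads(response_body)
score = context_overlap(
    message["content"], [source["content"] for source in message["sources"]]
)
```

Without the `content` field on each source, the second argument could not be built from the REST response alone, which is the gap described above.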

pmeier commented 11 months ago

To be fair here, the source content is available on the Python object. Meaning, if you use the Python API of the validator, you could also use the Python API of Ragna and circumvent the problem altogether. It will only get tricky if you want to hit the validator with results from Ragna's REST API.

peachkeel commented 11 months ago

Yes, unfortunately, we have teams that want to use the REST API. Furthermore, even our UI users would like to see this information as it is readily available in other RAG systems.

pmeier commented 11 months ago

After an internal "Do you remember why we did this?" session, we couldn't come up with a reason. We remembered that we made that choice intentionally, but have no idea why. Our best guess is that the decision was based on a misconception. We agree that there is no security problem if the source in the state database can only be accessed by the same user that uploaded the document in the first place. Let's implement this!

Yes, unfortunately, we have teams that want to use the REST API.

I'm interested in this use case. What does your plan look like? Host the REST API somewhere and let the other teams hit this directly? Or are you going to write your own mini SDK around that similar to what we did with

https://github.com/Quansight/ragna/blob/c9fd83dd3116649cb007cc384cb70cf178623d9a/ragna/deploy/_ui/api_wrapper.py#L14-L15

If it's the latter, I think we could just clean up the wrapper and make it public. If that is of interest, please open an issue.

peachkeel commented 11 months ago

Honestly, I wasn't aware of the wrapper's existence. Or, maybe I was vaguely aware of it, but didn't actually think of using it or recommending its use.

I basically have a Ragna instance hosted, and the evaluation team has written their tests to its REST API in whatever language they're using (probably Python, which is a bummer because they could have used the wrapper).

Do you still want me to open an issue on cleaning up the wrapper and making it public? I think it's a good idea, but not of immediate interest to me.

pmeier commented 11 months ago

Do you still want me to open an issue on cleaning up the wrapper and making it public? I think it's a good idea, but not of immediate interest to me.

If you find the time, I would appreciate it. But I can do it as well. I would be interested in two things:

Do we need a sync and an async client or would one of them be sufficient?
What output format are they working with?
1. Plain JSON as returned by the API
2. Creating pydantic data objects from the schemas that we have (unlikely, since the schemas are private, but maybe desired?)
3. Creating ragna.core objects from the JSON to

peachkeel commented 11 months ago

I feel that if I open the issue it is going to be my speculations and not anything driven by a solid use case. I'll speculate here, though, and answer your questions:

I would go with a synchronous client to start. It is easier to build and test. Further, it kind of avoids the evils of premature optimization. I'm pretty sure that a competent developer could wrap parts of the synchronous client to make select calls asynchronous if necessary.
As far as I know, the team hitting my Ragna instance is using plain JSON as returned by the API.
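
Wrapping select calls of a synchronous client to make them asynchronous, as suggested in the first answer, could look roughly like this. `SyncClient` and its `answer` method are hypothetical placeholders for whatever a future client would expose; the sketch only relies on `asyncio.to_thread` from the standard library (Python 3.9+).

```python
import asyncio
import time


class SyncClient:
    """Hypothetical blocking client; a real one would issue HTTP requests."""

    def answer(self, chat_id: str, prompt: str) -> dict:
        time.sleep(0.05)  # stands in for network latency
        return {"chat_id": chat_id, "prompt": prompt, "content": f"echo: {prompt}"}


class AsyncClient:
    """Thin async facade that runs selected blocking calls in worker threads."""

    def __init__(self, sync_client: SyncClient) -> None:
        self._sync = sync_client

    async def answer(self, chat_id: str, prompt: str) -> dict:
        # to_thread keeps the event loop free while the blocking call runs.
        return await asyncio.to_thread(self._sync.answer, chat_id, prompt)


async def main() -> list[dict]:
    client = AsyncClient(SyncClient())
    # Both requests overlap instead of running back to back.
    return await asyncio.gather(
        client.answer("chat-1", "first"),
        client.answer("chat-1", "second"),
    )


results = asyncio.run(main())
```

Only the methods that benefit from concurrency need an async counterpart; everything else can keep using the synchronous client directly.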

pmeier commented 11 months ago

In that case, let's wait for a solid use case to arise. I feel a proper client is useful, but without any data on what users want, this will be hard to get right. In case you get a chance to talk to the evaluation team about this, I'm eager to hear their opinions.

smeragoel commented 10 months ago

Designs for this, as discussed with @pmeier, are here. Also attaching them as PNGs:

Default (collapsed)
On click (expanded)

pmeier commented 10 months ago

To add one more piece of information that I stumbled over: each source content box should have no maximum height and should extend as far as it needs to. That prevents a double scrollbar.
