This proposal is interesting, but it might not work for all possible source storage backings, since the actual chunks can sometimes be remote, opaque, and unreferenceable (e.g., Vectara). Thus, I don't think it is clear that storage is going to be largely duplicated locally. Further, chat messages about a corpus will likely number far fewer than the chunks in the corpus anyway.
I'm also skeptical about different access controls between the state database and the vector store really being that relevant. If you're storing a summarized answer off of a context in the state database, why shouldn't you be allowed to store the source texts underpinning that answer?
> it might not work for all possible source storage backings as sometimes the actual chunks can be remote, opaque, and unreferenceable

Ugh, this alone is a showstopper for the proposal. Thanks for bringing that up. I was under the impression that all DBs should have this feature, but I guess I was wrong.
> I'm also skeptical about different access controls between the state database and the vector store really being that relevant. If you're storing a summarized answer off of a context in the state database, why shouldn't you be allowed to store the source texts underpinning that answer?
That is a fair point. I honestly don't remember all the details around this, as the discussion happened before we even had a repository. Let me see if I can get some context here.

In the meantime: do you have other use cases in mind where this feature would be beneficial, other than what I have laid out in https://github.com/Quansight/ragna/issues/248#issue-2038658078?
As far as additional use cases are concerned, I would point first and foremost to Tonic Validate Metrics. As you can see, five out of the six suggested metrics involve the retrieved context:
- Answer similarity score: How well does the RAG answer match what the answer should be?
- Retrieval precision: Is the retrieved context relevant to the question?
- Augmentation precision: Is the relevant retrieved context in the answer?
- Augmentation accuracy: How much of the retrieved context is in the answer?
- Answer consistency (binary): Does the answer contain any information that does not come from the retrieved context?
- Retrieval k-recall: For the top k context vectors, where the retrieved context is a subset of the top k context vectors, is the retrieved context all of the relevant context among the top k context vectors for answering the question?
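All of these metrics need the retrieved context text itself, not just a pointer to the source document. As a toy illustration of that dependency (this is my own crude stand-in, not Tonic Validate's actual implementation, which uses LLM-based judgments), an augmentation-accuracy-style check over the `Source.content` strings might look like:

```python
def context_overlap(answer: str, contexts: list[str]) -> float:
    """Fraction of retrieved context chunks fully reused in the answer.

    A naive token-subset check standing in for 'augmentation accuracy':
    without access to the context text, none of this is computable.
    """
    if not contexts:
        return 0.0
    answer_tokens = set(answer.lower().split())
    used = sum(
        1
        for chunk in contexts
        if set(chunk.lower().split()) <= answer_tokens
    )
    return used / len(contexts)


# The first chunk is fully contained in the answer, the second is not.
score = context_overlap("the sky is blue", ["sky is blue", "grass is green"])
```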
Anyone wanting to use Ragna in an automatic evaluation loop is going to require that `Source.content` be exposed, ideally in the response of the `/chats/{id}/answer` endpoint.
To be fair, the source content is available on the Python object. Meaning, if you use the Python API of the validator, you could also use the Python API of Ragna and circumvent the problem altogether. It only gets tricky if you want to hit the validator with results from Ragna's REST API.
Yes, unfortunately, we have teams that want to use the REST API. Furthermore, even our UI users would like to see this information as it is readily available in other RAG systems.
After an internal "Do you remember why we did this?" session, we couldn't come up with a reason. We remembered that we made that choice intentionally, but no idea why. Our best guess is that the decision was based on a misconception. We agree that there is no security problem if the source in the state database can only be accessed by the same user that uploaded the document in the first place. Let's implement this!
> Yes, unfortunately, we have teams that want to use the REST API.
I'm interested in this use case. What does your plan look like? Host the REST API somewhere and let the other teams hit it directly? Or are you going to write your own mini SDK around it, similar to what we did with
If it's the latter, I think we could just clean up the wrapper and make it public. If that is of interest, please open an issue.
Honestly, I wasn't aware of the wrapper's existence. Or, maybe I was vaguely aware of it, but didn't actually think of using it or recommending its use.
I basically have a Ragna instance hosted, and the evaluation team has written their tests to its REST API in whatever language they're using (probably Python, which is a bummer because they could have used the wrapper).
Do you still want me to open an issue on cleaning up the wrapper and making it public? I think it's a good idea, but not of immediate interest to me.
> Do you still want me to open an issue on cleaning up the wrapper and making it public? I think it's a good idea, but not of immediate interest to me.
If you find the time, I would appreciate it. But I can do it as well. I would be interested in two things:

- Should the client be synchronous or asynchronous?
- Should the client return `pydantic` data objects from the schemas that we have (unlikely, since the schemas are private, but maybe desired?) or `ragna.core` objects constructed from the JSON?

I feel that if I open the issue it is going to be my speculations and not anything driven by a solid use case. I'll speculate here, though, and answer your questions:
I would go with a synchronous client to start. It is easier to build and test. Further, it kind of avoids the evils of premature optimization. I'm pretty sure that a competent developer could wrap parts of the synchronous client to make select calls asynchronous if necessary.
As far as I know, the team hitting my Ragna instance is using plain JSON as returned by the API.
In that case, let's wait for a solid use case to arise. I feel a proper client is useful, but without any data on what users want, it will be hard to get right. If you get a chance to talk to the evaluation team about this, I'm eager to hear their opinions.
Designs for this, as discussed with @pmeier, are here. Also attaching them as PNGs:
Default (collapsed)
On click (expanded)
To add one more piece of information that I stumbled over: each source content box should have no maximum height and should extend as long as it needs to be. That prevents a double scrollbar.
Feature description
While our `ragna.core.Source` object stores the source content

https://github.com/Quansight/ragna/blob/7b176528c7524edc8bcf8754695caa5a363a8e23/ragna/core/_components.py#L79-L88

the source object of the REST API does not:

https://github.com/Quansight/ragna/blob/7b176528c7524edc8bcf8754695caa5a363a8e23/ragna/deploy/_api/schemas.py#L35-L39
This is an intentional choice. When creating messages "live" we have access to the sources
https://github.com/Quansight/ragna/blob/7b176528c7524edc8bcf8754695caa5a363a8e23/ragna/deploy/_api/core.py#L238
However, when chats, and in turn messages and sources, are recreated from our state database, we would only have access to the source content if we stored it as part of the state database. We don't want to do that for two reasons:

1. `ragna.core.SourceStorage` already stores the source content, so we would largely duplicate the storage.
2. The state database and the source storage might be subject to different access controls.

Instead, what we could do is add a new protocol method `ragna.core.SourceStorage.get` (name TBD). In contrast to `ragna.core.SourceStorage.retrieve`, `.get()` will not engage in source selection based on the embedding of a prompt, but rather get one source by its ID. With that, we could fill in the source content when constructing the chat object from the state database.

Value and/or benefit
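The proposed split between prompt-based retrieval and ID-based lookup could be sketched like this. This is a hypothetical interface for illustration only; the real `ragna.core` classes have more fields, and the method names and signatures are assumptions:

```python
import abc
import dataclasses


@dataclasses.dataclass
class Source:
    # Simplified stand-in for ragna.core.Source; the real class has more fields.
    id: str
    content: str


class SourceStorage(abc.ABC):
    @abc.abstractmethod
    def retrieve(self, prompt: str) -> list[Source]:
        """Select sources relevant to the prompt, e.g. via embedding similarity."""

    @abc.abstractmethod
    def get(self, ids: list[str]) -> list[Source]:
        """Return the sources with the given IDs; no relevance-based selection."""


class InMemorySourceStorage(SourceStorage):
    # Toy backend showing how .get() differs from .retrieve().
    def __init__(self, sources: list[Source]):
        self._sources = {source.id: source for source in sources}

    def retrieve(self, prompt: str) -> list[Source]:
        # Naive substring match standing in for embedding-based selection.
        return [
            source
            for source in self._sources.values()
            if prompt.lower() in source.content.lower()
        ]

    def get(self, ids: list[str]) -> list[Source]:
        return [self._sources[id] for id in ids]
```

With such a `.get()`, reconstructing a chat from the state database only needs the stored source IDs; the content itself stays solely in the source storage.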
When viewing the sources in the web UI, we currently only display the location of the source within the document, e.g. the page number. However, for documents like `.txt` there is no page number available. Meaning, the only information a user gets is that the corresponding document was used, but not which section of it. If we had the source content available, we could display that instead.

For the web UI we need to think carefully about how this should look, since we don't want to overwhelm the user with information.
cc @peachkeel for other use cases / scenarios that I missed.
Anything else?
Originally reported by @peachkeel in https://github.com/Quansight/ragna/issues/242#issuecomment-1848669705