Open llermaly opened 10 months ago
This is a great question and a consideration I've had for a while as well. The difficulty in making a generic solution is that a lot of these 3rd party APIs either don't support metadata filtering, or have ways of filtering data that can vary.
For example, some APIs could require them as query parameters, others in the request body, others leverage a Python SDK, so we would have to call TheirSDKClient.search(query, myfilter1=value1, myfilter2=value2).
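To make the difficulty concrete, here is a minimal sketch of how the same filters would need to be shaped differently per backend. The function name and the three style labels are hypothetical and only for illustration; they are not part of any existing connector.

```python
from urllib.parse import urlencode


def build_search_request(style, query, filters):
    """Shape one set of filters for a hypothetical backend style."""
    if style == "query_params":
        # Filters travel in the URL, e.g. GET /search?q=...&team=...
        params = {"q": query, **filters}
        return {"method": "GET", "url": "/search?" + urlencode(params)}
    if style == "request_body":
        # Filters travel as JSON in the POST body.
        body = {"query": query, "filters": dict(filters)}
        return {"method": "POST", "url": "/search", "json": body}
    if style == "sdk":
        # Filters become keyword arguments: client.search(query, **kwargs)
        return {"args": (query,), "kwargs": dict(filters)}
    raise ValueError(f"unsupported filter style: {style}")
```

Each connector would need its own branch like these, which is why a single generic filter format is hard to define.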
Now ideally from a user perspective, you can just pass in the filters like you've outlined in
response = co.chat(
message="What is the chemical formula for glucose?",
connectors=[{"id": "my-connector", "params": {"some_field": "some_value"} }]
)
And be able to generate different search results per chat.
To get there, we would modify the connectors, document which metadata fields can be used at query time, and decide on a format for receiving these values when you pass them to the /chat endpoint in the connectors parameter.
The filtering logic would happen at the connector level. This would of course require a lot of work to do for all existing connectors. I would have to talk to the internal Coral API team as well, so that the field values you send with the connectors get sent to the /search request performed by the connector, but if we decide to go with a long-term solution this is probably the best route. The user shouldn't need to worry about how the search is filtered, only that they are able to.
As you've outlined, probably the easiest way is to retrieve the documents yourself and do all the metadata filtering prior to calling /chat.
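A minimal sketch of that workaround, assuming you have already retrieved the documents as a list of dicts. The `filter_by_metadata` helper and the `folder` field are hypothetical; the commented-out call assumes the `documents` parameter of `co.chat`.

```python
def filter_by_metadata(documents, **required):
    """Keep documents whose fields match every required key/value pair."""
    return [
        doc for doc in documents
        if all(doc.get(key) == value for key, value in required.items())
    ]


# Hypothetical documents retrieved from your own source.
docs = [
    {"title": "Glucose", "text": "Glucose is C6H12O6.", "folder": "chemistry"},
    {"title": "Invoices", "text": "Q3 invoices.", "folder": "finance"},
]
selected = filter_by_metadata(docs, folder="chemistry")

# The filtered list would then be passed to chat, for example:
# response = co.chat(
#     message="What is the chemical formula for glucose?",
#     documents=selected,
# )
```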
@tianjing-li Thanks for your quick answer. I really love how many connectors you have available; it's amazing.
I think you should discuss with the internal team the ability to send arbitrary parameters via the API, so that every dev can configure the connector accordingly.
For example, having to pre-select a single folder in the Google Drive connector limits the possibilities a lot. If you enable us to send an array of document names, we can easily implement that in the connector.
I will go with Solution 2. Is the effect the same in terms of chunking/ranking when using the connector as when using the docs API? If it's better to use the connectors, we can consider copying the documents we need to a folder before making the call, until we can filter within the connector.
Thanks
@llermaly the chunking/ranking portion is done by the .chat() call itself, so no worries about not going through the connector. I'll raise the parameter suggestion internally; I agree that there's a lot of value we can add.
Thanks @tianjing-li we will be waiting. We can close this issue. Please let me know if you want me to give our feedback in the future
@llermaly I raised the suggestion internally, perhaps we can keep this open for now. Tagging @walterbm-cohere and @daniel-cohere who work directly on the internal API
@llermaly this is now in progress - from what I've gathered, the connector's API will allow passing in an options JSON body that will need to be parsed by the connector. From your side, you would be able to make any changes to connectors by referring to their docs and adding the filters appropriately.
Any PRs for these would be greatly appreciated!
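As a rough sketch of what parsing such an options body could look like on the connector side. The payload shape, the `ALLOWED_FILTERS` set, and the `parse_filters` name are all assumptions for illustration; the final API may differ.

```python
import json

# Hypothetical fields this connector knows how to filter on.
ALLOWED_FILTERS = {"folder", "author"}


def parse_filters(raw_body):
    """Extract the query and supported filters from a search request body.

    Assumes a body shaped like {"query": "...", "options": {...}};
    the real payload format may differ.
    """
    body = json.loads(raw_body)
    options = body.get("options") or {}
    if not isinstance(options, dict):
        raise ValueError("options must be a JSON object")
    # Drop keys the connector does not know how to filter on.
    filters = {k: v for k, v in options.items() if k in ALLOWED_FILTERS}
    return body.get("query", ""), filters
```

Ignoring unknown keys keeps a connector tolerant of options intended for other connectors; rejecting them instead would be an equally reasonable choice.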
Which connector is affected?
All sources that support filtering.
What would you like to see improved?
Is it possible to send additional parameters for metadata filtering?
The only way I can think of now is passing parameters at creation time:
But that's not very flexible.
Do you think calling the connector API directly with the filters, and then sending the results to the Cohere documents endpoint would do the trick?
And then
Is there a simpler way to achieve this filtering?
Thanks!
Additional information
No response