cohere-ai / quick-start-connectors

This open-source repository offers reference code for integrating workplace datastores with Cohere's LLMs, enabling developers and businesses to perform seamless retrieval-augmented generation (RAG) on their own data.
https://docs.cohere.com/docs/connectors
MIT License

Sending additional parameters #128

Open llermaly opened 10 months ago

llermaly commented 10 months ago

Which connector is affected?

All sources that support filtering.

What would you like to see improved?

Is it possible to send additional parameters for metadata filtering? For example:

response = co.chat(  
    message="What is the chemical formula for glucose?",  
    connectors=[{"id": "my-connector", "params": {"some_field": "some_value"} }]  
)

The only way I can think of now is passing parameters on creation time:

created_connector = co.create_connector(
    name="Example connector",
    url="https://connector-example.com/search?some_field=some_value",
)

But that's not very flexible.

Do you think calling the connector API directly with the filters, and then passing the results to Cohere's chat endpoint through its documents parameter, would do the trick?

curl --request POST \
    --url 'https://connector-example.com/search' \
    --header 'Content-Type: application/json' \
    --data '{
    "query": "How do I expense a meal?",
    "some_field": "some_value"
  }'
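For readers working in Python, the same request can be built with the standard library. This is a minimal sketch: `connector-example.com` and `some_field` are the placeholders from the example above, `build_search_request` is a hypothetical helper, and putting filter fields at the top level of the JSON body simply mirrors the curl call rather than any documented connector contract.

```python
import json
import urllib.request


def build_search_request(url: str, query: str, filters: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to a connector's /search endpoint.

    Filter fields are merged into the top level of the JSON body next to
    "query", mirroring the curl example; this layout is an assumption.
    """
    if "query" in filters:
        raise ValueError("'query' is reserved and cannot be used as a filter name")
    body = {"query": query, **filters}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request(
    "https://connector-example.com/search",
    "How do I expense a meal?",
    {"some_field": "some_value"},
)
# To actually send it: urllib.request.urlopen(req), then json.load() the response.
```

Building the request separately from sending it keeps the filter-to-payload mapping easy to check before any network call is made.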

And then

response = co.chat(
    message=message,
    documents=documents,
    conversation_id=self.conversation_id,
    stream=True,
)

Is there a simpler way to achieve this filtering?

Thanks!

Additional information

No response

tianjing-li commented 10 months ago

This is a great question, and one I've been considering for a while as well. The difficulty in making a generic solution is that many of these third-party APIs either don't support metadata filtering or filter data in different ways.

For example, some APIs expect filters as query parameters, others in the request body, and others are accessed through a Python SDK, in which case we would have to call something like TheirSDKClient.search(query, myfilter1=value1, myfilter2=value2).
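To illustrate why a generic mechanism is hard, here is a sketch of one filter dictionary delivered in the three styles just described. `TheirSDKClient` is the hypothetical client named above; the helper functions, the `q` parameter name, and the nested `"filters"` body key are made up for illustration and don't correspond to any particular API.

```python
import json
from urllib.parse import urlencode


class TheirSDKClient:
    """Hypothetical third-party SDK client whose search() takes filters as kwargs."""

    def search(self, query, **filters):
        # A real SDK would call its backend; this stub just echoes its arguments.
        return {"query": query, "filters": filters}


def as_query_params(base_url: str, query: str, filters: dict) -> str:
    """Style 1: filters appended to the URL query string."""
    return f"{base_url}?{urlencode({'q': query, **filters})}"


def as_request_body(query: str, filters: dict) -> bytes:
    """Style 2: filters nested inside a JSON request body."""
    return json.dumps({"query": query, "filters": filters}).encode("utf-8")


def via_sdk(client: TheirSDKClient, query: str, filters: dict) -> dict:
    """Style 3: filters passed as keyword arguments to an SDK method."""
    return client.search(query, **filters)


filters = {"myfilter1": "value1", "myfilter2": "value2"}
print(as_query_params("https://api.example.com/search", "glucose", filters))
print(as_request_body("glucose", filters))
print(via_sdk(TheirSDKClient(), "glucose", filters))
```

Each connector would need its own translation from a single generic filter format into its upstream API's style, which is where the per-connector effort comes from.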

Ideally, from a user's perspective, you could simply pass in the filters as you've outlined:

response = co.chat(  
    message="What is the chemical formula for glucose?",  
    connectors=[{"id": "my-connector", "params": {"some_field": "some_value"} }]  
)

and get different search results for each chat.

Solution 1 (Long-term - difficult)

We modify the connectors, document which metadata fields can be used at query time, and decide on a format for receiving these values when you pass them to the /chat endpoint in the connectors parameter.

The filtering logic would happen at the connector level. This would, of course, require a lot of work across all existing connectors. I would also have to talk to the internal Coral API team so that the field values you send with the connectors are forwarded in the /search request made to the connector. Still, if we decide to go with a long-term solution, this is probably the best route. The user shouldn't need to worry about how the search is filtered, only that they are able to filter.
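A minimal sketch of what connector-level filtering could look like under Solution 1, assuming each connector documents a fixed set of filterable metadata fields and receives filter values alongside the query. `FILTERABLE_FIELDS`, `filter_results`, and the sample fields are hypothetical.

```python
# Hypothetical: the metadata fields this connector documents as filterable.
FILTERABLE_FIELDS = {"department", "doc_type"}


def filter_results(results: list, filters: dict) -> list:
    """Keep only results whose metadata matches every filter value.

    Unknown filter fields are rejected rather than silently ignored, so
    callers learn immediately that the connector cannot honour them.
    """
    unknown = set(filters) - FILTERABLE_FIELDS
    if unknown:
        raise ValueError(f"unsupported filter fields: {sorted(unknown)}")
    return [
        r for r in results
        if all(r.get(field) == value for field, value in filters.items())
    ]


results = [
    {"title": "Meal policy", "department": "finance", "doc_type": "policy"},
    {"title": "Travel FAQ", "department": "finance", "doc_type": "faq"},
    {"title": "Onboarding", "department": "hr", "doc_type": "policy"},
]
print(filter_results(results, {"department": "finance", "doc_type": "policy"}))
```

Rejecting undocumented fields keeps the published list of filterable fields and the connector's actual behaviour in sync.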

Solution 2 (To unblock - easy)

As you've outlined, probably the easiest way is to retrieve the documents yourself and do all the metadata filtering prior to calling /chat.
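Solution 2 can be sketched as a small post-processing step: take the connector's search results, apply your own metadata filter, and shape what's left into the `documents` list for `co.chat()`. This assumes the search response has a `{"results": [...]}` shape whose items carry `title` and `text` fields, and it maps `text` to a `snippet` key; adjust the field names to your connector. `to_chat_documents` is a hypothetical helper.

```python
def to_chat_documents(search_response: dict, filters: dict) -> list:
    """Filter connector search results and convert them to chat documents.

    Assumes a {"results": [...]} response whose items carry "title" and
    "text" fields; values are cast to str because chat documents are
    string-to-string dictionaries.
    """
    documents = []
    for item in search_response.get("results", []):
        if not all(item.get(k) == v for k, v in filters.items()):
            continue  # drop results that fail the metadata filter
        documents.append({
            "title": str(item.get("title", "")),
            "snippet": str(item.get("text", "")),
        })
    return documents


response = {"results": [
    {"title": "Meal policy", "text": "Meals under $50 ...", "some_field": "some_value"},
    {"title": "Old policy", "text": "Superseded.", "some_field": "other"},
]}
documents = to_chat_documents(response, {"some_field": "some_value"})
print(documents)
```

The resulting list is what you would pass as `co.chat(message=message, documents=documents)`.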

llermaly commented 10 months ago

@tianjing-li Thanks for the quick answer. I really love how many connectors you have available; it's amazing.

I think you should discuss with the internal team the ability to send arbitrary parameters via the API; then every developer could configure their connector accordingly.

For example, having to pre-select a single folder in the Google Drive connector is very limiting. If you let us send an array of document names, we could easily implement that filter in the connector.
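As a sketch of how a Google Drive connector could honour an array of document names, assuming it searches through the Drive API's `q` query string (`fullText contains`, `name =`, `and`, and `or` are part of that query language), the names could be OR-ed into the query. `build_drive_query` and its `names` parameter are hypothetical, not part of the existing connector.

```python
from typing import List, Optional


def _quote(value: str) -> str:
    """Quote a string for the Drive query language (escape \\ and ')."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_drive_query(query: str, names: Optional[List[str]] = None) -> str:
    """Combine a full-text search with an optional allow-list of file names."""
    clauses = [f"fullText contains {_quote(query)}"]
    if names:
        # Exact-match each requested name and OR them together.
        name_filter = " or ".join(f"name = {_quote(n)}" for n in names)
        clauses.append(f"({name_filter})")
    return " and ".join(clauses)


print(build_drive_query("expense a meal", ["Meal policy", "Bob's notes"]))
```

Passing no names leaves the query unchanged, so the filter stays opt-in per request.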

I will go with Solution 2. Is chunking/ranking the same when using the connector as when passing documents directly? If the connector works better, we could consider copying the documents we need into a folder before making the call, until we can filter within the connector.

Thanks

tianjing-li commented 10 months ago

@llermaly the chunking/ranking is done by the .chat() call itself, so there's no downside to bypassing the connector. I'll raise the parameter suggestion internally; I agree there's a lot of value we can add.

llermaly commented 10 months ago

Thanks, @tianjing-li, we'll be waiting. We can close this issue. Please let me know if you'd like our feedback in the future.

tianjing-li commented 10 months ago

@llermaly I raised the suggestion internally; perhaps we can keep this open for now. Tagging @walterbm-cohere and @daniel-cohere, who work directly on the internal API.

tianjing-li commented 9 months ago

@llermaly this is now in progress. From what I've gathered, the connectors API will allow passing in an options JSON body that the connector will need to parse. On your side, you would be able to modify connectors by referring to their docs and adding the filters appropriately.
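Once that options body is available, a connector's search handler would need to read it alongside the query. The exact payload shape isn't settled in this thread, so the sketch below simply assumes a JSON body of the form `{"query": ..., "options": {...}}`; `parse_search_request` is hypothetical.

```python
import json
from typing import Tuple


def parse_search_request(raw_body: bytes) -> Tuple[str, dict]:
    """Extract the query and an optional "options" object from a /search body.

    Missing options default to an empty dict so connectors without filter
    support keep working unchanged.
    """
    body = json.loads(raw_body)
    query = body.get("query")
    if not isinstance(query, str) or not query:
        raise ValueError("request body must include a non-empty 'query' string")
    options = body.get("options") or {}
    if not isinstance(options, dict):
        raise ValueError("'options' must be a JSON object")
    return query, options


query, options = parse_search_request(
    b'{"query": "How do I expense a meal?", "options": {"some_field": "some_value"}}'
)
print(query, options)
```

The parsed `options` dict could then be handed to whatever filter logic the connector documents.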

Any PRs for these would be greatly appreciated!