dapr / components-contrib

Community driven, reusable components for distributed apps
Apache License 2.0

Cosmos DB: Query API not working if there's more than 1 partition #3029

Open ItalyPaleAle opened 1 year ago

ItalyPaleAle commented 1 year ago

This issue was reported by a user via Microsoft Support. The findings are reported below.

It seems the issue occurs when Cosmos DB has multiple partitions.

(1) Create a local Dapr environment and configure the state store to use Cosmos DB.

apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: statestore
spec:
  type: state.azure.cosmosdb
  version: v1
  metadata:
  - name: url
    value: https://something.documents.azure.com:443/
  - name: masterKey
    value: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  - name: database
    value: db
  - name: collection
    value: collection

(2) Configure the Cosmos DB scale settings to use multiple partitions by setting the throughput to Manual: 11000.

(3) After setting (2), wait about 30 minutes, as it takes time for the partitions to actually scale out.

(4) Then run Dapr.

(5) Run the following API.

https://docs.dapr.io/reference/api/state_api/#query-state

You can see the following error.


PS C:\Users\toruita> Invoke-WebRequest -Method Post -Headers @{"Content-type"="application/json"} -Uri 'http://localhost:3500/v1.0-alpha1/state/statestore/query?metadata.contentType=application/json' -Body '{}'
Invoke-WebRequest : {"errorCode":"ERR_STATE_QUERY","message":"failed query in state store statestore: context canceled"}
At line:1 char:1
+ Invoke-WebRequest -Method Post -Headers @{"Content-type"="application ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (System.Net.HttpWebRequest:HttpWebRequest) [Invoke-WebRequest], WebException
    + FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShell.Commands.InvokeWebRequestCommand

berndverst commented 1 year ago

What is interesting is that we specifically consulted with the Cosmos DB folks on this. Since the SDK does not have built-in support for cross-partition queries, we manually added the required headers (via the policy options) to enable them.

Worth investigating further why that isn't working.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

joshuadmatthews commented 11 months ago

+1 for fixing this

berndverst commented 10 months ago

I reached out to the Azure CosmosDB service team to get their thoughts on our implementation and why this could be happening.

berndverst commented 10 months ago

There is a possibility that this cannot be fixed and we actually would need to remove Query API support for CosmosDB. This is a possibility given the Alpha status.

We may need to reimplement query support entirely to do the following:

This is a lot of work, especially for an API that we do not plan to bring to Beta or Stable.

berndverst commented 10 months ago

Another option - and this is probably the easiest:

  1. Check whether there is only a single partition
  2. If 1 partition: perform query with the native SDK method - not our custom code.
  3. Otherwise, throw an error -- we will not support cross-partition queries.

This approach might be acceptable because nobody should be using an Alpha component in a production scenario with multiple partitions.

joshuadmatthews commented 10 months ago

I’d vote for an actual fix. Cross partition queries are certainly supported in Cosmos, it would be nice for them to work with Dapr.

berndverst commented 10 months ago

> I’d vote for an actual fix. Cross partition queries are certainly supported in Cosmos, it would be nice for them to work with Dapr.

@joshuadmatthews cross-partition queries are not supported in the Cosmos DB Go SDK, and the Azure SDK team has no plans to implement them on their roadmap. Cosmos DB does not perform these queries server-side; instead, the SDK does so itself, with lots of manual code to aggregate and sort things in memory. You can read all about it if you go to the Azure SDK repos. That is too much work for Dapr, however.

So the choices are single partition only, removing the Query support entirely, or possibly rudimentary support where we send the same query to each partition but do not perform any further aggregation, sorting, or filtering in Dapr.

Technically our current implementation should work, but the gateway server (not used by any of the official SDKs because of its severe query limitations) seems to time out. We have no choice but to change our approach.

If the Azure SDK for Go Team ever provides native cross partition query support we'd of course use that instead.

I want to remind the community again that Alpha in Dapr means experimental: we may discontinue Alpha features. We decided long ago that the Query API cannot progress to Beta given the way it was designed. It is not sustainable to support and maintain this. I must strongly discourage using the Query API.

joshuadmatthews commented 10 months ago

Can you share a link to the Azure SDK repo section you are talking about? Interested to read up on that. The dotnet v3 Cosmos SDK seems like an official SDK, and also seems to be using the header approach, but I'm sure I'm missing something there.

https://github.com/Azure/azure-cosmos-dotnet-v3/blob/e534de251bdadafd1adb960da83c15d463486a66/Microsoft.Azure.Cosmos/src/RequestOptions/QueryRequestOptions.cs#L173-L177

berndverst commented 7 months ago

I explained in my previous comment how the DotNet SDK manages to query all partitions and aggregate results. The Go SDK does not have the ability to do this.

As a result, this would need to be manually implemented in Dapr. That feels like the wrong approach, however. Instead, we need to wait for the Go SDK (github.com/azure/azure-sdk-for-go) to support this for CosmosDB.

If anyone feels inclined to work on this, I suggest contributing to the Azure SDK for Go, and then Dapr can simply consume the updated SDK.

litan1106 commented 5 months ago

If we are going to remove the Query API from Cosmos DB, then we need another way to filter items.