Azure / azure-sdk-for-go

This repository is for active development of the Azure SDK for Go. For consumers of the SDK we recommend visiting our public developer docs at:
https://docs.microsoft.com/azure/developer/go/

CosmosDB: Allow running queries without Partition Key #18578

Open rkilburn opened 2 years ago

rkilburn commented 2 years ago

Feature Request

In PR #17657, support was added for running a Cosmos DB SQL query against a single partition. However, in the Data Explorer UI and in other SDKs, a query can be run without specifying a partition key, in which case it runs across the entire container.

Could this functionality be added for feature parity with the other SDKs? 🙏

Thanks!

ealsur commented 2 years ago

While this is on the horizon, it won't be available anytime soon (probably in the next 12 months).

ealsur commented 2 years ago

Just to clarify, there are other more important and critical deliverables that take precedence, such as failover support.

marwan-at-work commented 2 years ago

Hi there,

Are there any alternatives while we wait over the next 12 months?

Thanks

ealsur commented 2 years ago

Cross-partition query support is not a trivial feature. The service (REST API) does not support cross-partition queries; they are orchestrated entirely by the client. Especially with aggregates (such as SUM, COUNT, and DISTINCT), this is not something that can be done easily.

This same problem applies if you want to implement it yourself.

You can certainly run multiple queries if you already know all the possible Partition Keys. Besides that, there is no alternative.

calmh commented 2 years ago

Can one not set the x-ms-query-enable-crosspartition query option in the REST API? This appears to work for me.

ealsur commented 2 years ago

You can certainly set that header, but that does not mean the REST API is actually performing the query across partitions. Here is a basic outline of how it works:

  1. The SDK (any SDK supporting cross-partition queries) needs to fetch the partition map of the account.
  2. Issue the query across all partitions.
  3. Grab all the results across partitions and apply aggregations, ordering, etc. This step is completely client-side.
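To make step 3 concrete, here is a minimal sketch of the ordering part, assuming each partition has already returned its own results sorted ascending (as an ORDER BY sent to each partition would). Plain integers stand in for the sort key, and `cursor`, `minHeap`, and `mergeTop` are hypothetical names. This is a generic k-way merge with a TOP-style limit, not the algorithm any particular Cosmos SDK uses.

```go
package main

import "container/heap"

// cursor tracks the read position within one partition's sorted results.
type cursor struct {
	values []int
	pos    int
}

// minHeap orders cursors by the value they currently point at.
type minHeap []*cursor

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].values[h[i].pos] < h[j].values[h[j].pos] }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(*cursor)) }
func (h *minHeap) Pop() any {
	old := *h
	last := old[len(old)-1]
	*h = old[:len(old)-1]
	return last
}

// mergeTop merges ascending per-partition results into one ascending slice
// and returns at most `top` values (top <= 0 means no limit).
func mergeTop(partitions [][]int, top int) []int {
	h := &minHeap{}
	for _, p := range partitions {
		if len(p) > 0 { // empty partitions contribute nothing
			*h = append(*h, &cursor{values: p})
		}
	}
	heap.Init(h)

	var out []int
	for h.Len() > 0 && (top <= 0 || len(out) < top) {
		c := (*h)[0] // cursor holding the smallest current value
		out = append(out, c.values[c.pos])
		c.pos++
		if c.pos == len(c.values) {
			heap.Pop(h) // this partition is exhausted
		} else {
			heap.Fix(h, 0) // re-position the cursor after advancing it
		}
	}
	return out
}
```

Aggregates need the same kind of client-side combination: per-partition COUNT or SUM values can be added together, but an average, for example, has to be recomputed from per-partition sums and counts rather than averaged again.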

You can refer to other Cosmos SDKs if you want the implementation details: https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query

calmh commented 2 years ago

Thank you for clarifying that!

KiraTheGenius commented 8 months ago

Is it possible to do it this way? queryPager := c.containerClient.NewQueryItemsPager(query, azcosmos.NullPartitionKey, nil)

ealsur commented 8 months ago

@KiraTheGenius No. NullPartitionKey matches documents whose Partition Key property has a null value, such as:

{
  "id": "the id",
  "pk": null
}

As mentioned before, cross-partition queries are not simply a matter of sending a header.

ealsur commented 8 months ago

Sending x-ms-query-enable-crosspartition on a request does not cover the full spectrum of queries.

You can set the header with, for example, a policy, but it won't work if the query has any aggregates, TOP, ORDER BY, OFFSET LIMIT, DISTINCT, or GROUP BY. Hence, a fully functioning cross-partition SDK API requires client-side code.

joshuadmatthews commented 7 months ago

Hey guys, this is a blocker for functional query support with the Dapr CosmosDB state store component. I am open to helping implement this. Can someone explain to me what makes the aggregation/sorting so difficult? Can't we just peek at the Microsoft SDK and duplicate the logic in Go?

mihaitodor commented 7 months ago

@joshuadmatthews This might be useful in the new experimental Go driver: https://github.com/microsoft/gocosmos/blob/c4c3435bb68819101b2961994bcc99a6b3c7565d/restclient.go#L860

Disclaimer: I haven't tried it myself.

ealsur commented 7 months ago

That library seems to do cross-partition queries (which is sending a request to each partition) but:

@joshuadmatthews The level of complexity is beyond just doing HTTP requests and I already mentioned this in a previous comment: https://github.com/Azure/azure-sdk-for-go/issues/18578#issuecomment-1222510989

This is on top of things like partition splits/merges and handling state. You cannot look up the partitions again for each query, as that would be extremely inefficient and consume RUs in the account, so you need a cache. How you maintain and synchronize that cache is another vector of complexity.

joshuadmatthews commented 7 months ago

I understand that it is complex, but if other libraries have implemented it, then it seems like a surmountable problem. Is there something specific about the Go implementation that makes it less straightforward than it is in the .NET library?

ealsur commented 7 months ago

@joshuadmatthews We (Cosmos DB SDK team) implemented it in the other languages, so it is not an impossible problem. It's a matter of resources, time, and priorities.

This is an Issue tracking the delivery of this as a feature, but there are other more important and critical features before this. From the priority perspective, the available resources from the team are assigned to tackle the items with priority, and drain that work from top to bottom.

Cross-partition query is not more important than, for example, general high-availability and failover semantic support.

It is not less straightforward than, say, .NET, but that implementation (.NET) was also neither trivial nor short in terms of time and resources.

joshuadmatthews commented 7 months ago

Ok thank you for expanding on that. Apologies for being persistent. I will take a look at the .NET implementation and see if I can get my mind around it, and if so maybe I can help on the Go version.

ankitjain91 commented 4 months ago

Hi, any updates on this issue? We still cannot query the documents without partitionKey?

KiraTheGenius commented 4 months ago

Hi, any updates on this issue? We still cannot query the documents without partitionKey?

With this repo and package you cannot, but there are other packages, such as gocosmos, ...

ealsur commented 4 months ago

We shipped the latest release (1.0.0) 8 days ago with the latest critical features, enabling cross-region failover. This item is now part of the backlog for the team to continue with.

jim-minter commented 1 month ago

@ealsur, does the scope of this issue include both enablement of basic cross-partition queries and also client support for complex queries (order by, etc.), or do we need to open a second issue for the latter?

jim-minter commented 1 month ago

@ealsur improvements in public documentation on partition key ranges and x-ms-documentdb-partitionkeyrangeid header would also be extremely useful. Examples: what are the values for pkrange status and how should they be interpreted? How should parents be used? Are pkranges sorted? How are the ids stable? How do reported pkranges change during a split? For how long are old pkranges valid during a split operation?

Ideally native Go client support so we don't need to care about any of this would be best, but failing that, documenting invariants so that they can be safely used would be a start. At the moment, no support and no documentation is the worst place to be.

Where is the right place to request documentation improvements please?

ealsur commented 1 month ago

@ealsur, does the scope of this issue include both enablement of basic cross-partition queries and also client support for complex queries (order by, etc.), or do we need to open a second issue for the latter?

Both.

Where is the right place to request documentation improvements please?

All documentation pages have a feedback button on the top right. This repository does not handle requests for REST API documentation changes.

github-actions[bot] commented 2 weeks ago

Hi @rkilburn, we deeply appreciate your input into this project. Regrettably, this issue has remained unresolved for over 2 years and inactive for 30 days, leading us to the decision to close it. We've implemented this policy to maintain the relevance of our issue queue and facilitate easier navigation for new contributors. If you still believe this topic requires attention, please feel free to create a new issue, referencing this one. Thank you for your understanding and ongoing support.