Azure / azure-sdk-for-java

This repository is for active development of the Azure SDK for Java. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/java/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-java.
MIT License

[BUG] Azure Cosmos DB OLTP Spark 3 connector not dealing at all with large partition key support #39194

Open fabrideci opened 8 months ago

fabrideci commented 8 months ago

Describe the bug

The Azure Cosmos DB OLTP Spark 3 connector neither provides an option to enable a large partition key on container creation nor enables it silently by default, as Azure does (for reference, see "Azure portal UI changed and no more provides the option to enable large partition key"). This was very hard to spot and caused extremely poor performance.

To Reproduce

Steps to reproduce the behavior:

  1. Create a Cosmos DB container through the Azure portal with default options
  2. Verify on the portal that under the container's settings the label "Large partition key has been enabled" is shown
  3. Verify that in the exported ARM template of the database you see {"kind": "Hash", "version": 2} under the container's partition key properties (see the Stack Overflow article on how to tell whether a Cosmos DB container uses a large partition key)
  4. Create a Cosmos DB container through the Azure Cosmos DB OLTP Spark 3 connector with default options
  5. Verify on the portal that under the container's settings there's no label "Large partition key has been enabled"
  6. Verify that in the exported ARM template of the database you see only {"kind": "Hash"}, without any version field, under the container's partition key properties.
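
The checks in steps 3 and 6 can be automated against an exported template. Below is a minimal sketch, assuming the container's partition key properties appear in the exported JSON under a key named `partitionKey`; that key name, the `paths` field, and both helper function names are illustrative assumptions and not part of the connector. Only the `{"kind": "Hash", "version": 2}` marker comes from the steps above.

```python
import json
from typing import Any, Iterator


def find_partition_keys(node: Any) -> Iterator[dict]:
    """Recursively yield every dict stored under a "partitionKey" key.

    The key name "partitionKey" is an assumption about the exported
    template layout; adjust it if your export uses a different name.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "partitionKey" and isinstance(value, dict):
                yield value
            else:
                yield from find_partition_keys(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_partition_keys(item)


def uses_large_partition_key(partition_key: dict) -> bool:
    """True when the definition carries "version": 2 (see step 3)."""
    return partition_key.get("version") == 2


# Hypothetical fragments mirroring steps 3 and 6; "paths" is illustrative.
portal_created = json.loads(
    '{"partitionKey": {"paths": ["/id"], "kind": "Hash", "version": 2}}'
)
connector_created = json.loads(
    '{"partitionKey": {"paths": ["/id"], "kind": "Hash"}}'
)

for name, template in [("portal", portal_created), ("connector", connector_created)]:
    for pk in find_partition_keys(template):
        print(name, "large partition key:", uses_large_partition_key(pk))
```

Loading a real export with `json.load` and passing the result to `find_partition_keys` should list every container definition in one pass, which is quicker than checking each container's settings in the portal.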

Expected behavior

Either the Azure Cosmos DB OLTP Spark 3 connector provides an option to enable a large partition key on container creation, or it enables it automatically as Azure does. I wasn't able to find any reference to this in either the Catalog API or the Configuration Reference docs. I also highly recommend updating the "Live Migrate Azure Cosmos DB SQL API Containers data with Spark Connector and Azure Databricks" article, as that was my starting point and it did not mention anything about large partition keys (as a reminder, Azure automatically enables support for large partition keys on new containers created through the portal).

github-actions[bot] commented 8 months ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @kushagraThapar @pjohari-ms @TheovanKraay.

kushagraThapar commented 8 months ago

@xinlian12 and @tvaron3 - please take a look at this.