This repository is for active development of the Azure SDK for Java. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/java/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-java.
[BUG] Azure Cosmos DB OLTP Spark 3 connector not dealing at all with large partition key support #39194
To Reproduce
Steps to reproduce the behavior:
1. Create a Cosmos DB container through the Azure portal with default options.
2. Verify on the portal that the label "Large partition key has been enabled" is shown under the container's settings.
3. Verify that the exported ARM template of the database shows `{"kind": "Hash", "version": 2}` under the container's partition key properties (see the Stack Overflow article on how to tell whether a Cosmos DB container uses a large partition key).
4. Create a Cosmos DB container through the Azure Cosmos DB OLTP Spark 3 connector with default options.
5. Verify on the portal that the label "Large partition key has been enabled" is not shown under the container's settings.
6. Verify that the exported ARM template of the database shows only `{"kind": "Hash"}`, without any `version` field, under the container's partition key properties.
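The ARM template check in the steps above can be scripted. The following is a minimal sketch, not an official tool: it walks any JSON document and inspects every object stored under a `partitionKey` key, treating `"kind": "Hash"` together with `"version": 2` as the large partition key marker described above. The helper names and the embedded template fragment (including the container names) are hypothetical and only illustrate the two cases from this report; a real exported template is larger and may be laid out differently.

```python
import json


def find_partition_keys(node):
    """Yield every dict stored under a "partitionKey" key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "partitionKey" and isinstance(value, dict):
                yield value
            else:
                yield from find_partition_keys(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_partition_keys(item)


def uses_large_partition_key(partition_key):
    """True when the definition carries the {"kind": "Hash", "version": 2} marker."""
    return partition_key.get("kind") == "Hash" and partition_key.get("version") == 2


# Hypothetical, heavily trimmed template fragment (assumption, not a real export):
# the first container mimics one created through the portal, the second one
# created through the Spark connector with default options.
template = json.loads("""
{
  "resources": [
    {"name": "portal-container",
     "properties": {"resource": {"partitionKey":
        {"paths": ["/id"], "kind": "Hash", "version": 2}}}},
    {"name": "spark-container",
     "properties": {"resource": {"partitionKey":
        {"paths": ["/id"], "kind": "Hash"}}}}
  ]
}
""")

results = [
    (pk.get("paths"), uses_large_partition_key(pk))
    for pk in find_partition_keys(template)
]
for paths, is_large in results:
    print(paths, "large partition key" if is_large else "no version field, not large")
```

To check a real export, replace the embedded fragment with `json.load(open(path))` on the downloaded template file.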
Expected behavior
The Azure Cosmos DB OLTP Spark 3 connector should either provide an option to enable a large partition key on container creation or enable it automatically, as Azure does. I wasn't able to find any reference to this in either the Catalog API or the Configuration Reference docs. I also highly recommend updating the "Live Migrate Azure Cosmos DB SQL API Containers data with Spark Connector and Azure Databricks" article, as that was my starting point and it does not mention large partition keys at all (bear in mind that Azure automatically enables large partition key support on new containers created through the portal).
Describe the bug
The Azure Cosmos DB OLTP Spark 3 connector neither provides an option to enable a large partition key on container creation nor enables it silently by default, as Azure does (for reference, see "Azure portal UI changed and no more provides the option to enable large partition key"). This was very hard to spot and caused extremely poor performance.