Azure / azure-cosmosdb-spark

Apache Spark Connector for Azure Cosmos DB
MIT License

Unable to Write or Read (un-encrypted) data from Cosmos DB Containers created via Cosmos DB SDK Encryption #476

Open ssharma444 opened 1 year ago

ssharma444 commented 1 year ago

We have an encrypted container, created using the Cosmos DB SDK as described in https://learn.microsoft.com/en-us/azure/cosmos-db/how-to-always-encrypted?tabs=dotnet. We are able to read and write as suggested in that article from an Azure Function App.

We are trying to read and write data in this same container from Databricks (using Cosmos DB library 2.3.0), but are unable to.

When writing to this container from Databricks, we get the error: "CosmosHttpResponseError: (BadRequest) Message: {"Errors":["Collection has ClientEncryptionPolicy set, but the document to be written isn't encrypted."]}"

When reading from this container from Databricks, we are able to read the data, but it comes back encrypted (it should come back decrypted as plain text).
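For context, a minimal PySpark sketch of what the Databricks side of this looks like with the Spark 2 connector in this repo. The format string and config keys (`Endpoint`, `Masterkey`, `Database`, `Collection`, `Upsert`) are the connector's documented ones; the account values are placeholders, and this requires a live Cosmos DB account plus the connector library attached to the cluster, so it is a sketch rather than a runnable repro:

```python
# Sketch: read/write against an Always Encrypted container via the
# Spark 2 connector (azure-cosmosdb-spark). Placeholder account values.
config = {
    "Endpoint": "https://<account>.documents.azure.com:443/",
    "Masterkey": "<account-key>",
    "Database": "<database>",
    "Collection": "<encrypted-container>",
    "Upsert": "true",
}

# Read succeeds, but encrypted properties come back as ciphertext,
# because the connector has no client-side decryption support.
df = spark.read.format("com.microsoft.azure.cosmosdb.spark") \
    .options(**config).load()

# Write fails with "Collection has ClientEncryptionPolicy set, but the
# document to be written isn't encrypted.", because the connector sends
# plaintext documents to a container whose policy requires encrypted payloads.
df.write.format("com.microsoft.azure.cosmosdb.spark") \
    .mode("append").options(**config).save()
```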

We have an open ticket with Microsoft on this, and upon investigation found that this library (Cosmos DB library 2.3.0) does not include the client-side encryption functionality in its DLLs at all.

My question is: when Databricks is offered as a service for interacting with Cosmos DB, why aren't all the features available in the Cosmos DB SDK also available in the Databricks libraries?

TheovanKraay commented 1 year ago

Hello @ssharma444. You are correct that not all Cosmos DB features are supported in all Cosmos DB client libraries. In this case, demand for this feature has been much higher for our SDKs than for our Spark Connector, where it has been very low. That said, it is still on our backlog, and our current plan is to deliver encryption support for the Spark 3 Connector within the next 3-6 months.

Meanwhile, I want to point out that encryption support will not be delivered in the version of the Spark Connector that this repo relates to. This is the repo for the older Cosmos Spark Connector for Spark 2 (note: Spark 2 itself will soon reach end of life on Databricks). This connector is also based on deprecated versions of our Java SDK.

We strongly recommend upgrading to the Spark 3 OLTP Connector if possible, as this is where we will be adding encryption support. Feel free to raise this issue in the repo where our latest Spark Connector lives: https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/cosmos/azure-cosmos-spark_3-2_2-12. We will track that issue against our work items for completion of this feature.
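For comparison, the equivalent read/write with the Spark 3 OLTP Connector recommended above uses the `cosmos.oltp` format and `spark.cosmos.*` option keys. Account values below are placeholders; note that, per the comment above, client-side decryption is not yet available in this connector either, so encrypted properties will still come back as ciphertext until the planned encryption support ships:

```python
# Sketch: same container accessed via the Spark 3 OLTP connector
# (azure-cosmos-spark_3-*). Placeholder account values.
cfg = {
    "spark.cosmos.accountEndpoint": "https://<account>.documents.azure.com:443/",
    "spark.cosmos.accountKey": "<account-key>",
    "spark.cosmos.database": "<database>",
    "spark.cosmos.container": "<encrypted-container>",
}

df = spark.read.format("cosmos.oltp").options(**cfg).load()
df.write.format("cosmos.oltp").mode("append").options(**cfg).save()
```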