Azure / azure-cosmosdb-spark

Apache Spark Connector for Azure Cosmos DB
MIT License

How to delete a specific cosmosdb document or batch delete through the library? #443

Open zhong0x opened 3 years ago

aczelandi commented 2 years ago

@zhong0x - That's an interesting question; I was wondering about this myself. The only workaround I found is to use the Time To Live (TTL) capability of Cosmos DB. Enable TTL at the container level, then set the 'ttl' field of each existing item you want to remove to a very small value (the value is in seconds). Cosmos DB will then expire those items for you.

To enable TTL on container level follow this tutorial.
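Why the container-level setting matters: as far as I understand the TTL rules, an item's 'ttl' field is ignored unless TTL is enabled on the container, and a value of -1 means "never expire". Here is a small, simplified model of those rules in plain Scala. It is only an illustration of the behaviour, not code from the connector or the service, and the names `TtlModel`, `effectiveTtl` and `isExpired` are made up for this sketch.

```scala
// Simplified model of how Cosmos DB resolves TTL (illustrative only).
object TtlModel {

  /** Effective TTL in seconds, or None if the item never expires.
    *
    * containerDefaultTtl: None     = TTL is disabled on the container
    *                      Some(-1) = TTL enabled, items do not expire by default
    *                      Some(n)  = TTL enabled, items expire after n seconds
    * itemTtl:             the item's own "ttl" field, if it has one
    */
  def effectiveTtl(containerDefaultTtl: Option[Int], itemTtl: Option[Int]): Option[Int] =
    containerDefaultTtl match {
      case None => None // TTL disabled: the item-level value has no effect
      case Some(default) =>
        val ttl = itemTtl.getOrElse(default) // item value overrides the container default
        if (ttl == -1) None else Some(ttl)
    }

  /** True once the item is past its expiry.
    * lastModified stands for the item's _ts (epoch seconds); the exact
    * boundary handling here is an assumption of this sketch.
    */
  def isExpired(containerDefaultTtl: Option[Int],
                itemTtl: Option[Int],
                lastModified: Long,
                now: Long): Boolean =
    effectiveTtl(containerDefaultTtl, itemTtl).exists(ttl => now >= lastModified + ttl)
}
```

For example, `TtlModel.effectiveTtl(None, Some(1))` gives `None` (nothing gets deleted because the container has TTL off), while `TtlModel.effectiveTtl(Some(-1), Some(1))` gives `Some(1)`, which is the setup this workaround relies on.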

Afterwards you could read the records that you want to delete using:

val recordsToDelete = spark.read.cosmosDB(yourCosmosDbReadConfig)

Then you could set the 'ttl' field of those records to 1 (one second) and write them back.

import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.lit
val recordsMarkedForDeletion = recordsToDelete.withColumn("ttl", lit(1))
recordsMarkedForDeletion.write.mode(SaveMode.Overwrite).cosmosDB(yourCosmosDbWriteConfig)

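Once the records are written back, the selected items expire about one second later. Cosmos DB stops returning expired items from queries and removes them in the background, so the physical deletion may not be instantaneous.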
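Putting the steps together, a minimal end-to-end sketch could look like the one below. The endpoint, key, database and container values are placeholders, the `status = 'obsolete'` filter is a made-up example of selecting the items to delete, and the object name `DeleteViaTtl` is hypothetical. I am also assuming the write config needs `"Upsert" -> "true"` so that the existing documents are replaced rather than rejected as conflicts; please verify that against your connector version.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions.lit
import com.microsoft.azure.cosmosdb.spark._
import com.microsoft.azure.cosmosdb.spark.schema._ // adds .cosmosDB to the reader/writer
import com.microsoft.azure.cosmosdb.spark.config.Config

object DeleteViaTtl {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("delete-via-ttl").getOrCreate()

    // Placeholder connection settings shared by the read and write configs.
    val connection = Map(
      "Endpoint"   -> "https://<account>.documents.azure.com:443/",
      "Masterkey"  -> "<key>",
      "Database"   -> "<database>",
      "Collection" -> "<container>"
    )

    // Read only the documents that should be deleted (example filter, adjust as needed).
    val readConfig = Config(connection + ("query_custom" ->
      "SELECT * FROM c WHERE c.status = 'obsolete'"))

    // Assumption: upsert is required so the write replaces the existing documents.
    val writeConfig = Config(connection + ("Upsert" -> "true"))

    val recordsToDelete = spark.read.cosmosDB(readConfig)

    // Add (or overwrite) the ttl field so each item expires one second after the write.
    val recordsMarkedForDeletion = recordsToDelete.withColumn("ttl", lit(1))

    recordsMarkedForDeletion.write.mode(SaveMode.Overwrite).cosmosDB(writeConfig)

    spark.stop()
  }
}
```

Keep in mind that the read and the write are not atomic: a document modified by another writer in between would be overwritten with the older copy before it expires.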