Azure / azure-cosmosdb-spark

Apache Spark Connector for Azure Cosmos DB
MIT License

Snapshot Isolation Guarantees #464

Closed: knarayanan88 closed this issue 2 years ago

knarayanan88 commented 2 years ago

Question about the Cosmos DB Spark Connector:

Let's say my Cosmos DB collection has 500 partitions and I start a full table export via the Cosmos DB Spark Connector (azure-cosmosdb-spark_2.4.0_2.11) at time t1. Assuming the export takes 3 hours to complete, at time t1 + 3 hours, will the export contain data as of time t1 (when the export job started), or will it also contain rolling updates made to the partitions during the interval from t1 to t1 + 3 hours? In other words, will the export contain data from all 500 partitions as of time t1, or can it contain data from partitions at different points in time after the export started?

FabianMeiswinkel commented 2 years ago

The latter. The export is eventually consistent; there is no snapshot isolation guarantee.
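
To make the consequence concrete, here is a minimal sketch in plain Python. It is a toy simulation and does not use the connector or any Cosmos DB API. The `Doc` class, the `export_without_snapshot` function, and the one-partition-per-tick timing are all hypothetical, chosen only to show how a scan that runs while writes continue can mix points in time. The `ts` field loosely mirrors the `_ts` last-modified system property on Cosmos DB documents.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    value: int
    ts: int  # last-modified time, loosely analogous to Cosmos DB's _ts


def export_without_snapshot(partitions, writes, start_time):
    """Scan one partition per time tick while concurrent writes are applied.

    partitions: list of dicts mapping doc_id -> Doc (hypothetical layout)
    writes: dict mapping time -> (partition_index, doc_id, new_value)
    """
    exported = []
    for offset in range(len(partitions)):
        now = start_time + offset
        # A concurrent writer may update a document before this tick's read.
        if now in writes:
            p_idx, doc_id, new_value = writes[now]
            partitions[p_idx][doc_id] = Doc(doc_id, new_value, now)
        # The partition is read as it exists now, not as of start_time.
        for doc in partitions[offset].values():
            exported.append(Doc(doc.doc_id, doc.value, doc.ts))
    return exported


t1 = 100
partitions = [
    {"a": Doc("a", 1, 90)},
    {"b": Doc("b", 2, 95)},
    {"c": Doc("c", 3, 99)},
]
# At t1 + 1 a write lands on partition 2, which has not been scanned yet.
# At t1 + 2 a write lands on partition 0, which has already been scanned.
writes = {t1 + 1: (2, "c", 30), t1 + 2: (0, "a", 10)}

result = export_without_snapshot(partitions, writes, t1)
modified_after_start = [d.doc_id for d in result if d.ts > t1]

for d in result:
    print(d.doc_id, d.value, d.ts)
```

In this toy run the export holds document `a` as it was at t1 (the later update is missed) and document `c` as it was after t1 (the update is included), so the result does not correspond to any single point in time. Filtering on a last-modified timestamp afterwards can show which rows changed after the job started, but it cannot recover the values those rows had at t1.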