Azure / azure-cosmosdb-spark

Apache Spark Connector for Azure Cosmos DB
MIT License

CosmosDB inserts are very slow (days to load 100GB) #488

Open aaronS7 opened 4 months ago

aaronS7 commented 4 months ago

As the title suggests, the Spark library is too slow for our use case. I understand that a physical partition maxes out at 10k RU/s, but even when I overprovision RU/s, loading data into Cosmos DB still takes a long time. Is there a way to force Cosmos DB to split data across physical partitions more aggressively? This is a huge bottleneck for my team.