Reduces the performance overhead when a Spark DataFrame has many partitions - especially when using Cosmos DB as a sink in Spark Streaming scenarios

Azure / azure-cosmosdb-spark

Apache Spark Connector for Azure Cosmos DB

MIT License


Reduces the performance overhead when a Spark DataFrame has many partitions - especially when using Cosmos DB as a sink in Spark Streaming scenarios #439

Closed: FabianMeiswinkel closed this 3 years ago

FabianMeiswinkel commented 3 years ago

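This change is possible now because we introduced the CosmosDBConnectionCache. With the cache in place, we only need to initialize a single CosmosClient per executor and can follow a singleton pattern for every other use. That matters because each client initialization issues metadata requests that count against the master RU budget, so creating one client per partition becomes expensive when a DataFrame has many partitions.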
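To illustrate the idea, here is a minimal sketch of a process-wide client cache. It is written in Python for brevity rather than in the connector's own JVM code, and it is not the actual CosmosDBConnectionCache implementation. `FakeClient`, `ConnectionCache`, `get_or_create`, `write_partition`, and the endpoint URL are all hypothetical names chosen for this example.

```python
import threading
from typing import Callable, Dict, Hashable, List, Tuple


class FakeClient:
    """Hypothetical stand-in for an expensive client; counts constructions."""

    created = 0

    def __init__(self, endpoint: str) -> None:
        # In a real client this is where costly metadata requests would happen.
        FakeClient.created += 1
        self.endpoint = endpoint


class ConnectionCache:
    """Hypothetical process-wide cache: at most one client per key."""

    _lock = threading.Lock()
    _clients: Dict[Hashable, FakeClient] = {}

    @classmethod
    def get_or_create(
        cls, key: Hashable, factory: Callable[[], FakeClient]
    ) -> FakeClient:
        # The lock ensures concurrent tasks never build two clients for one key.
        with cls._lock:
            client = cls._clients.get(key)
            if client is None:
                client = factory()
                cls._clients[key] = client
            return client


def write_partition(endpoint: str, rows: List[int]) -> Tuple[FakeClient, int]:
    """Simulates writing one partition; reuses the cached client."""
    client = ConnectionCache.get_or_create(endpoint, lambda: FakeClient(endpoint))
    return client, len(rows)


# Three partitions handled in the same process share a single client.
partitions = [[1, 2], [3], [4, 5, 6]]
results = [write_partition("https://example.invalid", p) for p in partitions]
```

In this sketch the number of client initializations depends on the number of distinct cache keys in the process, not on the number of partitions, which is the property the comment above relies on.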