Open: cbardin opened this issue 4 years ago
I am seeing a similar issue when using Azure Data Factory to write to Cosmos DB:
{"StatusCode":"DFExecutorUserError","Message":"Job failed due to reason: Errors encountered in bulk update API execution. Number of failures corresponding to exception of type: java.lang.RuntimeException = 500; FAILURE: java.lang.RuntimeException: Stored proc returned failure 404\n\tat com.microsoft.azure.documentdb.bulkexecutor.internal.BatchUpdater$1.call(BatchUpdater.java:199)\n\tat com.microsoft.azure.documentdb.bulkexecutor.internal.BatchUpdater$1.call(BatchUpdater.java:148)\n\tat com.microsoft.azure.documentdb.repackaged.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)\n\tat com.microsoft.azure.documentdb.repackaged.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)\n\tat com.microsoft.azure.documentdb.repackaged.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoo","Details":"Errors encountered in bulk update API execution. Number of failures corresponding to exception of type: java.lang.RuntimeException = 500; FAILURE: java.lang.RuntimeException: Stored proc returned failure 404\n\tat com.microsoft.azure.documentdb.bulkexecutor.internal.BatchUpdater$1.call(BatchUpdater.java:199)\n\tat com.microsoft.azure.documentdb.bulkexecutor.internal.BatchUpdater$1.call(BatchUpdater.java:148)\n\tat com.microsoft.azure.documentdb.repackaged.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)\n\tat com.microsoft.azure.documentdb.repackaged.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)\n\tat com.microsoft.azure.documentdb.repackaged.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(Threa"}
Any leads on this? I'm facing a similar issue.
Please check that the partition key is correct. In my case, the partition key on the Cosmos DB container and the partitionKey configured in the ADF sink did not match.
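If it helps to rule that out, here is a minimal sketch (not from this thread) that reads a container's partition key definition with the classic DocumentDB Java SDK, which is bundled in the connector's uber jar; the endpoint, master key, and database/collection names are placeholders:

import com.microsoft.azure.documentdb.{ConnectionPolicy, ConsistencyLevel, DocumentClient}

// Placeholders: substitute your own account endpoint, master key, and names.
val client = new DocumentClient(
  "https://<account>.documents.azure.com:443/",
  "<master-key>",
  ConnectionPolicy.GetDefault(),
  ConsistencyLevel.Session)

// Read the collection and print its partition key paths, e.g. ["/id"].
// Whatever this prints is what the sink's partition key setting has to match.
val collection = client
  .readCollection("dbs/<database>/colls/<collection>", null)
  .getResource
println(collection.getPartitionKey.getPaths)

client.close()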
I'm attempting to use Databricks to write a DataFrame to our Cosmos DB account. I've done some searching around, but nothing seems to work. I've tried code from this GitHub repo as well as from documentation on the Databricks and Azure blogs. This is what I have right now:
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.col
import com.microsoft.azure.cosmosdb.spark._
import com.microsoft.azure.cosmosdb.spark.schema._
import com.microsoft.azure.cosmosdb.spark.config.Config

val configMap = Map(
  "Endpoint" -> "",
  "Masterkey" -> "",
  "Database" -> "",
  "Collection" -> "container1",
  "preferredRegions" -> "East US;West US;",
  "Upsert" -> "true")
val config = Config(configMap)

val df = spark.range(5).select(col("id").cast("string").as("value"))

df.write
  .format("com.microsoft.azure.cosmosdb.spark")
  .mode(SaveMode.Append)
  .cosmosDB(config)
and I get an error similar to this every time:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 2.0 failed 4 times, most recent failure: Lost task 7.3 in stage 2.0 (TID 64, 10.255.128.5, executor 0): java.lang.Exception: Errors encountered in bulk import API execution. PartitionKeyDefinition: {"paths":["/id"],"kind":"Hash"}, Number of failures corresponding to exception of type: com.microsoft.azure.documentdb.DocumentClientException = 1; FAILURE: com.microsoft.azure.documentdb.DocumentClientException: Max retries for BulkExecutor exhausted. Please re-initialize BulkExecutor and retry latest batch import.
    at com.microsoft.azure.documentdb.bulkexecutor.DocumentBulkExecutor.executeBulkImportInternal(DocumentBulkExecutor.java:603)
    at com.microsoft.azure.documentdb.bulkexecutor.DocumentBulkExecutor.importAll(DocumentBulkExecutor.java:505)
    at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$.bulkImport(CosmosDBSpark.scala:292)
    at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$.com$microsoft$azure$cosmosdb$spark$CosmosDBSpark$$savePartition(CosmosDBSpark.scala:415)
    at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$$anonfun$1.apply(CosmosDBSpark.scala:154)
    at com.microsoft.azure.cosmosdb.spark.CosmosDBSpark$$anonfun$1.apply(CosmosDBSpark.scala:154)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:830)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:830)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:353)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:317)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.doRunTask(Task.scala:140)
    at org.apache.spark.scheduler.Task.run(Task.scala:113)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:537)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:543)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
; DOCUMENT FAILED TO IMPORT: ["46ae62b3-ff69-4cda-9529-e37cf4997c4c"](05C1D5DD45E38C0835376266373363342E6767373A2E356465622E3A36333A2E6634386467353A3A38643564
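One detail that may be worth ruling out (a guess from the trace, not a confirmed fix): the PartitionKeyDefinition above shows the container is partitioned on /id, while the DataFrame only carries a value column, and the GUID in the failed document suggests the connector generated ids itself. A minimal sketch of the same write with an explicit id column, reusing the config from above:

// Sketch only: carry the partition key field (/id) explicitly in each document.
val dfWithId = spark.range(5)
  .select(
    col("id").cast("string").as("id"),      // matches the container's /id partition key
    col("id").cast("string").as("value"))

dfWithId.write
  .format("com.microsoft.azure.cosmosdb.spark")
  .mode(SaveMode.Append)
  .cosmosDB(config)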
Any help would be appreciated.

Also, this is the version I am running: azure-cosmosdb-spark_2.4.0_2.11-3.1.0-uber.jar, and I'm on Databricks Runtime 6.5.