I wanted to ask whether anybody else has run into this issue before and has suggestions for resolving it.
I am trying to write a Spark data frame, which is the result of joining a couple of data frames, into a Cosmos DB collection using the Cosmos DB Spark driver.
The code looks something like this:
val mainDocument = df1
  .drop("col1", "col2")
  .join(
    df1.drop("col3", "col4"),
    "joining_col")
mainDocument.show(2, truncate = false)
mainDocument.write.mode("Overwrite").format("parquet").save(getBasePath + "/screen/view.parquet")
mainDocument.write.mode(SaveMode.Overwrite).cosmosDB(config) // <-- Stuck here
Once the write starts, it gets stuck in a loop and keeps emitting the following log message:
INFO CosmosDBSpark: Delaying operation by 15s to stagger partitions.
This message repeats continuously with a varying number of seconds, sometimes a little higher and sometimes a little lower than 15.
I have written the output to a parquet file on the file system and the result is as expected. Interestingly, the Cosmos DB driver still gets stuck even when the data frame has zero rows.
The Cosmos DB configuration is pretty much standard. It is set at 400 RUs, and with the test data there are fewer than 10 updates.
Any insights or suggestions would be extremely helpful. Thanks in advance!