Azure / azure-cosmosdb-spark

Apache Spark Connector for Azure Cosmos DB

Unable to write dataframe to Cosmos DB: continuous loop of "Delaying operation by nnns to stagger partitions" #398

Open codepossible opened 4 years ago

codepossible commented 4 years ago

I wanted to ask if anybody else has run into this issue before and has suggestions for resolving it.

I am trying to write a Spark data frame, which is the result of joining a couple of data frames, into a Cosmos DB collection using the Cosmos DB Spark driver.

The code looks something like this:

  val mainDocument = df1
    .drop("col1", "col2")
    .join(
      df1.drop("col3", "col4"),
      "joining_col")

  mainDocument.show(2, truncate = false)
  mainDocument.write.mode("Overwrite").format("parquet").save(getBasePath + "/screen/view.parquet")
  mainDocument.write.mode(SaveMode.Overwrite).cosmosDB(config)  // <-- Stuck here

Once the write starts, it gets stuck in a loop and keeps emitting the following log message:

INFO CosmosDBSpark: Delaying operation by 15s to stagger partitions.

This repeats continuously with a varying number of seconds, sometimes a little higher and sometimes a little lower than 15 seconds.

I have written the output to a parquet file on the file system and the result is as expected. Interestingly, the Cosmos DB driver still gets stuck even when the data frame has zero rows.

The Cosmos DB configuration is pretty much standard. The collection is set at 400 RUs, and with the test data there are fewer than 10 updates.
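For context, a "standard" write configuration for this connector usually looks roughly like the sketch below. This is only an illustration, not the exact settings used here: the endpoint, key, database, and collection values are placeholders, and the `Upsert` setting is an assumption.

```scala
// Hypothetical write settings; every value below is a placeholder.
// The key names follow the connector's documented configuration options.
val writeConfigMap: Map[String, String] = Map(
  "Endpoint"   -> "https://<account-name>.documents.azure.com:443/", // placeholder account endpoint
  "Masterkey"  -> "<account-key>",                                   // placeholder key
  "Database"   -> "<database-name>",                                 // placeholder database
  "Collection" -> "<collection-name>",                               // placeholder collection
  "Upsert"     -> "true"                                             // assumed: update existing documents
)

// With the connector on the classpath, the map would typically be wrapped as:
//   import com.microsoft.azure.cosmosdb.spark.config.Config
//   val config = Config(writeConfigMap)
```

The 400 RU throughput mentioned above is provisioned on the collection itself and is not part of this map.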

Any insights or suggestions would be extremely helpful. Thanks in advance!