Azure / azure-cosmosdb-spark

Apache Spark Connector for Azure Cosmos DB
MIT License

add support for option for preserving null values #419

Closed moderakh closed 3 years ago

moderakh commented 3 years ago

Prior to this PR, null values in the data were dropped on write. This PR adds the option preserveNullInWrite to preserve them. The option defaults to false, which keeps the old behaviour.

If we read data with null-valued columns from the source container, we should support preserving those null values when writing to the destination container.
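To make the two behaviours concrete, here is a minimal sketch in plain Scala, with no Spark or Cosmos DB dependency. It models a document as a Map and shows what ends up being written with the option off and on. The object NullHandlingSketch and the helper toWrittenDoc are hypothetical names used only for illustration. This is not the connector's actual implementation.

```scala
// Illustration only: models a document as a Map and mimics the two write
// behaviours described in this PR. Not the connector's code.
object NullHandlingSketch {
  type Doc = Map[String, Any]

  // Hypothetical helper: returns the document as it would be written.
  // With preserveNullInWrite = false, fields whose value is null are dropped
  // (the old behaviour); with true, they are kept as explicit nulls.
  def toWrittenDoc(row: Doc, preserveNullInWrite: Boolean): Doc =
    if (preserveNullInWrite) row
    else row.filter { case (_, value) => value != null }
}
```

For a row such as Map("id" -> "1", "name" -> "a", "nickname" -> null), the sketch drops the nickname field when preserveNullInWrite is false and keeps it, with a null value, when it is true.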

This addresses this ICM: https://portal.microsofticm.com/imp/v3/incidents/details/211857363/home

import org.apache.spark.sql.SparkSession

// Source container to read from
val srCFG = Map("Endpoint" -> "https://test.documents.azure.com:443/",
  "Masterkey" -> "XYZ",
  "Database" -> "testdb",
  "Collection" -> "srccol")

val destCfg = Map("Endpoint" -> "https://test.documents.azure.com:443/",
  "Masterkey" -> "XYZ",
  "Database" -> "testdb",
  "Collection" -> "testcol",
  "upsert" -> "true",
  "preserveNullInWrite" -> "true"
)

val spark = SparkSession.builder()
  .appName("spark connector sample")
  .master("local")
  .getOrCreate()

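// Read from the source container
val df = spark.read
  .format("com.microsoft.azure.cosmosdb.spark")
  .options(srCFG)
  .load()

// Write to the destination container; preserveNullInWrite in destCfg keeps null values
df.write
  .format("com.microsoft.azure.cosmosdb.spark")
  .mode("append")
  .options(destCfg)
  .save()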