Tried implicit write and read using the Parquet format. The write to ADLS Gen2 succeeds, and the schema can be read from the resulting DataFrame, but reading the data throws a NullPointerException (NPE).
The same entity can be written and read in CSV format without any issue.
Error log:

```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 47.0 failed 4 times, most recent failure: Lost task 0.3 in stage 47.0 (TID 1397, 10.139.64.8, executor 5): java.lang.NullPointerException
Caused by: java.lang.NullPointerException
at com.microsoft.cdm.read.ParquetReaderConnector.jsonToData(ParquetReaderConnector.scala:231)
at com.microsoft.cdm.read.CDMDataReader$$anonfun$get$3.apply(CDMDataReader.scala:85)
at com.microsoft.cdm.read.CDMDataReader$$anonfun$get$3.apply(CDMDataReader.scala:83)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at com.microsoft.cdm.read.CDMDataReader.get(CDMDataReader.scala:83)
at com.microsoft.cdm.read.CDMDataReader.get(CDMDataReader.scala:19)
at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.next(DataSourceRDD.scala:59)
at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
```
Implicit writing logic:

```python
(df_supplies.write.format("com.microsoft.cdm")
    .option("storage", storageAccountName)
    .option("manifestPath", container + "/test8/default.manifest.cdm.json")
    .option("entity", "supplies")
    .option("appId", appID)
    .option("appKey", appKey)
    .option("tenantId", tenantID)
    .option("format", "parquet")
    .option("compression", "gzip")
    .save())
```
Implicit reading logic:

```python
readDf = (spark.read.format("com.microsoft.cdm")
    .option("storage", storageAccountName)
    .option("manifestPath", container + "/test8/default.manifest.cdm.json")
    .option("entity", "supplies")
    .option("appId", appID)
    .option("appKey", appKey)
    .option("tenantId", tenantID)
    .load())
```
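Since the NPE originates in `ParquetReaderConnector.jsonToData` rather than in Spark's own Parquet code path, one way to narrow the problem down is to bypass the CDM connector and read the Parquet files it wrote with Spark's native reader. This is a sketch only; the `abfss://` path below is an assumption inferred from the `manifestPath` used above (the actual entity folder layout the connector produced may differ), and it reuses `spark`, `container`, and `storageAccountName` from the snippets above.

```python
# Diagnostic sketch: read the connector-written Parquet files directly with
# Spark's built-in Parquet reader, skipping com.microsoft.cdm entirely.
# NOTE: the path is an assumption based on manifestPath = container +
# "/test8/default.manifest.cdm.json"; adjust it to where the entity data
# actually landed in the container.
parquet_path = (
    "abfss://" + container + "@" + storageAccountName
    + ".dfs.core.windows.net/test8/supplies"
)

raw = spark.read.parquet(parquet_path)
raw.printSchema()
raw.show(5)
```

If this direct read works, the data files themselves are fine and the NPE is specific to the connector's JSON-to-datum conversion; if it fails too, the written Parquet is suspect.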