Azure / spark-cdm-connector

MIT License
75 stars 32 forks

Implicit Parquet format reading from ADLSGen2 throws Null pointer exception #81

Open Prakash5Github opened 3 years ago

Prakash5Github commented 3 years ago

Tried implicit writing and reading using the parquet format. The write to ADLS Gen2 succeeds, and the schema can be read from the DataFrame, but reading the data throws a NullPointerException (NPE). The same entity can be written and read successfully in CSV format.

Implicit writing logic:

```python
(df_supplies.write.format("com.microsoft.cdm")
    .option("storage", storageAccountName)
    .option("manifestPath", container + "/test8/default.manifest.cdm.json")
    .option("entity", "supplies")
    .option("appId", appID)
    .option("appKey", appKey)
    .option("tenantId", tenantID)
    .option("format", "parquet")
    .option("compression", "gzip")
    .save())
```

Implicit reading logic:

```python
readDf = (spark.read.format("com.microsoft.cdm")
    .option("storage", storageAccountName)
    .option("manifestPath", container + "/test8/default.manifest.cdm.json")
    .option("entity", "supplies")
    .option("appId", appID)
    .option("appKey", appKey)
    .option("tenantId", tenantID)
    .load())
```

Error log:

```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 47.0 failed 4 times, most recent failure: Lost task 0.3 in stage 47.0 (TID 1397, 10.139.64.8, executor 5): java.lang.NullPointerException
Caused by: java.lang.NullPointerException
	at com.microsoft.cdm.read.ParquetReaderConnector.jsonToData(ParquetReaderConnector.scala:231)
	at com.microsoft.cdm.read.CDMDataReader$$anonfun$get$3.apply(CDMDataReader.scala:85)
	at com.microsoft.cdm.read.CDMDataReader$$anonfun$get$3.apply(CDMDataReader.scala:83)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
	at com.microsoft.cdm.read.CDMDataReader.get(CDMDataReader.scala:83)
	at com.microsoft.cdm.read.CDMDataReader.get(CDMDataReader.scala:19)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.next(DataSourceRDD.scala:59)
	at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
```
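One way to narrow this down is to bypass the connector and read the written parquet files directly with `spark.read.parquet`: if that succeeds, the data is intact and the NPE is in the connector's read path (`ParquetReaderConnector.jsonToData`). This is a sketch, not a confirmed fix; the relative path `test8/supplies` below is an assumption about where the connector places the entity's partition files, and the helper name `abfss_path` is hypothetical.

```python
def abfss_path(container: str, account: str, relative: str) -> str:
    """Build an abfss:// URI for a path inside an ADLS Gen2 container.

    Hypothetical debugging helper; the relative path to the entity's
    parquet files is an assumption about the connector's layout.
    """
    return "abfss://{c}@{a}.dfs.core.windows.net/{r}".format(
        c=container, a=account, r=relative.lstrip("/"))

# Hypothetical usage from a Spark session with ADLS Gen2 auth configured:
#   path = abfss_path(container, storageAccountName, "test8/supplies")
#   spark.read.parquet(path).show(5)   # reads the files without the CDM connector
```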