Closed erparas closed 5 years ago
In case you still have this problem, or someone else encounters it: I've had the same issue while using Spark 2 with spark-avro_2.11:3.0.0. However, when I switched to 4.0.0, the issue was fixed. I've started the spark2-shell like this:
spark2-shell --packages com.databricks:spark-avro_2.11:4.0.0
Running the following code works without problems:
import com.databricks.spark.avro._
spark.read.avro("PATH_TO_AVROFILES").show()
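For a Java application (as in the original report below), the equivalent read should look something like this sketch; note this is my assumption, with spark standing in for an existing SparkSession and com.databricks:spark-avro_2.11:4.0.0 already on the classpath:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
// Assumes an existing SparkSession named spark and
// com.databricks:spark-avro_2.11:4.0.0 on the classpath.
Dataset<Row> df = spark.read()
        .format("com.databricks.spark.avro")
        .load("PATH_TO_AVROFILES");
df.show();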
Yes, the newer version of the spark-avro jar resolved the issue.
Thanks.
Hi team,
I am trying to read an Avro data file as a Spark DataFrame, but it is throwing a NullPointerException. I have enabled Kryo as the serializer; the details are below.
Code snippet:
Dataset<Row> table = sparkSessionObject.read().format("com.databricks.spark.avro").load("/tmp/table");
table.show();
Note: when I use the JavaSerializer it works fine.
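For context, a minimal sketch of how the two serializers are typically toggled when building the session; the app name is illustrative, while spark.serializer and the two serializer classes are stock Spark:
import org.apache.spark.sql.SparkSession;
SparkSession sparkSessionObject = SparkSession.builder()
        .appName("avro-read-test")  // illustrative app name
        // Kryo: the configuration under which the NPE occurs
        .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        // Java: the configuration that reportedly works fine
        // .config("spark.serializer", "org.apache.spark.serializer.JavaSerializer")
        .getOrCreate();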
Jar version details: spark-sql_2.11-2.1.0.cloudera1.jar, spark-avro_2.11-3.2.0.jar, kryo-shaded-3.0.3.jar
Stacktrace:
Caused by: java.lang.NullPointerException
    at com.databricks.spark.avro.DefaultSource$$anonfun$buildReader$1.apply(DefaultSource.scala:170)
    at com.databricks.spark.avro.DefaultSource$$anonfun$buildReader$1.apply(DefaultSource.scala:160)
    at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:138)
    at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:122)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:168)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Could you please help me out with this?