gettyimages / docker-spark

Docker build for Apache Spark

spark-shell error reading parquet #56

Closed. cjekal closed this issue 5 years ago.

cjekal commented 5 years ago

I just pulled the latest gettyimages/spark:2.4.1-hadoop-3.0 image, and when I ran the following code I received an obscure java.lang.IllegalArgumentException: Unsupported class file major version 56 error. Here's the full spark-shell session:

root@master:/usr/spark-2.4.1# spark-shell
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/usr/spark-2.4.1/jars/spark-unsafe_2.11-2.4.1.jar) to method java.nio.Bits.unaligned()
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2019-04-19 05:12:25,480 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = spark://master:7077, app id = app-20190419051234-0001).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.1
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 12.0.1)
Type in expressions to have them evaluated.
Type :help for more information.

scala> val df = Seq((1,2,3),(4,5,6)).toDF("a", "b", "c")
df: org.apache.spark.sql.DataFrame = [a: int, b: int ... 1 more field]

scala> df.show(false)
+---+---+---+
|a  |b  |c  |
+---+---+---+
|1  |2  |3  |
|4  |5  |6  |
+---+---+---+

scala> df.write.parquet("blah")

scala> val x = spark.read.parquet("blah")
java.lang.IllegalArgumentException: Unsupported class file major version 56
  at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:166)
  at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:148)
  at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:136)
  at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:237)
  at org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:49)
  at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:517)
  at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:500)
  at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
  at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:134)
  at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:134)
  at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
  at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
  at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:134)
  at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
  at org.apache.spark.util.FieldAccessFinder$$anon$3.visitMethodInsn(ClosureCleaner.scala:500)
  at org.apache.xbean.asm6.ClassReader.readCode(ClassReader.java:2175)
  at org.apache.xbean.asm6.ClassReader.readMethod(ClassReader.java:1238)
  at org.apache.xbean.asm6.ClassReader.accept(ClassReader.java:631)
  at org.apache.xbean.asm6.ClassReader.accept(ClassReader.java:355)
  at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:307)
  at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:306)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:306)
  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
  at org.apache.spark.SparkContext.clean(SparkContext.scala:2326)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2100)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
  at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:945)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
  at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.mergeSchemasInParallel(ParquetFileFormat.scala:633)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.inferSchema(ParquetFileFormat.scala:241)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
  at scala.Option.orElse(Option.scala:289)
  at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:179)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
  at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:641)
  at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:625)
  ... 49 elided

scala> 

Any ideas on what could have caused this? Is it because the base image is debian:stretch without any tag qualifier?

brenoarosa commented 5 years ago

I'm having the same issue. gettyimages/spark:2.4.0-hadoop-3.0 works, but the update to Java 12 in 2.4.1 broke it.
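
For context, the number in "Unsupported class file major version N" maps directly to a Java release: 52 = Java 8, 53 = Java 9, 54 = Java 10, 55 = Java 11, 56 = Java 12, 57 = Java 13. Spark 2.4's shaded ASM 6 bytecode reader (the org.apache.xbean.asm6.ClassReader in the trace above) predates the newer formats, which is why Java 8 is the safe runtime. A quick way to confirm which JDK each tag ships, assuming java is on the image's PATH as the transcript above suggests:

docker run --rm gettyimages/spark:2.4.0-hadoop-3.0 java -version   # works: should report 1.8.x
docker run --rm gettyimages/spark:2.4.1-hadoop-3.0 java -version   # breaks: should report 12.x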

Ftagn92 commented 5 years ago

Hello, same here with pyspark, typing:

df = sc.parallelize([1,2,3,4])
df.count()

Not a parquet issue but a general issue, I guess.

bryceageno commented 5 years ago

I'll try out OpenJDK, since you now need an account to get the Java 8 JDK from Oracle.

bryceageno commented 5 years ago

I tested both use cases on latest and did not get an error.
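
For anyone else verifying, a check along these lines should show whether the rebuilt image ships OpenJDK 8 (the latest tag here is an assumption; substitute whatever tag you pull):

docker pull gettyimages/spark:latest
docker run --rm gettyimages/spark:latest java -version   # expect: openjdk version "1.8.0_..."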

makslevental commented 5 years ago

I just pulled and stood up using docker-compose, and I get

pyspark.sql.utils.IllegalArgumentException: 'Unsupported class file major version 55'

(in Python, obviously)

vishnukartheek commented 5 years ago

Hello, same here with pyspark, typing:

df = sc.parallelize([1,2,3,4])
df.count()

Not a parquet issue but a general issue, I guess.

What is the solution?

EliasGoldberg commented 4 years ago

This is probably no longer relevant, but I had this issue when I was using Java 11 with Scala 2.11.12. The solution is to downgrade to Java 8.
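
Concretely, the downgrade can be as small as pointing one shell at a JDK 8 install; a minimal sketch, assuming a typical Linux package path (substitute wherever your JDK 8 actually lives):

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # assumed path; adjust to your JDK 8
export PATH="$JAVA_HOME/bin:$PATH"
java -version    # should now report 1.8.x
spark-shell      # Spark's ASM 6 can parse Java 8 bytecode again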

lizzyhuang commented 4 years ago

I still have the same issue, running:

df = sc.parallelize([1, 2, 3, 4])
df.count()

This time the error is unsupported class file major version 56. My Spark version is 2.4.4, with the default Python version 2.7.16. My Java version is 12.0.2 (it came with my macOS system). My guess is that my Java version is too new for Spark in this case? But how can I downgrade to Java 8? Will this mess up my OS? I'm running Spark locally on my Mac, so ideally I'd have Java 8 just for Spark and the newer Java for the rest of my system.

jazz-bee commented 4 years ago

Did you find a solution @lizzyhuang?

jonha892 commented 4 years ago

I had the same problem:

java.lang.IllegalArgumentException: Unsupported class file major version 57

using:

Windows, Java 13.0.2, Scala 2.11.12, Spark 2.4.5

I fixed it by downgrading to JDK 1.8, which I got from adoptopenjdk.net. Then I adjusted the PATH/JAVA_HOME environment variables and the error was gone.

lizzyhuang commented 4 years ago

Did you find a solution @lizzyhuang?

@Jaz-B No, I have a Mac so I don't really know which version of Java to download, tbh... I just gave up and did some other stuff.

lizzyhuang commented 4 years ago

I had the same problem:

java.lang.IllegalArgumentException: Unsupported class file major version 57

using:

Windows, Java 13.0.2, Scala 2.11.12, Spark 2.4.5

I fixed it by downgrading to JDK 1.8, which I got from adoptopenjdk.net. Then I adjusted the PATH/JAVA_HOME environment variables and the error was gone.

Thank you. I'll try this on my Mac. Not sure whether I will need to change the path though...
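
For the macOS case above: the switch can be done per shell without touching the system Java, assuming a JDK 8 is already installed (e.g. from AdoptOpenJDK):

/usr/libexec/java_home -V                             # list installed JDKs (ships with macOS)
export JAVA_HOME="$(/usr/libexec/java_home -v 1.8)"   # point this shell at JDK 8 only
java -version                                         # should now report 1.8.x
# New terminals keep the newer Java; nothing system-wide changes.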