awslabs / aws-glue-libs

AWS Glue Libraries are additions and enhancements to Spark for ETL operations.

Moto S3 - java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument #123

Open archenroot opened 2 years ago

archenroot commented 2 years ago

I have set up my own local dev environment without Docker, because on some machines Docker doesn't work due to missing BIOS-enabled virtualization support.

So I have built all the jars from this repo: 594 jars in total.

I downloaded spark-3.1.1-amzn-0-bin-3.2.1-amzn-3, which ships its own 273 jars.

I added the aws-glue-libs jars to the driver and executor class paths in spark-defaults.conf as below:

spark.driver.extraClassPath /data/devel/sdk/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/jars/*:/data/devel/sdk/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/aws_glue_jars/*
spark.executor.extraClassPath   /data/devel/sdk/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/jars/*:/data/devel/sdk/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/aws_glue_jars/*
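For completeness, pointing the s3a connector at a local moto server usually also requires an endpoint override and path-style access in spark-defaults.conf. A sketch with assumed values (moto's standalone server defaults to port 5000, and the credentials are dummy placeholders):

```
# hypothetical moto endpoint settings - adjust host/port to your setup
spark.hadoop.fs.s3a.endpoint                 http://127.0.0.1:5000
spark.hadoop.fs.s3a.path.style.access        true
spark.hadoop.fs.s3a.access.key               testing
spark.hadoop.fs.s3a.secret.key               testing
spark.hadoop.fs.s3a.connection.ssl.enabled   false
```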

Now I wanted to test write and read of dataframe into moto (AWS mock s3 - https://github.com/spulec/moto) as this:

import logging  # assumes a SparkSession named `spark` already exists

values = [("k1", 1), ("k2", 2)]
columns = ["key", "value"]
df = spark.createDataFrame(values, columns)
print('dataframe created')
logging.info('dataframe head - {}'.format(df.head()))
# write the dataframe as csv to s3
df.write.csv("s3a://bucket/source.csv")

and I hit this error when calling df.write.csv():

Spark session created
dataframe created
22/01/19 17:47:06 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
127.0.0.1 - - [19/Jan/2022 17:47:06] "HEAD /bucket/ HTTP/1.1" 200 -
An error occurred while calling o57.csv.
: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)V
        at org.apache.hadoop.fs.s3a.S3AUtils.lookupPassword(S3AUtils.java:816)
        at org.apache.hadoop.fs.s3a.S3AUtils.lookupPassword(S3AUtils.java:792)
        at org.apache.hadoop.fs.s3a.S3AUtils.getEncryptionAlgorithm(S3AUtils.java:1426)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:316)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3358)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3407)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3375)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:486)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
        at org.apache.spark.sql.execution.datasources.DataSource.planForWritingFileFormat(DataSource.scala:469)
        at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:569)
        at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
        at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
        at org.apache.spark.sql.DataFrameWriter.csv(DataFrameWriter.scala:979)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
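As an aside, the JVM descriptor in the error message pins down the exact missing overload: `(ZLjava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)V` is `checkArgument(boolean, String, Object, Object)`, an overload added in Guava 20 that does not exist in guava-14.0.1. A small stdlib sketch that decodes such descriptors (simplified; array types are not handled):

```python
def _decode_type(s, i):
    """Decode one JVM type starting at index i; return (readable name, next index)."""
    primitives = {"Z": "boolean", "B": "byte", "C": "char", "S": "short",
                  "I": "int", "J": "long", "F": "float", "D": "double", "V": "void"}
    if s[i] == "L":                      # object type: Lpkg/Class;
        end = s.index(";", i)
        return s[i + 1:end].replace("/", "."), end + 1
    return primitives[s[i]], i + 1       # primitive (arrays not handled in this sketch)

def decode_jvm_descriptor(desc):
    """Turn a descriptor like '(ZLjava/lang/String;...)V' into (param types, return type)."""
    params_part, ret_part = desc[1:].split(")")
    params, i = [], 0
    while i < len(params_part):
        t, i = _decode_type(params_part, i)
        params.append(t)
    ret, _ = _decode_type(ret_part, 0)
    return params, ret
```

Running it on the descriptor from the stack trace shows the signature Hadoop's S3AUtils expects at runtime.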

While searching for what this relates to, I found this issue: https://issues.apache.org/jira/browse/HIVE-22915

Finally I found that I have two guava jars:

[17:54:36] zangetsu@zeus  $  /data/devel/sdk/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3  find . -name '*guava*'
./jars/guava-14.0.1.jar - original AWS Spark distribution
./aws_glue_jars/guava-21.0.jar - jar built from the aws-glue-libs repo

I fixed my issue by removing guava-14.0.1 from the classpath, but shouldn't this be fixed upstream?
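To check for this kind of conflict more generally, a stdlib sketch that scans jar directories and reports any class shipped by more than one jar (the directory paths are whatever you pass in):

```python
import zipfile
from collections import defaultdict
from pathlib import Path

def find_duplicate_classes(*jar_dirs):
    """Return {class entry: sorted jar names} for classes present in more than one jar."""
    owners = defaultdict(set)
    for jar_dir in jar_dirs:
        for jar in sorted(Path(jar_dir).glob("*.jar")):
            with zipfile.ZipFile(jar) as zf:
                for name in zf.namelist():
                    if name.endswith(".class"):
                        owners[name].add(jar.name)
    # keep only entries provided by two or more jars
    return {cls: sorted(jars) for cls, jars in owners.items() if len(jars) > 1}
```

Pointing it at both the jars/ and aws_glue_jars/ directories would flag com/google/common/base/Preconditions.class as coming from both guava jars.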

Thanks for any feedback.