I have set up my own local dev environment without Docker, since on some machines Docker doesn't work because virtualization support is not enabled in the BIOS.
So I built all the jars from this repo: 594 jars in total.
I downloaded spark-3.1.1-amzn-0-bin-3.2.1-amzn-3, which has its own jars: 273.
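The jar counts came from a quick find over the jar directories; a sketch, where the directory path is an assumption to substitute with your own:

```shell
# Hypothetical path: point JAR_DIR at the directory holding the built jars.
JAR_DIR="${JAR_DIR:-aws-glue-libs}"
# Count every jar under that directory (prints 0 if the path does not exist).
find "$JAR_DIR" -name '*.jar' 2>/dev/null | wc -l
```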
I added the aws-glue-libs jars to the driver and executor classpath in spark-defaults.conf, and then ran the following test:
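The spark-defaults.conf entries looked roughly like this (the jar directory is an assumption; adjust it to wherever the built jars live):

```properties
# Hypothetical jar location; JVM classpath wildcards (/*) are supported
spark.driver.extraClassPath     /opt/aws-glue-libs/jarsv1/*
spark.executor.extraClassPath   /opt/aws-glue-libs/jarsv1/*
```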
import logging
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
print('Spark session created')
values = [("k1", 1), ("k2", 2)]
columns = ["key", "value"]
df = spark.createDataFrame(values, columns)
print('dataframe created')
logging.info('dataframe head - {}'.format(df.head()))
# write the dataframe as csv to s3
df.write.csv("s3a://bucket/source.csv")
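Getting the s3a connector to talk to a local moto server typically needs a few endpoint overrides; a sketch of the relevant spark-defaults.conf lines, where the port and the dummy credentials are assumptions:

```properties
# Hedged sketch: point s3a at a local moto server (port assumed to be 5000)
spark.hadoop.fs.s3a.endpoint                    http://127.0.0.1:5000
spark.hadoop.fs.s3a.access.key                  mock
spark.hadoop.fs.s3a.secret.key                  mock
spark.hadoop.fs.s3a.path.style.access           true
spark.hadoop.fs.s3a.connection.ssl.enabled      false
```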
and I hit this error on calling df.write.csv():
Spark session created
dataframe created
22/01/19 17:47:06 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
127.0.0.1 - - [19/Jan/2022 17:47:06] "HEAD /bucket/ HTTP/1.1" 200 -
An error occurred while calling o57.csv.
: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)V
at org.apache.hadoop.fs.s3a.S3AUtils.lookupPassword(S3AUtils.java:816)
at org.apache.hadoop.fs.s3a.S3AUtils.lookupPassword(S3AUtils.java:792)
at org.apache.hadoop.fs.s3a.S3AUtils.getEncryptionAlgorithm(S3AUtils.java:1426)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:316)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3358)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3407)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3375)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:486)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at org.apache.spark.sql.execution.datasources.DataSource.planForWritingFileFormat(DataSource.scala:469)
at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:569)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
at org.apache.spark.sql.DataFrameWriter.csv(DataFrameWriter.scala:979)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
The test above was meant to exercise writing and reading a dataframe against moto (a mock AWS S3: https://github.com/spulec/moto).
Searching for the cause led me to this issue: https://issues.apache.org/jira/browse/HIVE-22915 (Guava 14 does not provide the exact Preconditions.checkArgument overload that Hadoop 3.2's S3A code was compiled against, hence the NoSuchMethodError at runtime).
Finally I found that I had two Guava jars on the classpath:
I fixed my issue by removing guava-14.0.1 from the classpath, but I think this should be fixed upstream. Shouldn't it?
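For anyone hitting the same thing, a small sketch for spotting duplicate Guava jars; the directory paths below are assumptions, substitute your real Spark and aws-glue-libs jar directories:

```python
import glob
import os

def find_guava_jars(jar_dirs):
    """Return every guava-*.jar found under the given directories."""
    jars = []
    for d in jar_dirs:
        jars.extend(sorted(glob.glob(os.path.join(d, "guava-*.jar"))))
    return jars

# Hypothetical locations: substitute your real Spark and aws-glue-libs jar dirs.
search_dirs = [
    os.path.join(os.environ.get("SPARK_HOME", "/opt/spark"), "jars"),
    "aws-glue-libs/jarsv1",
]
# More than one hit means a version conflict; keep only the newest jar.
print(find_guava_jars(search_dirs))
```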
Thanks for any feedback.