Don't bother with the glue-3.0 branch. Even with the pyspark "binary" (it's really just a script), which I copied from the EMR 6.3 base image, it's still not working.
Spark starts, then the following is thrown as soon as it hits glueContext = GlueContext(sc):
Exception in thread "Thread-6" java.lang.NoClassDefFoundError: com/amazonaws/services/lakeformation/model/CommitTransactionResult
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetPublicMethods(Class.java:2902)
at java.lang.Class.getMethods(Class.java:1615)
at py4j.reflection.ReflectionEngine.getMethodsByNameAndLength(ReflectionEngine.java:345)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:305)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
at py4j.Gateway.invoke(Gateway.java:274)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.amazonaws.services.lakeformation.model.CommitTransactionResult
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 12 more
ERROR:root:Exception while sending command.
This happens while trying to submit a Python job via gluesparksubmit test.py --JOB_NAME test.
I tried OpenJDK 8 and OpenJDK 11.
It would be good to get an idea of which Java version should be used.
Traceback (most recent call last):
File "/Users/jesusch/git/aws-glue-libs/test.py", line 11, in <module>
sc = SparkContext()
File "/Users/jesusch/Downloads/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/python/lib/pyspark.zip/pyspark/context.py", line 146, in __init__
File "/Users/jesusch/Downloads/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/python/lib/pyspark.zip/pyspark/context.py", line 209, in _do_init
File "/Users/jesusch/Downloads/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/python/lib/pyspark.zip/pyspark/context.py", line 329, in _initialize_context
File "/Users/jesusch/Downloads/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1568, in __call__
File "/Users/jesusch/Downloads/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoSuchMethodError: 'void io.netty.util.concurrent.SingleThreadEventExecutor.<init>(io.netty.util.concurrent.EventExecutorGroup, java.util.concurrent.Executor, boolean, java.util.Queue, io.netty.util.concurrent.RejectedExecutionHandler)'
at io.netty.channel.SingleThreadEventLoop.<init>(SingleThreadEventLoop.java:65)
at io.netty.channel.nio.NioEventLoop.<init>(NioEventLoop.java:138)
at io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:146)
at io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:37)
at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:84)
at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:58)
at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:47)
at io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:59)
at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:86)
at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:81)
at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:68)
at org.apache.spark.network.util.NettyUtils.createEventLoop(NettyUtils.java:66)
at org.apache.spark.network.client.TransportClientFactory.<init>(TransportClientFactory.java:106)
at org.apache.spark.network.TransportContext.createClientFactory(TransportContext.java:142)
at org.apache.spark.rpc.netty.NettyRpcEnv.<init>(NettyRpcEnv.scala:77)
at org.apache.spark.rpc.netty.NettyRpcEnvFactory.create(NettyRpcEnv.scala:493)
at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:57)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:266)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:189)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:458)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:829)
Well, if you dump the Java version from within a Glue 3.0 script, this is what it reports:
openjdk version "1.8.0_282"
OpenJDK Runtime Environment (build 1.8.0_282-b08)
OpenJDK 64-Bit Server VM (build 25.282-b08, mixed mode)
So OpenJDK 8 seems to be what they are using on the workers.
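Based on that, a minimal sketch for pinning a local macOS environment to JDK 8 before launching the Glue scripts; this is an assumption about your setup, not part of the original report (the java_home helper is macOS-specific, substitute your own JDK 8 path elsewhere):
# Point JAVA_HOME at a JDK 8 install before running gluepyspark/gluesparksubmit.
export JAVA_HOME="$(/usr/libexec/java_home -v 1.8)"   # macOS helper; prints the JDK 8 home
export PATH="$JAVA_HOME/bin:$PATH"
java -version   # should report openjdk version "1.8.0_..."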
When I click the link to either of the referenced S3 objects, I get:
<Error>
<Code>NoSuchKey</Code>
<Message>The specified key does not exist.</Message>
<Key>glue-3.0/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3</Key>
<RequestId>TZ094KCPBAHFN4W0</RequestId>
<HostId>w9M8wLBNkGI7lZ7oSHdaXJE0uUr1Z8mTHDnClPi0hOxZOvG6ckS3m20ccKMzKOZaha8xlFLf0ZQ=</HostId>
</Error>
E.g.: https://aws-glue-etl-artifacts.s3.amazonaws.com/glue-3.0/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3
Append .tgz, i.e.:
https://aws-glue-etl-artifacts.s3.amazonaws.com/glue-3.0/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3.tgz
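For reference, a minimal sketch of fetching and unpacking that tarball from a shell, assuming the archive extracts to a directory of the same name (as the paths elsewhere in this thread suggest):
# Download the corrected Glue 3.0 Spark distribution and unpack it.
curl -O https://aws-glue-etl-artifacts.s3.amazonaws.com/glue-3.0/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3.tgz
tar -xzf spark-3.1.1-amzn-0-bin-3.2.1-amzn-3.tgz
# Point SPARK_HOME at the extracted directory so the Glue scripts can find it.
export SPARK_HOME="$PWD/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3"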
Thank you for reporting the issue.
We have updated the following tarball to include the pyspark package: https://aws-glue-etl-artifacts.s3.amazonaws.com/glue-3.0/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3.tgz We also fixed the README.md with the correct URL for the package.
Please let us know if you still see the issue. For new issues that differ from the pyspark packaging problem, it would be great if you could create a separate issue.
Let us keep this issue open for a while to see if there are any additional issues.
I double-checked that the original issue has been resolved. Closing.
I need Glue 3 to investigate some Governed Tables operations.
I followed the instructions for Developing Locally with Python, https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-libraries.html
However, when I run ./bin/gluepyspark, it fails with the error below. I see someone in this thread had a similar issue.
I'm using Oracle Java 1.8.0_311. Would OpenJDK be a better choice? I got the latest Maven and Spark files from the links above, and I also tried the Glue ETL file uploaded by moomindani.
Any suggestions on how to resolve this problem?
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3.041 s
[INFO] Finished at: 2021-11-09T17:59:53-05:00
[INFO] ------------------------------------------------------------------------
mkdir: /Users/cbishop/dev/aws-glue-libs/conf: File exists
/Users/cbishop/dev/aws-glue-libs
Python 3.9.7 (default, Nov 9 2021, 08:38:13)
[Clang 13.0.0 (clang-1300.0.29.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/cbishop/dev/aws-glue-libs/jarsv1/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/cbishop/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
21/11/09 17:59:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/11/09 17:59:57 WARN SparkContext: Another SparkContext is being constructed (or threw an exception in its constructor). This may indicate an error, since only one SparkContext should be running in this JVM (see SPARK-2243). The other SparkContext was created at:
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
java.lang.reflect.Constructor.newInstance(Constructor.java:423)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
py4j.Gateway.invoke(Gateway.java:238)
py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
py4j.GatewayConnection.run(GatewayConnection.java:238)
java.lang.Thread.run(Thread.java:748)
/Users/cbishop/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/python/pyspark/shell.py:42: UserWarning: Failed to initialize Spark session.
warnings.warn("Failed to initialize Spark session.")
Traceback (most recent call last):
File "/Users/cbishop/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/python/pyspark/shell.py", line 38, in <module>
spark = SparkSession._create_shell_session()  # type: ignore
File "/Users/cbishop/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/python/pyspark/sql/session.py", line 553, in _create_shell_session
return SparkSession.builder.getOrCreate()
File "/Users/cbishop/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/python/pyspark/sql/session.py", line 228, in getOrCreate
sc = SparkContext.getOrCreate(sparkConf)
File "/Users/cbishop/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/python/pyspark/context.py", line 392, in getOrCreate
SparkContext(conf=conf or SparkConf())
File "/Users/cbishop/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/python/pyspark/context.py", line 146, in __init__
self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
File "/Users/cbishop/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/python/pyspark/context.py", line 209, in _do_init
self._jsc = jsc or self._initialize_context(self._conf._jconf)
File "/Users/cbishop/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/python/pyspark/context.py", line 329, in _initialize_context
return self._jvm.JavaSparkContext(jconf)
File "/Users/cbishop/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1568, in __call__
return_value = get_return_value(
File "/Users/cbishop/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoSuchMethodError: io.netty.util.concurrent.SingleThreadEventExecutor.<init>(Lio/netty/util/concurrent/EventExecutorGroup;Ljava/util/concurrent/Executor;ZLjava/util/Queue;Lio/netty/util/concurrent/RejectedExecutionHandler;)V
at io.netty.channel.SingleThreadEventLoop.<init>(SingleThreadEventLoop.java:65)
at io.netty.channel.nio.NioEventLoop.<init>(NioEventLoop.java:138)
at io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:146)
at io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:37)
at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:84)
at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:58)
at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:47)
at io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:59)
at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:86)
at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:81)
at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:68)
at org.apache.spark.network.util.NettyUtils.createEventLoop(NettyUtils.java:66)
at org.apache.spark.network.client.TransportClientFactory.<init>(TransportClientFactory.java:106)
at org.apache.spark.network.TransportContext.createClientFactory(TransportContext.java:142)
at org.apache.spark.rpc.netty.NettyRpcEnv.<init>(NettyRpcEnv.scala:77)
at org.apache.spark.rpc.netty.NettyRpcEnvFactory.create(NettyRpcEnv.scala:493)
at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:57)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:266)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:189)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:458)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Could we reopen this ticket? This issue does not seem to be resolved.
I'm still experiencing the same issue as https://github.com/awslabs/aws-glue-libs/issues/94 with JDK 1.8.0_292.
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2.386 s
[INFO] Finished at: 2021-12-17T02:32:44-08:00
[INFO] ------------------------------------------------------------------------
mkdir: /Users/skym/dev/workspaces/aws-glue-libs/conf: File exists
/Users/skym/dev/workspaces/volta-etl
Picked up JAVA_TOOL_OPTIONS: -Djavax.net.ssl.trustStoreType=KeychainStore
Python 3.7.12 (default, Dec 17 2021, 02:24:21)
[Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Picked up JAVA_TOOL_OPTIONS: -Djavax.net.ssl.trustStoreType=KeychainStore
Picked up JAVA_TOOL_OPTIONS: -Djavax.net.ssl.trustStoreType=KeychainStore
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/skym/dev/workspaces/aws-glue-libs/jarsv1/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/skym/dev/workspaces/aws-glue-libs/jarsv1/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/skym/dev/tools/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
21/12/17 02:32:47 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/12/17 02:32:47 WARN SparkContext: Another SparkContext is being constructed (or threw an exception in its constructor). This may indicate an error, since only one SparkContext should be running in this JVM (see SPARK-2243). The other SparkContext was created at:
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
java.lang.reflect.Constructor.newInstance(Constructor.java:423)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
py4j.Gateway.invoke(Gateway.java:238)
py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
py4j.GatewayConnection.run(GatewayConnection.java:238)
java.lang.Thread.run(Thread.java:748)
/Users/skym/dev/tools/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/python/pyspark/shell.py:42: UserWarning: Failed to initialize Spark session.
warnings.warn("Failed to initialize Spark session.")
Traceback (most recent call last):
File "/Users/skym/dev/tools/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/python/pyspark/shell.py", line 38, in <module>
spark = SparkSession._create_shell_session() # type: ignore
File "/Users/skym/dev/tools/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/python/pyspark/sql/session.py", line 553, in _create_shell_session
return SparkSession.builder.getOrCreate()
File "/Users/skym/dev/tools/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/python/pyspark/sql/session.py", line 228, in getOrCreate
sc = SparkContext.getOrCreate(sparkConf)
File "/Users/skym/dev/tools/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/python/pyspark/context.py", line 392, in getOrCreate
SparkContext(conf=conf or SparkConf())
File "/Users/skym/dev/tools/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/python/pyspark/context.py", line 147, in __init__
conf, jsc, profiler_cls)
File "/Users/skym/dev/tools/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/python/pyspark/context.py", line 209, in _do_init
self._jsc = jsc or self._initialize_context(self._conf._jconf)
File "/Users/skym/dev/tools/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/python/pyspark/context.py", line 329, in _initialize_context
return self._jvm.JavaSparkContext(jconf)
File "/Users/skym/dev/tools/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1569, in __call__
answer, self._gateway_client, None, self._fqn)
File "/Users/skym/dev/tools/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoSuchMethodError: io.netty.util.concurrent.SingleThreadEventExecutor.<init>(Lio/netty/util/concurrent/EventExecutorGroup;Ljava/util/concurrent/Executor;ZLjava/util/Queue;Lio/netty/util/concurrent/RejectedExecutionHandler;)V
at io.netty.channel.SingleThreadEventLoop.<init>(SingleThreadEventLoop.java:65)
at io.netty.channel.nio.NioEventLoop.<init>(NioEventLoop.java:138)
at io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:146)
at io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:37)
at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:84)
at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:58)
at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:47)
at io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:59)
at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:86)
at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:81)
at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:68)
at org.apache.spark.network.util.NettyUtils.createEventLoop(NettyUtils.java:66)
at org.apache.spark.network.client.TransportClientFactory.<init>(TransportClientFactory.java:106)
at org.apache.spark.network.TransportContext.createClientFactory(TransportContext.java:142)
at org.apache.spark.rpc.netty.NettyRpcEnv.<init>(NettyRpcEnv.scala:77)
at org.apache.spark.rpc.netty.NettyRpcEnvFactory.create(NettyRpcEnv.scala:493)
at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:57)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:266)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:189)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:458)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Reopening.
After some research and testing, this seems to be caused by a Netty dependency version conflict introduced by com.amazonaws.AWSGlueETL.
After adding the two rm lines below to glue-setup.sh, the problem disappeared for me. Please make a proper update to the dependencies in com.amazonaws.AWSGlueETL.
# Run mvn copy-dependencies target to get the Glue dependencies locally
mvn -f $ROOT_DIR/pom.xml -DoutputDirectory=$ROOT_DIR/jarsv1 dependency:copy-dependencies
# Drop the conflicting javax.servlet and Netty jars so the versions bundled with Spark are used
rm $GLUE_JARS_DIR/javax.servlet-3.*
rm $GLUE_JARS_DIR/netty-*
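A quick way to check the removal worked, assuming the same $GLUE_JARS_DIR variable that glue-setup.sh defines (this verification step is my own sketch, not part of the fix):
# List any remaining Netty or javax.servlet jars; no matches means the conflicting copies are gone.
ls "$GLUE_JARS_DIR" | grep -iE 'netty|javax\.servlet' || echo "no conflicting jars left"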
For customers who are facing Netty-related errors, please try adding the following settings to spark-defaults.conf to avoid the Netty dependency issue.
spark-defaults.conf
spark.driver.extraClassPath /path_to_spark/jars/*:/path-to-aws-glue-libs/jars/*
spark.executor.extraClassPath /path_to_spark/jars/*:/path-to-aws-glue-libs/jars/*
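As a concrete illustration, one way to apply that workaround from a shell; the SPARK_HOME and GLUE_LIBS locations below are assumptions you would replace with your own paths:
# Append the classpath workaround to Spark's config; both paths are assumed locations.
SPARK_HOME=~/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3   # assumed Spark install location
GLUE_LIBS=~/dev/aws-glue-libs                      # assumed aws-glue-libs checkout
cat >> "$SPARK_HOME/conf/spark-defaults.conf" <<EOF
spark.driver.extraClassPath $SPARK_HOME/jars/*:$GLUE_LIBS/jars/*
spark.executor.extraClassPath $SPARK_HOME/jars/*:$GLUE_LIBS/jars/*
EOF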
Hmm, I placed spark-defaults.conf under $SPARK_HOME/conf, but it did not solve the problem. What are you trying to achieve with that configuration? Loading the libraries found under $SPARK_HOME/jars first, and then ${aws-glue-libs project root}/jars?
If that is the intent, the proper place for the change is not each individual's Spark config file but this repo; there is already a script in this repo that creates spark-defaults.conf. See PR https://github.com/awslabs/aws-glue-libs/pull/115 as an example (you don't need to approve it, though).
@skycmoon, thanks for the correction, you are right: the change needs to live in glue-setup.sh. I confirmed that this PR fixes the NoSuchMethodError in the gluepyspark command.
@moomindani, Happy to contribute! Could you merge my PR then?
We have now completed the review and merged your pull request. We really appreciate your contribution!
From README.md: https://aws-glue-etl-artifacts.s3.amazonaws.com/glue-3.0/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3 does not contain a pyspark binary/executable.