aseychell opened 1 year ago
On a similar note: when is the 4.0.0 Docker image planned to be uploaded to https://hub.docker.com/r/amazon/aws-glue-libs/tags?
There has been no fundamental change in a year. Either the project is dead or nothing needed updating. I assume the 4.0 improvements come from upgrading PySpark. I will try switching to Python 3.10 and PySpark 3.3 to see whether it is still compatible.
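Before trying that switch, it can help to confirm which interpreter and PySpark versions are actually in use. This is a minimal sketch, assuming Glue 4.0 pairs Python 3.10 with Spark 3.3 (consistent with the spark-3.3.0-amzn-1 archive used later in this thread); the function names are hypothetical.

```python
import sys

# Assumed Glue 4.0 targets: Python 3.10 and Spark/PySpark 3.3.x.
EXPECTED_PYTHON = (3, 10)
EXPECTED_PYSPARK = (3, 3)


def major_minor(version: str) -> tuple:
    """Return the (major, minor) pair from a version string such as '3.3.0'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def check_versions(python_version, pyspark_version: str) -> list:
    """Return human-readable mismatches; an empty list means the versions match."""
    problems = []
    if tuple(python_version[:2]) != EXPECTED_PYTHON:
        problems.append(
            f"Python {python_version[0]}.{python_version[1]} (expected 3.10)"
        )
    if major_minor(pyspark_version) != EXPECTED_PYSPARK:
        problems.append(f"PySpark {pyspark_version} (expected 3.3.x)")
    return problems


if __name__ == "__main__":
    try:
        import pyspark
    except ImportError:
        print("PySpark is not importable; check SPARK_HOME and PYTHONPATH")
    else:
        for problem in check_versions(sys.version_info, pyspark.__version__):
            print("MISMATCH:", problem)
```

Run it with the same interpreter that gluesparksubmit will use; an empty output means both versions match the assumed targets.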
Glue 4.0 libs are released here: https://github.com/awslabs/aws-glue-libs/releases/tag/v4.0
Docker image will be updated shortly.
@saviodsouza29, I was wondering when the Docker image will be available for 4.0.
@saviodsouza29
After downloading the latest Spark archive, I'm getting the following error, which seems to be caused by incorrectly packaged JAR file versions in the Spark distribution. I'm running my job using ./bin/gluesparksubmit
ANTLR Tool version 4.3 used for code generation does not match the current runtime version 4.8
ANTLR Runtime version 4.7.2 used for parser compilation does not match the current runtime version 4.8
Traceback (most recent call last):
File "/Users/aldrinseychell/dev/trees/aws-glue-libs/aws-glue-libs/basic_fundtransfers.py", line 79, in <module>
FundTransfersSource = glueContext.create_dynamic_frame.from_catalog(
File "/Users/aldrinseychell/dev/trees/aws-glue-libs/aws-glue-libs/awsglue/dynamicframe.py", line 629, in from_catalog
return self._glue_context.create_dynamic_frame_from_catalog(db, table_name, redshift_tmp_dir, transformation_ctx, push_down_predicate, additional_options, catalog_id, **kwargs)
File "/Users/aldrinseychell/dev/trees/aws-glue-libs/aws-glue-libs/awsglue/context.py", line 184, in create_dynamic_frame_from_catalog
source = DataSource(self._ssql_ctx.getCatalogSource(db, table_name, redshift_tmp_dir, transformation_ctx,
File "/Users/aldrinseychell/dev/trees/aws-glue-libs/spark-3.3.0-amzn-1-bin-3.3.3-amzn-0/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
File "/Users/aldrinseychell/dev/trees/aws-glue-libs/spark-3.3.0-amzn-1-bin-3.3.3-amzn-0/python/pyspark/sql/utils.py", line 190, in deco
return f(*a, **kw)
File "/Users/aldrinseychell/dev/trees/aws-glue-libs/spark-3.3.0-amzn-1-bin-3.3.3-amzn-0/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o34.getCatalogSource.
: java.lang.NoSuchMethodError: 'void org.json4s.CustomSerializer.<init>(scala.Function1, scala.reflect.Manifest)'
at com.amazonaws.services.glue.util.StringToBoolean$.<init>(JsonOptions.scala:77)
at com.amazonaws.services.glue.util.StringToBoolean$.<clinit>(JsonOptions.scala)
at com.amazonaws.services.glue.util.JsonOptions$.apply(JsonOptions.scala:71)
at com.amazonaws.services.glue.GlueContext.getCatalogSource(GlueContext.scala:225)
at com.amazonaws.services.glue.GlueContext.getCatalogSource(GlueContext.scala:185)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:833)
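Both the ANTLR version warnings and the json4s NoSuchMethodError are typical symptoms of a classpath that mixes versions of the same library. One way to test the "incorrectly packaged JARs" theory is to group JAR file names by artifact and flag any artifact present in more than one version. This is a heuristic sketch, assuming Maven-style file names (artifact-version.jar); the function names are hypothetical, and the directories you pass are whatever JAR folders end up on your classpath.

```python
import re
import sys
from collections import defaultdict
from pathlib import Path

# Assumes Maven-style names: <artifact>-<version>.jar, where the version
# starts with a digit (e.g. "json4s-core_2.12-3.7.0-M11.jar").
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def parse_jar_name(filename: str):
    """Split a JAR file name into (artifact, version), or return None."""
    match = JAR_NAME.match(filename)
    if match is None:
        return None
    return match.group("artifact"), match.group("version")


def find_conflicts(filenames) -> dict:
    """Map each artifact seen with more than one version to its sorted versions."""
    versions = defaultdict(set)
    for name in filenames:
        parsed = parse_jar_name(name)
        if parsed is not None:
            artifact, version = parsed
            versions[artifact].add(version)
    return {a: sorted(v) for a, v in versions.items() if len(v) > 1}


if __name__ == "__main__":
    # Pass every JAR directory on the classpath, e.g. the Spark distribution's
    # jars/ folder plus any JAR folder used by aws-glue-libs.
    names = []
    for directory in sys.argv[1:]:
        path = Path(directory)
        if path.is_dir():
            names.extend(p.name for p in path.glob("*.jar"))
    for artifact, found in sorted(find_conflicts(names).items()):
        print(f"{artifact}: {', '.join(found)}")
```

The name parsing is deliberately simple, so an artifact whose own name contains "-<digit>" may be split oddly; treat the output as a list of candidates to inspect, not a verdict.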
@aseychell: I got the same error.
https://github.com/awslabs/aws-glue-libs/issues/166
Me too. With the current instructions and scripts, this repository does not work.
I don't have this issue after upgrading. Would you mind sharing your list of JARs so we can do a side-by-side comparison?
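A side-by-side comparison of two JAR listings can be reduced to the names that appear in only one of them. This is a minimal sketch, assuming the shared list file holds one JAR file name per line; the script usage and function names are hypothetical.

```python
import sys
from pathlib import Path


def jar_listing(directory) -> list:
    """Return the sorted JAR file names in a directory (empty if it is missing)."""
    path = Path(directory)
    if not path.is_dir():
        return []
    return sorted(p.name for p in path.glob("*.jar"))


def compare_listings(mine, theirs) -> dict:
    """Report JAR names that appear in only one of the two listings."""
    mine_set, theirs_set = set(mine), set(theirs)
    return {
        "only_mine": sorted(mine_set - theirs_set),
        "only_theirs": sorted(theirs_set - mine_set),
    }


if __name__ == "__main__":
    # Hypothetical usage: python compare_jars.py <my_jars_dir> <their_list.txt>
    if len(sys.argv) == 3:
        mine = jar_listing(sys.argv[1])
        theirs = Path(sys.argv[2]).read_text().split()
        for key, names in compare_listings(mine, theirs).items():
            for name in names:
                print(f"{key}: {name}")
```

Comparing file names only catches missing or differently versioned JARs; two files with the same name but different contents would need a checksum comparison instead.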
Here is the list: list_jar.txt
I have the same list of JARs. @jimmymaise, I will try loading from the catalog later to see whether I can replicate the issue.
I'm experiencing what looks like the same issue when trying to create a dynamic frame from the catalog.
>>> frame = glueContext.create_dynamic_frame.from_catalog(database="some_db", table_name="some_table")
ANTLR Tool version 4.3 used for code generation does not match the current runtime version 4.8
ANTLR Runtime version 4.7.2 used for parser compilation does not match the current runtime version 4.8
ANTLR Tool version 4.3 used for code generation does not match the current runtime version 4.8
ANTLR Runtime version 4.7.2 used for parser compilation does not match the current runtime version 4.8
java.lang.NoSuchMethodError: 'void org.json4s.CustomSerializer.<init>(scala.Function1, scala.reflect.Manifest)'
at com.amazonaws.services.glue.util.StringToBoolean$.<init>(JsonOptions.scala:124)
at com.amazonaws.services.glue.util.StringToBoolean$.<clinit>(JsonOptions.scala)
at com.amazonaws.services.glue.util.JsonOptions$.apply(JsonOptions.scala:108)
at com.amazonaws.services.glue.GlueContext.getCatalogSource(GlueContext.scala:238)
at com.amazonaws.services.glue.GlueContext.getCatalogSource(GlueContext.scala:198)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:833)
java.lang.NoClassDefFoundError: Could not initialize class com.amazonaws.services.glue.util.StringToBoolean$
at com.amazonaws.services.glue.util.JsonOptions.liftedTree1$1(JsonOptions.scala:30)
at com.amazonaws.services.glue.util.JsonOptions.<init>(JsonOptions.scala:29)
at com.amazonaws.services.glue.util.JDBCConf.toJsonOptions(DataCatalogWrapper.scala:47)
at com.amazonaws.services.glue.GlueContext.getGlueNativeJDBCSource(GlueContext.scala:514)
at com.amazonaws.services.glue.GlueContext.getCatalogSource(GlueContext.scala:326)
at com.amazonaws.services.glue.GlueContext.getCatalogSource(GlueContext.scala:198)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:833)
The java.lang.NoClassDefFoundError: Could not initialize class com.amazonaws.services.glue.util.StringToBoolean$ stack trace is then repeated 5 times.
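The repeated traces are most likely a consequence of the first one rather than separate problems: on the JVM, once a class's static initializer has thrown (here the NoSuchMethodError raised while initializing StringToBoolean$), the class is marked as failed and later uses of it throw NoClassDefFoundError: Could not initialize class. A small helper that lists each distinct Java exception in order of first appearance makes the root cause stand out. This is a hypothetical sketch, assuming the log format shown above.

```python
import re
import sys
from pathlib import Path

# Matches "java.lang.NoSuchMethodError: ..." as well as the py4j-wrapped
# ": java.lang.NoSuchMethodError: ..." form; "at ..." frame lines do not match.
EXCEPTION_LINE = re.compile(
    r"^\s*:?\s*((?:[a-z_][\w$]*\.)+[A-Z][\w$]*(?:Error|Exception))\b(?::\s*(.*))?"
)


def distinct_exceptions(log_text: str) -> list:
    """Return (exception_class, message) pairs, one per class, in first-seen order."""
    seen = {}
    for line in log_text.splitlines():
        match = EXCEPTION_LINE.match(line)
        if match and match.group(1) not in seen:
            seen[match.group(1)] = (match.group(2) or "").strip()
    return list(seen.items())


if __name__ == "__main__":
    # Hypothetical usage: python first_errors.py job.log
    if len(sys.argv) > 1:
        text = Path(sys.argv[1]).read_text(errors="replace")
        for name, message in distinct_exceptions(text):
            print(f"{name}: {message}")
```

Because only the first occurrence of each class is kept, the NoSuchMethodError is listed ahead of the NoClassDefFoundError lines it triggers.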
@singlewind: Any updates? Could you share your runtime environment, such as JVM version, OS, and Python version?
They rushed to present Glue 4.0 at AWS re:Invent, but then provided no support for developers.
@singlewind, can you please share the Java and Python versions you tried? I'm having a similar issue with Python 3.10.8 and Corretto 20 (Java). I used the following aws-glue-libs repo: https://github.com/awslabs/aws-glue-libs.git -b master. I appreciate your swift reply.
It is no longer affecting me since you updated the Docker images. Thanks!
Hope this is not too late. Here is my local setup, upgraded from v3:
Java: amazon-corretto-8-aarch64-macos-jdk
Python: 3.10.2
Spark: https://aws-glue-etl-artifacts.s3.amazonaws.com/glue-4.0/spark-3.3.0-amzn-1-bin-3.3.3-amzn-0.tgz
Glue-lib: https://github.com/awslabs/aws-glue-libs.git -b master
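The setup above can be scripted so that SPARK_HOME always points at the extracted Glue 4.0 Spark archive. This sketch derives the directory name from the archive URL (the traceback paths earlier in the thread show it unpacked as spark-3.3.0-amzn-1-bin-3.3.3-amzn-0 next to the aws-glue-libs checkout) and builds the environment and command for bin/gluesparksubmit. It assumes gluesparksubmit locates Spark through SPARK_HOME; the function names, directory layout and job script name are hypothetical, and the command is only built, not executed.

```python
import os
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse

SPARK_URL = ("https://aws-glue-etl-artifacts.s3.amazonaws.com/glue-4.0/"
             "spark-3.3.0-amzn-1-bin-3.3.3-amzn-0.tgz")


def spark_dir_name(archive_url: str) -> str:
    """Strip the .tgz/.tar.gz suffix from the archive's file name."""
    name = PurePosixPath(urlparse(archive_url).path).name
    for suffix in (".tgz", ".tar.gz"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def build_submit(glue_libs_dir, spark_parent_dir, job_script, archive_url=SPARK_URL):
    """Return (command, env) for running a job with bin/gluesparksubmit.

    Assumes the archive was extracted into spark_parent_dir.
    """
    env = dict(os.environ)
    env["SPARK_HOME"] = str(Path(spark_parent_dir) / spark_dir_name(archive_url))
    command = [str(Path(glue_libs_dir) / "bin" / "gluesparksubmit"), job_script]
    return command, env


if __name__ == "__main__":
    # Hypothetical paths; adjust to your own checkout.
    cmd, env = build_submit("aws-glue-libs", ".", "my_job.py")
    print("SPARK_HOME=" + env["SPARK_HOME"])
    print(" ".join(cmd))
    # To actually run it: subprocess.run(cmd, env=env, check=True)
```

Keeping the Spark URL in one constant makes it easy to see which archive a given environment was built from when comparing setups.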
Following the release of AWS Glue 4.0, when is aws-glue-libs planned to be updated to support the new version as well?