[Open] cakkinep opened this issue 3 years ago
Could you post the steps you used to build the jar? Let me try to reproduce this issue.
I have the same issue, followed the exact steps outlined here: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-libraries.html
Which version of JAVA are you using? @Hari-Nagarajan @cakkinep @SimCo92
java -version
openjdk version "1.8.0_282"
OpenJDK Runtime Environment (build 1.8.0_282-b08)
OpenJDK 64-Bit Server VM (build 25.282-b08, mixed mode)
Same Java version, same issue; notably, it is only raised when running jobs with an AWS user that is not an admin.
I had to tweak the pom.xml to get rid of this exception. See the pom.xml I used in the attachment; to sum up the changes I made:
- exclude io.netty:netty-common and io.netty:netty-buffer from AWSGlueETL
- force com.amazonaws:aws-java-sdk-glue and com.amazonaws:aws-java-sdk-lakeformation to version 1.11.774
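For reference, a minimal sketch of what those two changes might look like in the pom.xml (the artifact IDs and the 1.11.774 pin come from the list above; the AWSGlueETL version shown and the exact placement are assumptions — compare against the attached pom.xml rather than copying this verbatim):

```xml
<!-- Sketch only: exclude the netty artifacts pulled in by AWSGlueETL -->
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>AWSGlueETL</artifactId>
  <version>1.0.0</version> <!-- assumed version; use the one in your pom -->
  <exclusions>
    <exclusion>
      <groupId>io.netty</groupId>
      <artifactId>netty-common</artifactId>
    </exclusion>
    <exclusion>
      <groupId>io.netty</groupId>
      <artifactId>netty-buffer</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<!-- Pin the Glue and Lake Formation SDK clients to 1.11.774 -->
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk-glue</artifactId>
  <version>1.11.774</version>
</dependency>
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk-lakeformation</artifactId>
  <version>1.11.774</version>
</dependency>
```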
@CedricPerotto this would only work if you already have 1.11.774 in your local Maven repository from past builds. A clean mvn build does not pull down that version of the aws-java-sdk-glue jar.
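A quick way to check this point on your own machine: Maven caches artifacts under `~/.m2/repository` in a standard groupId/artifactId/version layout, so you can list which `aws-java-sdk-glue` versions earlier builds have already pulled down (the path layout is standard Maven; the fallback message is only for readability):

```shell
# List locally cached versions of aws-java-sdk-glue; if 1.11.774 is not here,
# the version pin cannot be satisfied by a clean offline build.
ls ~/.m2/repository/com/amazonaws/aws-java-sdk-glue/ 2>/dev/null \
  || echo "not cached locally"
```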
Hi, same issue as reported by @Hari-Nagarajan.
I followed the instructions for Glue 1.0 here:
https://github.com/awslabs/aws-glue-libs
java --version
openjdk 11.0.10 2021-01-19
OpenJDK Runtime Environment (build 11.0.10+9-Ubuntu-0ubuntu1.20.04)
OpenJDK 64-Bit Server VM (build 11.0.10+9-Ubuntu-0ubuntu1.20.04, mixed mode, sharing)

python --version
Python 3.7.10
py4j.protocol.Py4JJavaError: An error occurred while calling o27.getCatalogSource.
: java.lang.NoSuchFieldError: ENDPOINT_OVERRIDDEN
at com.amazonaws.services.glue.AWSGlueClient.executeGetTable(AWSGlueClient.java:6180)
at com.amazonaws.services.glue.AWSGlueClient.getTable(AWSGlueClient.java:6162)
at com.amazonaws.services.glue.util.DataCatalogWrapper$$anonfun$4.apply(DataCatalogWrapper.scala:139)
at com.amazonaws.services.glue.util.DataCatalogWrapper$$anonfun$4.apply(DataCatalogWrapper.scala:135)
at scala.util.Try$.apply(Try.scala:191)
at com.amazonaws.services.glue.util.DataCatalogWrapper.getTable(DataCatalogWrapper.scala:135)
at com.amazonaws.services.glue.GlueContext.getCatalogSource(GlueContext.scala:192)
at com.amazonaws.services.glue.GlueContext.getCatalogSource(GlueContext.scala:181)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:834)
FWIW, I was able to get past this and run jobs by:
- downgrading Java to 1.8:
sudo apt-get install openjdk-8-jdk
sudo update-alternatives --config java
- and applying the changes in @CedricPerotto's pom.xml
I have the same issue. Is it fixed in the latest code? I also followed the instructions for Glue 1.0 here: https://github.com/awslabs/aws-glue-libs
java version "1.8.0_261"
Java(TM) SE Runtime Environment (build 1.8.0_261-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.261-b12, mixed mode)
Python 3.7.9
macOS Big Sur version 11.4
21/09/09 14:34:07 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/09/09 14:34:08 INFO SparkContext: Running Spark version 2.4.3
21/09/09 14:34:08 INFO SparkContext: Submitted application: test.py
21/09/09 14:34:08 INFO SecurityManager: Changing view acls to: xxxxxxx
21/09/09 14:34:08 INFO SecurityManager: Changing modify acls to: xxxxxxx
21/09/09 14:34:08 INFO SecurityManager: Changing view acls groups to:
21/09/09 14:34:08 INFO SecurityManager: Changing modify acls groups to:
21/09/09 14:34:08 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(xxxxxxx); groups with view permissions: Set(); users with modify permissions: Set(xxxxxxx); groups with modify permissions: Set()
21/09/09 14:34:13 INFO Utils: Successfully started service 'sparkDriver' on port 54245.
21/09/09 14:34:13 INFO SparkEnv: Registering MapOutputTracker
21/09/09 14:34:13 INFO SparkEnv: Registering BlockManagerMaster
21/09/09 14:34:13 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/09/09 14:34:13 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/09/09 14:34:13 INFO DiskBlockManager: Created local directory at /private/var/folders/ps/ryytn7jd5wxdg6tjlnk2wp3m0000gn/T/blockmgr-7506616e-07a6-4d8d-969e-bc9a7f2cac83
21/09/09 14:34:13 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
21/09/09 14:34:13 INFO SparkEnv: Registering OutputCommitCoordinator
21/09/09 14:34:13 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
21/09/09 14:34:13 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
21/09/09 14:34:13 INFO Utils: Successfully started service 'SparkUI' on port 4042.
21/09/09 14:34:13 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.3.18:4042
21/09/09 14:34:14 INFO SparkContext: Added file file:///Users/xxxxxxx/Documents/GitHub/aws-glue-libs/PyGlue.zip at file:///Users/xxxxxxx/Documents/GitHub/aws-glue-libs/PyGlue.zip with timestamp 1631169254045
21/09/09 14:34:14 INFO Utils: Copying /Users/xxxxxxx/Documents/GitHub/aws-glue-libs/PyGlue.zip to /private/var/folders/ps/ryytn7jd5wxdg6tjlnk2wp3m0000gn/T/spark-fc8212b2-0361-4597-920a-f1b324843cf6/userFiles-464039b1-876e-4382-83e1-169d61e160a7/PyGlue.zip
21/09/09 14:34:14 INFO Executor: Starting executor ID driver on host localhost
21/09/09 14:34:14 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 54246.
21/09/09 14:34:14 INFO NettyBlockTransferService: Server created on 192.168.3.18:54246
21/09/09 14:34:14 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/09/09 14:34:14 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.3.18, 54246, None)
21/09/09 14:34:14 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.3.18:54246 with 366.3 MB RAM, BlockManagerId(driver, 192.168.3.18, 54246, None)
21/09/09 14:34:14 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.3.18, 54246, None)
21/09/09 14:34:14 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.3.18, 54246, None)
21/09/09 14:34:14 INFO GlueContext: GlueMetrics not configured
21/09/09 14:34:14 INFO GlueContext: fs.s3.impl hadoop configuration is not set. Setting fs.s3.impl to org.apache.hadoop.fs.s3a.S3AFileSystem
21/09/09 14:34:14 WARN BasicProfileConfigLoader: Your profile name includes a 'profile ' prefix. This is considered part of the profile name in the Java SDK, so you will need to include this prefix in your profile name when you reference this profile from your Java code.
21/09/09 14:34:15 WARN BasicProfileConfigLoader: Your profile name includes a 'profile ' prefix. This is considered part of the profile name in the Java SDK, so you will need to include this prefix in your profile name when you reference this profile from your Java code.
Traceback (most recent call last):
File "/Users/xiaoningren/test.py", line 23, in <module>
21/09/09 14:34:15 INFO SparkContext: Invoking stop() from shutdown hook
I built aws-glue-libs from the glue-1.0 branch, put AWS credentials into ~/.aws/credentials, and launched spark-sql to test querying the AWS Glue Data Catalog.
The jars I last built in December still work, but the latest build throws the following error. Do we know if any bugs were introduced into the aws-sdk that would cause it?
spark-sql
21/03/09 21:04:49 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
log4j:WARN No appenders could be found for logger (org.apache.hadoop.conf.Configuration).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.lang.NoSuchFieldError: ENDPOINT_OVERRIDDEN
at com.amazonaws.services.glue.AWSGlueClient.executeGetDatabase(AWSGlueClient.java:4389)
at com.amazonaws.services.glue.AWSGlueClient.getDatabase(AWSGlueClient.java:4371)
at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.doesDefaultDBExist(AWSCatalogMetastoreClient.java:234)
at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.<init>(AWSCatalogMetastoreClient.java:154)
at com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory.createMetaStoreClient(AWSGlueDataCatalogHiveClientFactory.java:16)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3007)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3042)
at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1235)
at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:175)
at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:167)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:133)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)