awslabs / aws-glue-libs

AWS Glue Libraries are additions and enhancements to Spark for ETL operations.

Getting AWSConnectionUtils error while running create_dynamic_frame_from_catalog #143

Closed purnima1612 closed 2 years ago

purnima1612 commented 2 years ago

Hello All,

I am trying to run a Glue job locally after connecting it to AWS.


from pyspark import SparkContext
from awsglue.context import GlueContext

glueContext = GlueContext(SparkContext.getOrCreate()) 
#inputDF = glueContext.create_dynamic_frame_from_options(connection_type = "s3", connection_options = {"paths": ["s3://glue-test-sandbox/test.json"]}, format = "json")
#inputDF = glueContext.create_dynamic_frame_from_catalog(database="mdm_xxgmdmadm", table_name= "_edqdbdv_app_svc_gartner_com__xxgmdmadm_t10")
inputDF = glueContext.create_dynamic_frame_from_options(connection_type = "oracle", connection_options = connection_oracle11_options)

inputDF.toDF().show()
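
The snippet above does not show how `connection_oracle11_options` is defined; it would need to exist before the `create_dynamic_frame_from_options` call. A minimal sketch of one way to build it, assuming the `url`, `dbtable`, `user`, and `password` keys that Glue documents for JDBC connections. The helper name `build_oracle_options` and every value below are hypothetical placeholders, not taken from the original post:

```python
def build_oracle_options(host, port, service, table, user, password):
    """Build a connection_options dict for connection_type="oracle".

    Hypothetical helper: the key names are assumed to follow the
    Glue JDBC connection options (url, dbtable, user, password).
    """
    return {
        # Same thin-driver URL shape as the plain Spark JDBC read in this thread.
        "url": f"jdbc:oracle:thin:@//{host}:{port}/{service}",
        "dbtable": table,
        "user": user,
        "password": password,
    }


# Placeholder values only; substitute real connection details.
connection_oracle11_options = build_oracle_options(
    host="db.example.com",
    port=1521,
    service="ORCL",
    table="MY_SCHEMA.MY_TABLE",
    user="my_user",
    password="my_password",
)
```

Keeping the dictionary in one helper makes it easier to compare the Glue call with the plain Spark JDBC read, since both then use the same URL, table, and credentials.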

The error I am getting is:


An error was encountered:
An error occurred while calling o82.getDynamicFrame.
: java.lang.NoClassDefFoundError: Could not initialize class com.amazonaws.services.glue.util.AWSConnectionUtils$
    at com.amazonaws.services.glue.GlueUtility$.getS3Client(GlueUtility.scala:11)
    at com.amazonaws.services.glue.util.JDBCWrapper$.connectionProperties(JDBCUtils.scala:879)
    at com.amazonaws.services.glue.util.JDBCWrapper.connectionProperties$lzycompute(JDBCUtils.scala:671)
    at com.amazonaws.services.glue.util.JDBCWrapper.connectionProperties(JDBCUtils.scala:671)
    at com.amazonaws.services.glue.util.JDBCWrapper.tableDF(JDBCUtils.scala:797)
    at com.amazonaws.services.glue.util.NoCondition$.tableDF(JDBCUtils.scala:85)
    at com.amazonaws.services.glue.util.NoJDBCPartitioner$.tableDF(JDBCUtils.scala:124)
    at com.amazonaws.services.glue.JDBCDataSource.getDynamicFrame(DataSource.scala:863)
    at com.amazonaws.services.glue.DataSource$class.getDynamicFrame(DataSource.scala:97)
    at com.amazonaws.services.glue.SparkSQLDataSource.getDynamicFrame(DataSource.scala:683)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

Traceback (most recent call last):
  File "/home/aws-glue-libs/awsglue.zip/awsglue/context.py", line 204, in create_dynamic_frame_from_options
    return source.getFrame(**kwargs)
  File "/home/aws-glue-libs/awsglue.zip/awsglue/data_source.py", line 36, in getFrame
    jframe = self._jsource.getDynamicFrame()
  File "/home/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/home/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/home/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o82.getDynamicFrame.
: java.lang.NoClassDefFoundError: Could not initialize class com.amazonaws.services.glue.util.AWSConnectionUtils$
    at com.amazonaws.services.glue.GlueUtility$.getS3Client(GlueUtility.scala:11)
    at com.amazonaws.services.glue.util.JDBCWrapper$.connectionProperties(JDBCUtils.scala:879)
    at com.amazonaws.services.glue.util.JDBCWrapper.connectionProperties$lzycompute(JDBCUtils.scala:671)
    at com.amazonaws.services.glue.util.JDBCWrapper.connectionProperties(JDBCUtils.scala:671)
    at com.amazonaws.services.glue.util.JDBCWrapper.tableDF(JDBCUtils.scala:797)
    at com.amazonaws.services.glue.util.NoCondition$.tableDF(JDBCUtils.scala:85)
    at com.amazonaws.services.glue.util.NoJDBCPartitioner$.tableDF(JDBCUtils.scala:124)
    at com.amazonaws.services.glue.JDBCDataSource.getDynamicFrame(DataSource.scala:863)
    at com.amazonaws.services.glue.DataSource$class.getDynamicFrame(DataSource.scala:97)
    at com.amazonaws.services.glue.SparkSQLDataSource.getDynamicFrame(DataSource.scala:683)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

However, when I run it as follows:

jdbcDF = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:oracle:thin:@//*****************:1521/*****************") \
    .option("dbtable", "***************") \
    .option("user", "xxgmdmadm") \
    .option("password", "***************") \
    .load()

it runs fine.

purnima1612 commented 2 years ago

Can someone please help with this?

moomindani commented 2 years ago

We apologize for the delay.

If you are still facing this issue, could you please try using the Glue official Docker image to see if the issue happens there? https://aws.amazon.com/blogs/big-data/develop-and-test-aws-glue-version-3-0-jobs-locally-using-a-docker-container/

moomindani commented 2 years ago

Please also make sure that you have AWS SDK for Java v1 in your classpath if you want to use your local setup rather than the Docker container.

I am resolving this issue for now, but please feel free to reopen it if you still see the error after verifying the classpath.
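
One way to check the classpath suggestion above is to list the AWS SDK for Java v1 jars in the directory your local Spark loads jars from. The following is a rough sketch, not an official tool: the function name `find_aws_sdk_jars` is hypothetical, and it assumes v1 artifacts follow the usual `aws-java-sdk*.jar` naming. Which directory to scan depends on your local setup.

```python
import glob
import os


def find_aws_sdk_jars(jars_dir):
    """Return the sorted file names of jars in jars_dir that look like
    AWS SDK for Java v1 artifacts (assumed pattern: aws-java-sdk*.jar).

    Returns an empty list if the directory is missing or has no match.
    """
    pattern = os.path.join(jars_dir, "aws-java-sdk*.jar")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


# Example usage (the path is an assumption; adjust it to your environment):
# print(find_aws_sdk_jars(os.path.join(os.environ["SPARK_HOME"], "jars")))
```

An empty result suggests the SDK is not where Spark expects it, which would be consistent with `AWSConnectionUtils$` failing to initialize. A non-empty result does not rule out a version conflict between jars.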

moomindani commented 1 year ago

Could you please try using the Glue official Docker image to see if the issue happens there? https://aws.amazon.com/blogs/big-data/develop-and-test-aws-glue-version-3-0-jobs-locally-using-a-docker-container/