aws-samples / aws-glue-samples

AWS Glue code samples
MIT No Attribution

local execution of aws glue #42

Closed ghost closed 4 years ago

ghost commented 5 years ago

Trying to run AWS Glue with AWSGlue.zip throws the following error:


~/opt/spark-2.2.0-bin-hadoop2.7/bin/pyspark
Python 2.7.15 |Anaconda, Inc.| (default, Dec 14 2018, 13:10:39) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/02/18 18:36:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/02/18 18:36:23 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Python version 2.7.15 (default, Dec 14 2018 13:10:39)
SparkSession available as 'spark'.
>>> from awsglue.dynamicframe import DynamicFrameWriter
>>> glueContext = GlueContext(sc)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'GlueContext' is not defined
>>> from awsglue.context import GlueContext
>>> glueContext = GlueContext(sc)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "awsglue/context.py", line 44, in __init__
    self._glue_scala_context = self._get_glue_scala_context(**options)
  File "awsglue/context.py", line 64, in _get_glue_scala_context
    return self._jvm.GlueContext(self._jsc.sc())
TypeError: 'JavaPackage' object is not callable

sherkartejas commented 5 years ago

Hi @BardiaAfshin I'm facing the same issue. Did you manage to resolve it? Thanks.

Cheers x

ghost commented 5 years ago

No. I opened a support ticket with AWS, and they basically came back with "this is a proprietary issue", but I don't think that can be the right answer. You can get their Scala package working locally.

rashid-1989 commented 5 years ago

I am also facing the same issue. Could the AWS team provide a solution, @BardiaAfshin?

rvasconcelossilva commented 4 years ago

Any answers about this issue?

rpshgupta commented 4 years ago

Any solution to this? I am getting the same issue. I am trying to run AWS Glue on my Ubuntu system.

TypeError                                 Traceback (most recent call last)
in
----> 1 glueContext = GlueContext(sc)

~/aws-glue-libs-glue-1.0/PyGlue.zip/awsglue/context.py in __init__(self, sparkContext, **options)
     43         super(GlueContext, self).__init__(sparkContext)
     44         register(sparkContext)
---> 45         self._glue_scala_context = self._get_glue_scala_context(**options)
     46         self.create_dynamic_frame = DynamicFrameReader(self)
     47         self.write_dynamic_frame = DynamicFrameWriter(self)

~/aws-glue-libs-glue-1.0/PyGlue.zip/awsglue/context.py in _get_glue_scala_context(self, **options)
     64
     65         if min_partitions is None:
---> 66             return self._jvm.GlueContext(self._jsc.sc())
     67         else:
     68             return self._jvm.GlueContext(self._jsc.sc(), min_partitions, target_partitions)

TypeError: 'JavaPackage' object is not callable

moomindani commented 4 years ago

Recently we have updated the maven repository to solve issues for local development. Can you try again and check if you still see the issue?

moomindani commented 4 years ago

Today I tried it on the latest Amazon Linux 2 and on my MacBook, and it worked without any errors.

If you still see the error, can you describe the details?

kashifmalik-sde commented 4 years ago

Thank you @moomindani - In my case, I had the jarsv1 not landing in the correct folder, but once that was fixed, I was able to run the job locally with no issue.

mohdaliiqbal commented 4 years ago

Thank you @moomindani - In my case, I had the jarsv1 not landing in the correct folder, but once that was fixed, I was able to run the job locally with no issue.

That is correct: if the Glue jars are not available on the classpath, you will see this error.
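
For anyone who wants to confirm this before constructing a GlueContext, here is a minimal sketch of a check. py4j hands back a JavaPackage placeholder for any dotted name it cannot resolve to a loaded class, which is why calling it fails with the TypeError above. The helper name resolves_to_java_class is hypothetical (it is not part of awsglue or py4j), and the fully qualified class name is an assumption based on the package that Glue's Scala API is normally imported from.

```python
from functools import reduce

# Assumption: the Scala GlueContext class lives in this package.
GLUE_CONTEXT_CLASS = "com.amazonaws.services.glue.GlueContext"


def resolves_to_java_class(jvm, dotted_name=GLUE_CONTEXT_CLASS):
    """Return True if dotted_name resolves to something other than a
    py4j JavaPackage placeholder when looked up on the given JVM view.

    Hypothetical helper for diagnosis only. The type is compared by name
    so that the function itself does not need py4j to be importable.
    """
    # Walk the dotted path one attribute at a time: jvm.com.amazonaws...
    obj = reduce(getattr, dotted_name.split("."), jvm)
    return type(obj).__name__ != "JavaPackage"
```

In the pyspark shell you would call resolves_to_java_class(sc._jvm). If it returns False, the Glue jars are most likely not on the classpath that Spark was started with, and GlueContext(sc) would be expected to fail as shown earlier in this thread.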

ghost commented 4 years ago

I originally followed the instructions in the AWS docs. I now see that the AWS Glue repo under awslabs has better instructions. But for those wondering what jarsv1 is, like me coming from the AWS docs: I followed the instructions below and got it to work locally.

https://support.wharton.upenn.edu/help/glue-debugging#update-path-and-java-home-

moomindani commented 4 years ago

It seems that you are not seeing the issue now. Please reopen this issue if you still see the issue.