awslabs / aws-glue-libs

AWS Glue Libraries are additions and enhancements to Spark for ETL operations.

Error while running ETL script #33

Open likhitha-surapaneni opened 4 years ago

likhitha-surapaneni commented 4 years ago

I am getting the following error when I try to run the ETL script, despite adding the required library, PyGlue.zip, to my PYTHONPATH. Could you tell me how to resolve this?

    Traceback (most recent call last):
      File "aws-glue-clone-db-to-s3.py", line 30, in <module>
        gc = GlueContext(sc)
      File "/home/jupyter/notebooks/etl/libs/bmrn-glue-libs/build/libs/PyGlue.zip/awsglue/context.py", line 44, in __init__
      File "/home/jupyter/notebooks/etl/libs/bmrn-glue-libs/build/libs/PyGlue.zip/awsglue/context.py", line 64, in _get_glue_scala_context
    TypeError: 'JavaPackage' object is not callable

mariafung88 commented 4 years ago

I have the same error. Do you have a solution for this? This is my sample code:

    import os
    import sys
    print(sys.path)
    sys.path.append("/home/ec2-user/aws-glue-libs-glue-1.0")
    print(sys.path)
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.transforms import *
    from awsglue.utils import getResolvedOptions

    print(os.path)
    print(sys.version)
    print(os.environ['JAVA_HOME'])
    print(os.environ)

    # Create a Glue context
    glueContext = GlueContext(SparkContext.getOrCreate())

    # Create a DynamicFrame using the 'persons_json' table
    persons_DyF = glueContext.create_dynamic_frame.from_catalog(database="legislators", table_name="persons_json")

    # Print out information about this data
    print("Count: ", persons_DyF.count())
    persons_DyF.printSchema()

It fails at line 16:

    Fail to execute line 16: glueContext = GlueContext(SparkContext.getOrCreate())
    Traceback (most recent call last):
      File "/tmp/zeppelin_pyspark-1456254874984644993.py", line 375, in <module>
        exec(code, _zcUserQueryNameSpace)
      File "<string>", line 16, in <module>
      File "/home/ec2-user/aws-glue-libs-glue-1.0/awsglue/context.py", line 45, in __init__
        self._glue_scala_context = self._get_glue_scala_context(**options)
      File "/home/ec2-user/aws-glue-libs-glue-1.0/awsglue/context.py", line 66, in _get_glue_scala_context
        return self._jvm.GlueContext(self._jsc.sc())
    TypeError: 'JavaPackage' object is not callable

svajiraya commented 4 years ago

@likhitha96 @mariafung88

Looks like an issue with the CLASSPATH. Can you double check the CLASSPATH and make sure Hadoop, Spark and Glue jars are in there?
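
For example, a minimal sketch (assuming a running PySpark shell where a SparkContext is available; the filter terms are only examples) to see which jars the driver JVM actually has on its classpath:

    # Minimal sketch: print driver-JVM classpath entries via py4j so you can
    # confirm the Glue, Spark and Hadoop jars are really there.
    from pyspark.context import SparkContext

    sc = SparkContext.getOrCreate()
    classpath = sc._jvm.java.lang.System.getProperty("java.class.path")
    for entry in classpath.split(":"):
        if any(term in entry.lower() for term in ("glue", "spark", "hadoop", "netty")):
            print(entry)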

I tested the above code with my docker image (https://hub.docker.com/repository/docker/svajiraya/glue-dev-1.0) and it seems to be working just fine.

  adding: awsglue/ (stored 0%)
  adding: awsglue/README.md (deflated 57%)
  adding: awsglue/__init__.py (deflated 37%)
  adding: awsglue/context.py (deflated 78%)
  adding: awsglue/data_sink.py (deflated 60%)
  adding: awsglue/data_source.py (deflated 58%)
  adding: awsglue/devutils.py (deflated 76%)
  adding: awsglue/dynamicframe.py (deflated 81%)
  adding: awsglue/functions.py (deflated 53%)
  adding: awsglue/gluetypes.py (deflated 77%)
  adding: awsglue/job.py (deflated 58%)
  adding: awsglue/transforms/ (stored 0%)
  adding: awsglue/transforms/__init__.py (deflated 58%)
  adding: awsglue/transforms/apply_mapping.py (deflated 68%)
  adding: awsglue/transforms/coalesce.py (deflated 67%)
  adding: awsglue/transforms/collection_transforms.py (deflated 79%)
  adding: awsglue/transforms/drop_nulls.py (deflated 66%)
  adding: awsglue/transforms/dynamicframe_filter.py (deflated 67%)
  adding: awsglue/transforms/dynamicframe_map.py (deflated 68%)
  adding: awsglue/transforms/errors_as_dynamicframe.py (deflated 57%)
  adding: awsglue/transforms/field_transforms.py (deflated 88%)
  adding: awsglue/transforms/relationalize.py (deflated 69%)
  adding: awsglue/transforms/repartition.py (deflated 67%)
  adding: awsglue/transforms/resolve_choice.py (deflated 72%)
  adding: awsglue/transforms/transform.py (deflated 69%)
  adding: awsglue/transforms/unbox.py (deflated 74%)
  adding: awsglue/transforms/unnest_frame.py (deflated 69%)
  adding: awsglue/utils.py (deflated 70%)
/glue
Python 3.5.3 (default, Sep 27 2018, 17:25:39)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/glue/jarsv1/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
20/01/31 03:02:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.3
      /_/

Using Python version 3.5.3 (default, Sep 27 2018 17:25:39)
SparkSession available as 'spark'.
>>>
>>> import os
>>> import sys
>>> print(sys.path)
['', '/tmp/spark-761aa392-c67d-41d4-bd83-c1b3daddf3c5/userFiles-6d8d2415-6a2c-480d-bda8-8297a464f9e4', '/root/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8/python/lib/py4j-0.10.7-src.zip', '/root/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8/python', '/glue/PyGlue.zip', '/glue', '/usr/lib/python35.zip', '/usr/lib/python3.5', '/usr/lib/python3.5/plat-x86_64-linux-gnu', '/usr/lib/python3.5/lib-dynload', '/usr/local/lib/python3.5/dist-packages', '/usr/lib/python3/dist-packages']
>>> sys.path.append("/home/ec2-user/aws-glue-libs-glue-1.0")
>>> print(sys.path)
['', '/tmp/spark-761aa392-c67d-41d4-bd83-c1b3daddf3c5/userFiles-6d8d2415-6a2c-480d-bda8-8297a464f9e4', '/root/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8/python/lib/py4j-0.10.7-src.zip', '/root/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8/python', '/glue/PyGlue.zip', '/glue', '/usr/lib/python35.zip', '/usr/lib/python3.5', '/usr/lib/python3.5/plat-x86_64-linux-gnu', '/usr/lib/python3.5/lib-dynload', '/usr/local/lib/python3.5/dist-packages', '/usr/lib/python3/dist-packages', '/home/ec2-user/aws-glue-libs-glue-1.0']
>>> from pyspark.context import SparkContext
>>> from awsglue.context import GlueContext
>>> from awsglue.transforms import *
>>> from awsglue.utils import getResolvedOptions
>>>
>>> print(os.path)
<module 'posixpath' from '/usr/lib/python3.5/posixpath.py'>
>>> print(sys.version)
3.5.3 (default, Sep 27 2018, 17:25:39)
[GCC 6.3.0 20170516]
>>> print(os.environ['JAVA_HOME'])
/usr/local/openjdk-8
>>> print(os.environ)
environ({'PWD': '/glue', 'SPARK_HOME': '/root/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8', 'JAVA_URL_VERSION': '8u232b09', 'PYTHONHASHSEED': '0', 'TERM': 'xterm', 'JAVA_HOME': '/usr/local/openjdk-8', 'HOSTNAME': '9f8ae0ec835b', 'OLDPWD': '/glue', 'SPARK_CONF_DIR': '/glue/conf', 'MAVEN_HOME': '/root/apache-maven-3.6.0', 'SPARK_ENV_LOADED': '1', 'LANG': 'C.UTF-8', 'PYSPARK_PYTHON': 'python3', 'PYTHONSTARTUP': '/root/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8/python/pyspark/shell.py', '_SPARK_CMD_USAGE': 'Usage: ./bin/pyspark [options]', 'JAVA_VERSION': '8u232', 'PYTHONPATH': '/root/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8/python/lib/py4j-0.10.7-src.zip:/root/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8/python/:/glue/PyGlue.zip:/root/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8/python/lib/py4j-0.10.7-src.zip:/root/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8/python/:', 'OLD_PYTHONSTARTUP': '', 'PYSPARK_SUBMIT_ARGS': '"--name" "PySparkShell" "pyspark-shell"', 'PATH': '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/root/apache-maven-3.6.0/bin:/root/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8/bin:/glue/bin', 'PYSPARK_DRIVER_PYTHON': 'python3', 'HOME': '/root', 'JAVA_BASE_URL': 'https://github.com/AdoptOpenJDK/openjdk8-upstream-binaries/releases/download/jdk8u232-b09/OpenJDK8U-jdk_', 'SPARK_SCALA_VERSION': '2.12', 'SHLVL': '1'})

mariafung88 commented 4 years ago

Thanks for the suggestion. Which CLASSPATH should I look at? I am new to this, so sorry for asking such a simple question. I have checked that the Hadoop, Spark and Glue jars are in my PATH and they show up in os.environ. I tried executing the code both in the pyspark shell and in Zeppelin, but I still get the same error. Any other suggestions? Thanks. This is what I have in my .bash_profile:

    PATH=$PATH:$HOME/.local/bin:$HOME/bin
    export SPARK_HOME=$HOME/spark
    PATH=$PATH:$SPARK_HOME/bin
    export ZEPPELIN_HOME=$HOME/zeppelin
    PATH=$PATH:$ZEPPELIN_HOME/bin
    export MAVEN=$HOME/maven
    PATH=$PATH:$MAVEN/bin
    export PATH

    export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk.x86_64/
    export SPARK_CONF_DIR=$HOME/aws-glue-libs/conf
    export PYTHONPATH="${SPARK_HOME}python/:${SPARK_HOME}python/lib/py4j-0.10.7-src.zip:$HOME/aws-glue-libs/PyGlue.zip:${PYTHONPATH}"

environ({'PATH': '/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/aws/bin:/home/ec2-user/.local/bin:/home/ec2-user/bin:/home/ec2-user/spark/bin:/home/ec2-user/zeppelin/bin:/home/ec2-user/maven/bin:/home/ec2-user/.local/bin:/home/ec2-user/bin:/home/ec2-user/spark/bin:/home/ec2-user/zeppelin/bin:/home/ec2-user/maven/bin:/home/ec2-user/.local/bin:/home/ec2-user/bin:/home/ec2-user/spark/bin:/home/ec2-user/zeppelin/bin:/home/ec2-user/maven/bin:/home/ec2-user/.local/bin:/home/ec2-user/bin:/home/ec2-user/spark/bin:/home/ec2-user/zeppelin/bin:/home/ec2-user/maven/bin', 'ZEPPELIN_LOG_DIR': '/home/ec2-user/zeppelin/logs', 'HISTCONTROL': 'ignoredups', 'ZEPPELIN_WAR': '/home/ec2-user/zeppelin/zeppelin-web-0.8.1.war', 'ZEPPELIN_ENCODING': 'UTF-8', 'ZEPPELIN_SPARK_CONF': " --master local[] --conf spark.app.name='Zeppelin'", 'LESS_TERMCAP_se': '\x1b[0m', 'ZEPPELIN_NICENESS': '0', 'SPARK_ENV_LOADED': '1', 'JAVA_OPTS': ' -Dfile.encoding=UTF-8 -Xms1024m -Xmx1024m -XX:MaxPermSize=512m -Dlog4j.configuration=file:///home/ec2-user/zeppelin/conf/log4j.properties -Dzeppelin.log.file=/home/ec2-user/zeppelin/logs/zeppelin-ec2-user-ip-172-31-18-0.log -Dfile.encoding=UTF-8 -Xms1024m -Xmx1024m -XX:MaxPermSize=512m -Dlog4j.configuration=file:///home/ec2-user/zeppelin/conf/log4j.properties', 'JAVA_INTP_OPTS': ' -Dfile.encoding=UTF-8 -Dlog4j.configuration=file:///home/ec2-user/zeppelin/conf/log4j.properties -Dzeppelin.log.file=/home/ec2-user/zeppelin/logs/zeppelin-interpreter-spark-ec2-user-ip-172-31-18-0.log', 'MAIL': '/var/spool/mail/ec2-user', 'ZEPPELIN_CONF_DIR': '/home/ec2-user/zeppelin/conf', 'LOGNAME': 'ec2-user', 'PWD': '/home/ec2-user', 'PYTHONPATH': '/home/ec2-user/spark/python/lib/pyspark.zip:/home/ec2-user/spark/python/lib/py4j-0.10.7-src.zip:/home/ec2-user/zeppelin/interpreter/lib/python:/home/ec2-user/spark/python/lib/py4j-0.10.7-src.zip:/home/ec2-user/spark/python/:/home/ec2-user/sparkpython/:/home/ec2-user/sparkpython/lib/py4j-0.10.7-src.zip:/home/ec2-user/aws-glue-libs/PyGlue.zip::file:/home/ec2-user/zeppelin/interpreter/spark/spark-interpreter-0.8.1.jar:/home/ec2-user/zeppelin-0.8.1-bin-all/interpreter/spark/spark-interpreter-0.8.1.jar', 'LESSOPEN': '||/usr/bin/lesspipe.sh %s', 'SPARK_SUBMIT': '/home/ec2-user/spark/bin/spark-submit', 'SHELL': '/bin/bash', 'ZEPPELIN_INTP_MEM': '-Xms1024m -Xmx1024m -XX:MaxPermSize=512m', 'AWS_PATH': '/opt/aws', 'SPARK_CONF_DIR': '/home/ec2-user/aws-glue-libs/conf', 'MAVEN': '/home/ec2-user/maven', 'EC2_AMITOOL_HOME': '/opt/aws/amitools/ec2', 'LS_COLORS': 
'rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:.tar=01;31:.tgz=01;31:.arc=01;31:.arj=01;31:.taz=01;31:.lha=01;31:.lz4=01;31:.lzh=01;31:.lzma=01;31:.tlz=01;31:.txz=01;31:.tzo=01;31:.t7z=01;31:.zip=01;31:.z=01;31:.Z=01;31:.dz=01;31:.gz=01;31:.lrz=01;31:.lz=01;31:.lzo=01;31:.xz=01;31:.bz2=01;31:.bz=01;31:.tbz=01;31:.tbz2=01;31:.tz=01;31:.deb=01;31:.rpm=01;31:.jar=01;31:.war=01;31:.ear=01;31:.sar=01;31:.rar=01;31:.alz=01;31:.ace=01;31:.zoo=01;31:.cpio=01;31:.7z=01;31:.rz=01;31:.cab=01;31:.jpg=01;35:.jpeg=01;35:.gif=01;35:.bmp=01;35:.pbm=01;35:.pgm=01;35:.ppm=01;35:.tga=01;35:.xbm=01;35:.xpm=01;35:.tif=01;35:.tiff=01;35:.png=01;35:.svg=01;35:.svgz=01;35:.mng=01;35:.pcx=01;35:.mov=01;35:.mpg=01;35:.mpeg=01;35:.m2v=01;35:.mkv=01;35:.webm=01;35:.ogm=01;35:.mp4=01;35:.m4v=01;35:.mp4v=01;35:.vob=01;35:.qt=01;35:.nuv=01;35:.wmv=01;35:.asf=01;35:.rm=01;35:.rmvb=01;35:.flc=01;35:.avi=01;35:.fli=01;35:.flv=01;35:.gl=01;35:.dl=01;35:.xcf=01;35:.xwd=01;35:.yuv=01;35:.cgm=01;35:.emf=01;35:.axv=01;35:.anx=01;35:.ogv=01;35:.ogx=01;35:.aac=01;36:.au=01;36:.flac=01;36:.mid=01;36:.midi=01;36:.mka=01;36:.mp3=01;36:.mpc=01;36:.ogg=01;36:.ra=01;36:.wav=01;36:.axa=01;36:.oga=01;36:.spx=01;36:.xspf=01;36:', 'SHLVL': '3', 'LESS_TERMCAP_md': '\x1b[01;38;5;208m', 'LESS_TERMCAP_me': '\x1b[0m', 'LESS_TERMCAP_mb': '\x1b[01;31m', 'AWS_AUTO_SCALING_HOME': '/opt/aws/apitools/as', 'HISTSIZE': '1000', 'JAVA_HOME': '/usr/lib/jvm/jre-1.8.0-openjdk.x86_64/', 'AWS_ELB_HOME': '/opt/aws/apitools/elb', 'LESS_TERMCAP_us': '\x1b[04;38;5;111m', 'EC2_HOME': '/opt/aws/apitools/ec2', 'TERM': 'xterm', 'LANG': 'en_US.UTF-8', 'AWS_CLOUDWATCH_HOME': '/opt/aws/apitools/mon', 'SPARK_SCALA_VERSION': '2.12', 'SPARK_HOME': '/home/ec2-user/spark', 'ZEPPELIN_RUNNER': '/usr/lib/jvm/jre-1.8.0-openjdk.x86_64//bin/java', 'LESS_TERMCAP_ue': '\x1b[0m', 'PYTHONHASHSEED': '0', 'ZEPPELIN_HOME': '/home/ec2-user/zeppelin', 'SSH_TTY': '/dev/pts/0', 'SSH_CLIENT': '137.145.235.50 55261 22', 'USER': 'ec2-user', 'ZEPPELIN_PID_DIR': '/home/ec2-user/zeppelin/run', 'ZEPPELIN_MEM': '-Xms1024m -Xmx1024m -XX:MaxPermSize=512m', 'SSH_CONNECTION': '137.145.235.50 55261 172.31.18.0 22', 'HOSTNAME': 'ip-172-31-18-0', 'ZEPPELIN_IDENT_STRING': 'ec2-user', 'ZEPPELIN_INTERPRETER_REMOTE_RUNNER': 'bin/interpreter.sh', 'HOME': '/home/ec2-user'})
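
As a quick sanity check, here is a minimal sketch (run it in the same shell that launches pyspark or Zeppelin; nothing in it is specific to Glue) that flags PYTHONPATH entries that do not exist on disk, e.g. a path concatenated without a trailing "/":

    # Minimal sketch: report PYTHONPATH entries that are missing on disk, which
    # usually points at a typo or a missing "/" in the exported variables.
    import os

    for entry in os.environ.get("PYTHONPATH", "").split(":"):
        if entry:
            status = "OK     " if os.path.exists(entry) else "MISSING"
            print(status, entry)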


rvasconcelossilva commented 4 years ago

Any answer?

mariafung88 commented 4 years ago

I finally got it resolved by doing the following and renaming some jar files:

    sudo yum install java-1.8.0-openjdk-devel.x86_64
    export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk.x86_64/
    tar zxvf apache-maven-3.6.0-bin.tar.gz
    ln -fs apache-maven-3.6.0 /home/ec2-user/maven
    export MAVEN=$HOME/maven
    PATH=$PATH:$MAVEN/bin
    tar zxvf spark-2.4.3-bin-hadoop2.8.tgz -C /home/ec2-user
    ln -fs spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8 /home/ec2-user/spark
    unzip aws-glue-libs-glue-1.0.zip
    cd aws-glue-libs-glue-1.0
    chmod +x ./bin/glue-setup.sh
    ./glue-setup.sh

Then modify pom.xml and run:

    mvn -f $ROOT_DIR/pom.xml -DoutputDirectory=$ROOT_DIR/jars dependency:copy-dependencies

Do not remove aws-glue-libs/jarsv1/netty-all-4.0.23.Final.jar; rename the netty-all-4.1.17.Final.jar to something else:

    cp $HOME/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8/jars/netty-all-4.1.17.Final.jar $HOME/aws-glue-libs-glue-1.0/jarsv1/spark_netty-all-4.1.17.Final.jar
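
A minimal sketch (the paths below are assumptions based on the steps above) to confirm which netty jars end up in jarsv1 after the copy:

    # Minimal sketch: list the netty jars under jarsv1 so you can verify that
    # only netty-all-4.0.23.Final.jar keeps its original name after the rename.
    import glob
    import os

    jarsv1 = os.path.expanduser("~/aws-glue-libs-glue-1.0/jarsv1")
    for jar in sorted(glob.glob(os.path.join(jarsv1, "*netty*"))):
        print(os.path.basename(jar))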

rvasconcelossilva commented 4 years ago

Hey @mariafung88, thanks for that. Just to clarify the last step: where should I rename the netty-all-4.1.17.Final.jar? In the Spark folder before copying it, or after copying it?

rvasconcelossilva commented 4 years ago

Just to give you more background about what I am doing: I'm setting up AWS Glue locally on my laptop using pipenv. My .env file is as below:

    HADOOP_HOME="C:\Users[user]\AppData\Local\Spark\winutils"
    SPARK_HOME="C:\Users[user]\AppData\Local\Spark\spark-2.4.3-bin-hadoop2.8\spark-2.4.3-bin-hadoop2.8\spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8"
    JAVA_HOME="C:\Program Files\Java\jdk1.8.0_231"
    PATH="${HADOOP_HOME}\bin"
    PATH="${SPARK_HOME}\bin:${PATH}"
    PATH="${JAVA_HOME}\bin:${PATH}"
    SPARK_CONF_DIR="C:\Users[user]\Documents\folder\projects\code\aws-glue-libs-glue-1.0\conf"
    PYTHONPATH="${SPARK_HOME}python/:${PYTHONPATH}"
    PYTHONPATH="${SPARK_HOME}python/lib/py4j-0.10.7-src.zip:${PYTHONPATH}"
    PYTHONPATH="C:/Users/rvasconc/Documents/folder/projects/code/aws-glue-libs-glue-1.0/PyGlue.zip:${PYTHONPATH}"

When I run my code I get the following error:

    20/04/05 17:09:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    Traceback (most recent call last):
      File "C:\Users[user]\Documents\network10\projects\code\data-lake\etl\tealium\visitor.py", line 12, in <module>
        glueContext = GlueContext(sc.getOrCreate())
      File "C:\Users[user]\Documents\folder\projects\code\aws-glue-libs-glue-1.0\PyGlue.zip\awsglue\context.py", line 45, in __init__
      File "C:\Users[user]\Documents\folder\projects\code\aws-glue-libs-glue-1.0\PyGlue.zip\awsglue\context.py", line 66, in _get_glue_scala_context
    TypeError: 'JavaPackage' object is not callable
    PS C:\Users\rvasconc\Documents\network10\projects\code\data-lake\etl\tealium>
    SUCCESS: The process with PID 23208 (child process of PID 2652) has been terminated.
    SUCCESS: The process with PID 2652 (child process of PID 2796) has been terminated.
    SUCCESS: The process with PID 2796 (child process of PID 9696) has been terminated.

mariafung88 commented 4 years ago

I renamed the netty-all-4.1.17.Final.jar in the jarsv1 folder under aws-glue-libs; keep the original in the Spark home. So in jarsv1 I have these two netty-all jars:

    netty-all-4.0.23.Final.jar
    spark_netty-all-4.1.17.Final.jar   (the renamed copy)


rvasconcelossilva commented 4 years ago

Thanks for that. It's not working for me yet; anyway, thanks for your help.

rvasconcelossilva commented 4 years ago

@mariafung88 just one more question: you mentioned that you renamed some jars. Is it only the netty one, or have you renamed others? Sorry for bothering you; this issue is blowing my mind.

mariafung88 commented 4 years ago

netty-all-4.1.17.Final.jar is the only one I have renamed.


rvasconcelossilva commented 4 years ago

Thank you for that information. Unfortunately it is not working for me; I'll use another solution for a while. Thanks.


esobolievv commented 4 years ago

Hi, I have the same issue:

TypeError: 'JavaPackage' object is not callable

I even deleted all netty jars from my aws-glue-libs/jarsv1, but the error still exists. I don't know how to fix this. While running my app I also defined 3 env vars:

export SPARK_HOME=$HOME/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8.3
export SPARK_CONF=$HOME/IdeaProjects/aws-glue-libs/conf
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home

Can anybody help me with this?

sound118 commented 2 years ago


@mariafung88 Where did you get aws-glue-libs-glue-1.0.zip from?

mariafung88 commented 2 years ago

1.0 can be found here now: https://github.com/awslabs/aws-glue-libs/releases


sound118 commented 2 years ago

Thanks for the response. You mentioned modifying the pom.xml file. How would you modify it? Is it something like what is described on this site: https://support.wharton.upenn.edu/help/glue-debugging#run-glue-setup-sh?