awslabs / aws-glue-libs

AWS Glue Libraries are additions and enhancements to Spark for ETL operations.

Unable to begin transactions in local dev environment #135

Open djcurill opened 2 years ago

djcurill commented 2 years ago

Version

amazon/aws-glue-libs:glue_libs_3.0.0_image_01

Environment Setup

I am using a local dev environment within the aws_glue_libs container. I get my development environment running using the following command:

docker run -it -v ~/.aws:/home/glue_user/.aws \
-v $JUPYTER_WORKSPACE_LOCATION:/home/glue_user/workspace/jupyter_workspace/ \
-e AWS_PROFILE=$PROFILE_NAME -e DISABLE_SSL=true \
--rm -p 4040:4040 -p 18080:18080 -p 8998:8998 -p 8888:8888 \
--name glue_jupyter_lab \
amazon/aws-glue-libs:glue_libs_3.0.0_image_01 /home/glue_user/jupyter/jupyter_start.sh

Problem Description

I am trying to develop a notebook that is capable of reading a data source from a governed table. When trying to read from a governed table using the glueContext.create_dynamic_frame.from_catalog method, it requires one of two parameters:

  1. asOfTime – (TimeStamp: yyyy-[m]m-[d]d hh:mm:ss) The time as of when to read the table contents. Cannot be specified along with transactionId.
  2. transactionId – The transaction ID at which to read the Governed table contents.
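
The "one or the other" rule above can be sketched as a small helper that builds the options dictionary for the read. The helper name governed_read_options is hypothetical, and passing the result through additional_options is an assumption based on the Glue documentation for governed tables; the commented call at the end is illustrative and not executed here.

```python
from typing import Dict, Optional


def governed_read_options(as_of_time: Optional[str] = None,
                          transaction_id: Optional[str] = None) -> Dict[str, str]:
    """Build read options for a governed table (hypothetical helper).

    Exactly one of as_of_time or transaction_id must be given,
    mirroring the rule that the two cannot be combined.
    """
    if (as_of_time is None) == (transaction_id is None):
        raise ValueError("Specify exactly one of as_of_time or transaction_id")
    if as_of_time is not None:
        return {"asOfTime": as_of_time}
    return {"transactionId": transaction_id}


# Assumed usage inside a Glue session (database and table names are placeholders):
# dyf = glueContext.create_dynamic_frame.from_catalog(
#     database="my_db",
#     table_name="my_table",
#     additional_options=governed_read_options(transaction_id=tx_id),
# )
```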

When I try to create a transaction ID by calling:

tx_id = glueContext.start_transaction(read_only=False)

I receive the following traceback:

An error was encountered:
'GlueContext' object has no attribute 'start_transaction'
Traceback (most recent call last):
AttributeError: 'GlueContext' object has no attribute 'start_transaction'
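
One way to check which transaction-related methods the installed awsglue build actually exposes is to inspect the object directly. This is a minimal sketch; transaction_methods is a hypothetical helper name, and it only reports what exists on the Python side, not whether the underlying Scala method is present.

```python
def transaction_methods(ctx):
    """Return the public attribute names on ctx that mention 'transaction'."""
    return sorted(
        name for name in dir(ctx)
        if "transaction" in name.lower() and not name.startswith("_")
    )


# In the notebook, with an existing GlueContext instance:
# print(transaction_methods(glueContext))
```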

Alternatively, there is a method called begin_transaction, but calling it raises a similar error:

An error was encountered:
An error occurred while calling o58.beginTransaction. Trace:
py4j.Py4JException: Method beginTransaction([class java.lang.Boolean]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
    at py4j.Gateway.invoke(Gateway.java:274)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:750)

Traceback (most recent call last):
  File "/home/glue_user/aws-glue-libs/PyGlue.zip/awsglue/context.py", line 564, in begin_transaction
    return self._ssql_ctx.beginTransaction(read_only)
  File "/home/glue_user/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/home/glue_user/spark/python/pyspark/sql/utils.py", line 111, in deco
    return f(*a, **kw)
  File "/home/glue_user/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 332, in get_return_value
    format(target_id, ".", name, value))
py4j.protocol.Py4JError: An error occurred while calling o58.beginTransaction. Trace:
py4j.Py4JException: Method beginTransaction([class java.lang.Boolean]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
    at py4j.Gateway.invoke(Gateway.java:274)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:750)

Final Comments & Questions

There is no method called start_transaction in context.py, but there is a begin_transaction method. However, the Scala context does not have a corresponding beginTransaction method that accepts a Boolean, as the Py4J error above shows. Could this be a bug? Or are users not allowed to initiate transactions from their local environments, and must they use the Glue Console workspace instead?

pitergarcia commented 2 years ago

begin_transaction

I have the same issue, it would be nice to know how to fix this or get around it. Thanks!