starting org.apache.spark.deploy.history.HistoryServer, logging to /home/glue_user/spark/logs/spark-glue_user-org.apache.spark.deploy.history.HistoryServer-x-xxxxxxxx.out
starting java -cp /home/glue_user/livy/jars/*:/home/glue_user/livy/conf:/home/glue_user/spark/conf:/home/glue_user/spark/conf: org.apache.livy.server.LivyServer, logging to /home/glue_user/livy/logs/livy-glue_user-server.out
SSL Disabled
[I 2022-09-20 14:06:26.303 ServerApp] jupyterlab | extension was successfully linked.
[I 2022-09-20 14:06:26.314 ServerApp] nbclassic | extension was successfully linked.
[I 2022-09-20 14:06:26.315 ServerApp] Writing Jupyter server cookie secret to /home/glue_user/.local/share/jupyter/runtime/jupyter_cookie_secret
[I 2022-09-20 14:06:27.498 ServerApp] sparkmagic | extension was found and enabled by notebook_shim. Consider moving the extension to Jupyter Server's extension paths.
[I 2022-09-20 14:06:27.498 ServerApp] sparkmagic | extension was successfully linked.
[I 2022-09-20 14:06:27.498 ServerApp] notebook_shim | extension was successfully linked.
[W 2022-09-20 14:06:27.523 ServerApp] All authentication is disabled. Anyone who can connect to this server will be able to run code.
[I 2022-09-20 14:06:27.525 ServerApp] notebook_shim | extension was successfully loaded.
[I 2022-09-20 14:06:27.526 LabApp] JupyterLab extension loaded from /home/glue_user/.local/lib/python3.7/site-packages/jupyterlab
[I 2022-09-20 14:06:27.526 LabApp] JupyterLab application directory is /home/glue_user/.local/share/jupyter/lab
[I 2022-09-20 14:06:27.530 ServerApp] jupyterlab | extension was successfully loaded.
[I 2022-09-20 14:06:27.536 ServerApp] nbclassic | extension was successfully loaded.
[I 2022-09-20 14:06:27.536 ServerApp] sparkmagic extension enabled!
[I 2022-09-20 14:06:27.536 ServerApp] sparkmagic | extension was successfully loaded.
[I 2022-09-20 14:06:27.537 ServerApp] Serving notebooks from local directory: /home/glue_user/workspace/jupyter_workspace
[I 2022-09-20 14:06:27.537 ServerApp] Jupyter Server 1.18.1 is running at:
[I 2022-09-20 14:06:27.537 ServerApp] http://xxxxxxxxxxxxx:8888/lab
[I 2022-09-20 14:06:27.537 ServerApp] or http://127.0.0.1:8888/lab
[I 2022-09-20 14:06:27.537 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
JupyterLab works fine. I can run this code and get a result:
import boto3


def retrieve_tables(database_name):
    # Create a Glue client from the default boto3 session
    session = boto3.session.Session()
    glue_client = session.client("glue")
    # Fetch the table definitions for the given Data Catalog database
    response_get_tables = glue_client.get_tables(DatabaseName=database_name)
    return response_get_tables


[table_dict["Name"] for table_dict in retrieve_tables("Name")["TableList"]]
Unfortunately, when I call create_dynamic_frame.from_catalog, I get an error:
An error was encountered:
An error occurred while calling o70.getCatalogSource. Trace:
py4j.Py4JException: Method getCatalogSource([class java.lang.String, class java.lang.String, class java.lang.String, class java.lang.String, class java.lang.String, class com.amazonaws.services.glue.util.JsonOptions, null]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
at py4j.Gateway.invoke(Gateway.java:274)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:750)
Traceback (most recent call last):
File "/home/glue_user/aws-glue-libs/PyGlue.zip/awsglue/dynamicframe.py", line 625, in from_catalog
return self._glue_context.create_dynamic_frame_from_catalog(db, table_name, redshift_tmp_dir, transformation_ctx, push_down_predicate, additional_options, catalog_id, **kwargs)
File "/home/glue_user/aws-glue-libs/PyGlue.zip/awsglue/context.py", line 177, in create_dynamic_frame_from_catalog
makeOptions(self._sc, additional_options), catalog_id),
File "/home/glue_user/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/home/glue_user/spark/python/pyspark/sql/utils.py", line 111, in deco
return f(*a, **kw)
File "/home/glue_user/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 332, in get_return_value
format(target_id, ".", name, value))
py4j.protocol.Py4JError: An error occurred while calling o70.getCatalogSource. Trace:
py4j.Py4JException: Method getCatalogSource([class java.lang.String, class java.lang.String, class java.lang.String, class java.lang.String, class java.lang.String, class com.amazonaws.services.glue.util.JsonOptions, null]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
at py4j.Gateway.invoke(Gateway.java:274)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:750)
One more thing: when I run spark.sql("show databases"), I get this error:
An error was encountered:
An error occurred while calling o77.toString. Trace:
java.lang.IllegalArgumentException: object is not an instance of declaring class
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:750)
Traceback (most recent call last):
File "/home/glue_user/spark/python/pyspark/sql/session.py", line 723, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
File "/home/glue_user/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/home/glue_user/spark/python/pyspark/sql/utils.py", line 113, in deco
converted = convert_exception(e.java_exception)
File "/home/glue_user/spark/python/pyspark/sql/utils.py", line 86, in convert_exception
return AnalysisException(s.split(': ', 1)[1], stacktrace, c)
File "/home/glue_user/spark/python/pyspark/sql/utils.py", line 27, in __init__
self.cause = convert_exception(cause) if cause is not None else None
File "/home/glue_user/spark/python/pyspark/sql/utils.py", line 105, in convert_exception
return UnknownException(s, stacktrace, c)
File "/home/glue_user/spark/python/pyspark/sql/utils.py", line 27, in __init__
self.cause = convert_exception(cause) if cause is not None else None
File "/home/glue_user/spark/python/pyspark/sql/utils.py", line 98, in convert_exception
c.toString().startswith('org.apache.spark.api.python.PythonException: ')
File "/home/glue_user/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/home/glue_user/spark/python/pyspark/sql/utils.py", line 111, in deco
return f(*a, **kw)
File "/home/glue_user/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 332, in get_return_value
format(target_id, ".", name, value))
py4j.protocol.Py4JError: An error occurred while calling o77.toString. Trace:
java.lang.IllegalArgumentException: object is not an instance of declaring class
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:750)
I have a few questions:
1. How can I check which role I'm using? For example, I have a Glue role.
2. How can I reach my catalog?
3. Can I use a Kinesis stream in that Docker container? Right now I read it through the catalog with glueContext.create_data_frame.from_catalog.
4. Can I use magics, as I do in a Glue Studio notebook interactive session? I need %connections (for RDS Aurora PostgreSQL) or %iam_role.
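For the first question, one way to check might be to ask STS which identity the container's credentials resolve to. This is only a minimal sketch, assuming boto3 can find credentials inside the container (for example through a mounted ~/.aws directory); `role_from_arn` and `whoami` are hypothetical helper names, not part of boto3.

```python
def role_from_arn(arn):
    """Return the role name from an STS assumed-role ARN, or None otherwise.

    Hypothetical helper. Example input:
    arn:aws:sts::123456789012:assumed-role/MyGlueRole/session-name
    """
    # The resource part is everything after the fifth colon.
    resource = arn.split(":", 5)[5]
    parts = resource.split("/")
    if parts[0] == "assumed-role" and len(parts) >= 2:
        return parts[1]
    # IAM users, root, and federated users are not roles.
    return None


def whoami(sts_client=None):
    """Return (arn, role_name) for the credentials currently in use."""
    if sts_client is None:
        # Imported lazily so role_from_arn can be used without boto3 installed.
        import boto3
        sts_client = boto3.client("sts")
    identity = sts_client.get_caller_identity()
    return identity["Arn"], role_from_arn(identity["Arn"])
```

Calling `whoami()` in a notebook cell would return the ARN behind the current credentials. If the role name comes back as `None`, the container is probably running as an IAM user rather than the Glue role.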
I am using the Glue Docker image from https://aws.amazon.com/blogs/big-data/develop-and-test-aws-glue-version-3-0-jobs-locally-using-a-docker-container/. Here is my command to start JupyterLab (Windows):
When it starts, I get the startup log shown at the top of this post.