aws-samples / aws-glue-samples

AWS Glue code samples
MIT No Attribution

Connection to Redshift times out from SageMaker Jupyter notebook #67

Closed arammaliachi closed 4 years ago

arammaliachi commented 4 years ago

I am trying to push data to Amazon Redshift as described in the join_and_relationalize.md sample.

The write operation works flawlessly when I run it within an AWS Glue job, but it times out when I run it from an Amazon SageMaker Jupyter notebook (PySpark kernel) that is connected to an AWS Glue dev endpoint. Both use the same AWS Glue connection. I'm guessing it has to do with networking, but the document doesn't say anything about this.

Can AWS elaborate on how we must configure an Amazon SageMaker Jupyter notebook in order to write to Amazon Redshift using PySpark?

An error was encountered:
An error occurred while calling o87.getJDBCSink.
: java.sql.SQLException: [Amazon](500150) Error setting/closing connection: Connection timed out.
    at com.amazon.redshift.client.PGClient.connect(Unknown Source)
    at com.amazon.redshift.client.PGClient.<init>(Unknown Source)
    at com.amazon.redshift.core.PGJDBCConnection.connect(Unknown Source)
    at com.amazon.jdbc.common.BaseConnectionFactory.doConnect(Unknown Source)
    at com.amazon.jdbc.common.AbstractDriver.connect(Unknown Source)
    at com.amazon.redshift.jdbc.Driver.connect(Unknown Source)
    at com.amazonaws.services.glue.util.JDBCWrapper$$anonfun$8.apply(JDBCUtils.scala:895)
    at com.amazonaws.services.glue.util.JDBCWrapper$$anonfun$8.apply(JDBCUtils.scala:891)
    at com.amazonaws.services.glue.util.JDBCWrapper$$anonfun$connectWithSSLAttempt$1$$anonfun$apply$6.apply(JDBCUtils.scala:847)
    at scala.Option.getOrElse(Option.scala:121)
    at com.amazonaws.services.glue.util.JDBCWrapper$$anonfun$connectWithSSLAttempt$1.apply(JDBCUtils.scala:847)
    at scala.Option.getOrElse(Option.scala:121)
    at com.amazonaws.services.glue.util.JDBCWrapper$.connectWithSSLAttempt(JDBCUtils.scala:847)
    at com.amazonaws.services.glue.util.JDBCWrapper$.connectionProperties(JDBCUtils.scala:890)
    at com.amazonaws.services.glue.util.JDBCWrapper.connectionProperties$lzycompute(JDBCUtils.scala:670)
    at com.amazonaws.services.glue.util.JDBCWrapper.connectionProperties(JDBCUtils.scala:670)
    at com.amazonaws.services.glue.util.JDBCWrapper.getRawConnection(JDBCUtils.scala:683)
    at com.amazonaws.services.glue.RedshiftDataSink.<init>(RedshiftDataSink.scala:40)
    at com.amazonaws.services.glue.GlueContext.getSink(GlueContext.scala:650)
    at com.amazonaws.services.glue.GlueContext.getJDBCSink(GlueContext.scala:463)
    at com.amazonaws.services.glue.GlueContext.getJDBCSink(GlueContext.scala:445)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
Caused by: com.amazon.support.exceptions.GeneralException: [Amazon](500150) Error setting/closing connection: Connection timed out.
    ... 31 more
Caused by: java.net.ConnectException: Connection timed out
    at sun.nio.ch.Net.connect0(Native Method)
    at sun.nio.ch.Net.connect(Net.java:454)
    at sun.nio.ch.Net.connect(Net.java:446)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:645)
    at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:107)
    at com.amazon.redshift.client.PGClient.connect(Unknown Source)
    at com.amazon.redshift.client.PGClient.<init>(Unknown Source)
    at com.amazon.redshift.core.PGJDBCConnection.connect(Unknown Source)
    at com.amazon.jdbc.common.BaseConnectionFactory.doConnect(Unknown Source)
    at com.amazon.jdbc.common.AbstractDriver.connect(Unknown Source)
    at com.amazon.redshift.jdbc.Driver.connect(Unknown Source)
    at com.amazonaws.services.glue.util.JDBCWrapper$$anonfun$8.apply(JDBCUtils.scala:895)
    at com.amazonaws.services.glue.util.JDBCWrapper$$anonfun$8.apply(JDBCUtils.scala:891)
    at com.amazonaws.services.glue.util.JDBCWrapper$$anonfun$connectWithSSLAttempt$1$$anonfun$apply$6.apply(JDBCUtils.scala:847)
    at scala.Option.getOrElse(Option.scala:121)
    at com.amazonaws.services.glue.util.JDBCWrapper$$anonfun$connectWithSSLAttempt$1.apply(JDBCUtils.scala:847)
    at scala.Option.getOrElse(Option.scala:121)
    at com.amazonaws.services.glue.util.JDBCWrapper$.connectWithSSLAttempt(JDBCUtils.scala:847)
    at com.amazonaws.services.glue.util.JDBCWrapper$.connectionProperties(JDBCUtils.scala:890)
    at com.amazonaws.services.glue.util.JDBCWrapper.connectionProperties$lzycompute(JDBCUtils.scala:670)
    at com.amazonaws.services.glue.util.JDBCWrapper.connectionProperties(JDBCUtils.scala:670)
    at com.amazonaws.services.glue.util.JDBCWrapper.getRawConnection(JDBCUtils.scala:683)
    at com.amazonaws.services.glue.RedshiftDataSink.<init>(RedshiftDataSink.scala:40)
    at com.amazonaws.services.glue.GlueContext.getSink(GlueContext.scala:650)
    at com.amazonaws.services.glue.GlueContext.getJDBCSink(GlueContext.scala:463)
    at com.amazonaws.services.glue.GlueContext.getJDBCSink(GlueContext.scala:445)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

Traceback (most recent call last):
  File "/mnt/yarn/usercache/livy/appcache/application_1588365285924_0011/container_1588365285924_0011_01_000001/PyGlue.zip/awsglue/dynamicframe.py", line 665, in from_jdbc_conf
    redshift_tmp_dir, transformation_ctx)
  File "/mnt/yarn/usercache/livy/appcache/application_1588365285924_0011/container_1588365285924_0011_01_000001/PyGlue.zip/awsglue/context.py", line 311, in write_dynamic_frame_from_jdbc_conf
    catalog_id)
  File "/mnt/yarn/usercache/livy/appcache/application_1588365285924_0011/container_1588365285924_0011_01_000001/PyGlue.zip/awsglue/context.py", line 326, in write_from_jdbc_conf
    transformation_ctx, catalog_id)
  File "/mnt/yarn/usercache/livy/appcache/application_1588365285924_0011/container_1588365285924_0011_01_000001/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/mnt/yarn/usercache/livy/appcache/application_1588365285924_0011/container_1588365285924_0011_01_000001/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/mnt/yarn/usercache/livy/appcache/application_1588365285924_0011/container_1588365285924_0011_01_000001/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o87.getJDBCSink.
: java.sql.SQLException: [Amazon](500150) Error setting/closing connection: Connection timed out.

moomindani commented 4 years ago

Thank you for reporting the issue. A connection timeout like this is most likely caused by the network configuration between the AWS Glue dev endpoint and the Redshift cluster.
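To see whether the dev endpoint and the Glue connection share the same network placement, one option is to compare the dev endpoint's subnet and security groups with those stored on the connection. The sketch below assumes boto3 is available with credentials that can read Glue resources. `network_differences` and `fetch_and_compare` are illustrative helpers, not part of any AWS library, and the endpoint and connection names are placeholders. A difference is not necessarily an error, but it shows where to look.

```python
def network_differences(dev_endpoint, connection):
    """Compare network settings of a Glue dev endpoint and a Glue connection.

    Both arguments are the dicts found under "DevEndpoint" and "Connection"
    in the boto3 get_dev_endpoint / get_connection responses.
    Returns {setting: (dev_endpoint_value, connection_value)} for mismatches.
    """
    reqs = connection.get("PhysicalConnectionRequirements", {})
    pairs = {
        "subnet": (dev_endpoint.get("SubnetId"), reqs.get("SubnetId")),
        "security_groups": (
            sorted(dev_endpoint.get("SecurityGroupIds", [])),
            sorted(reqs.get("SecurityGroupIdList", [])),
        ),
    }
    return {name: vals for name, vals in pairs.items() if vals[0] != vals[1]}


def fetch_and_compare(endpoint_name, connection_name):
    """Fetch both resources with boto3 and report the settings that differ."""
    import boto3  # imported here so the pure helper above has no dependencies

    glue = boto3.client("glue")
    dev_endpoint = glue.get_dev_endpoint(EndpointName=endpoint_name)["DevEndpoint"]
    connection = glue.get_connection(Name=connection_name)["Connection"]
    return network_differences(dev_endpoint, connection)


# Hypothetical usage (placeholder names, not taken from this issue):
# print(fetch_and_compare("my-dev-endpoint", "my-redshift-connection"))
```

An empty result means the subnet and security groups match, in which case the Redshift cluster's own security group rules and the subnet's route table are the next things to check.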

You can narrow down the issue by;
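One possible first check, offered as a sketch rather than an official procedure, is to test plain TCP reachability to the Redshift endpoint from a notebook cell. It assumes Python 3 and that the cell runs on the dev endpoint (the Livy paths in the traceback suggest it does). `can_connect` is an illustrative helper, and the host in the usage comment is a placeholder. Port 5439 is the Redshift default, so adjust it if the cluster uses another port.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False


# Hypothetical usage (replace the placeholder with the cluster endpoint):
# print(can_connect("<your-cluster-endpoint>", 5439))
```

If this returns False from the notebook, the problem sits at the network layer (security groups, subnet routing, or DNS) rather than in the JDBC driver or the Glue job code.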

We ask that troubleshooting questions like this one go to the AWS Glue forum or AWS Support rather than GitHub issues, so I am closing this issue for now.