awslabs / aws-glue-libs

AWS Glue Libraries are additions and enhancements to Spark for ETL operations.

Wrong Glue endpoint for China Region #116

Open kynging opened 2 years ago

kynging commented 2 years ago

When I try to invoke glueContext.create_dynamic_frame.from_catalog, I get the following error. Apparently the Java SDK uses the wrong endpoint, glue.cn-northwest-1.amazonaws.com, which is missing the ".cn" suffix that China Regions use (amazonaws.com.cn).

Traceback (most recent call last):
  File "/tmp/pycharm_project_550/glue/test.py", line 16, in <module>
    datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "default", table_name = "log")
  File "/home/ec2-user/aws-glue-libs/PyGlue.zip/awsglue/dynamicframe.py", line 625, in from_catalog
  File "/home/ec2-user/aws-glue-libs/PyGlue.zip/awsglue/context.py", line 177, in create_dynamic_frame_from_catalog
  File "/home/ec2-user/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/home/ec2-user/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
  File "/home/ec2-user/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o27.getCatalogSource.
: com.amazonaws.SdkClientException: Unable to execute HTTP request: glue.cn-northwest-1.amazonaws.com
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1207)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1153)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
    at com.amazonaws.services.glue.AWSGlueClient.doInvoke(AWSGlueClient.java:11244)
    at com.amazonaws.services.glue.AWSGlueClient.invoke(AWSGlueClient.java:11211)
    at com.amazonaws.services.glue.AWSGlueClient.invoke(AWSGlueClient.java:11200)
    at com.amazonaws.services.glue.AWSGlueClient.executeGetTable(AWSGlueClient.java:6691)
    at com.amazonaws.services.glue.AWSGlueClient.getTable(AWSGlueClient.java:6660)
    at com.amazonaws.services.glue.util.DataCatalogWrapper.$anonfun$getTable$2(DataCatalogWrapper.scala:163)
    at com.amazonaws.services.glue.util.ErieRetryWrapper$.$anonfun$executeWithRetry$1(DataCatalogWrapper.scala:974)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at scala.util.Try$.apply(Try.scala:213)
    at com.amazonaws.services.glue.util.ErieRetryWrapper$.executeWithRetry(DataCatalogWrapper.scala:974)
    at com.amazonaws.services.glue.util.DataCatalogWrapper.$anonfun$getTable$1(DataCatalogWrapper.scala:162)
    at scala.util.Try$.apply(Try.scala:213)
    at com.amazonaws.services.glue.util.DataCatalogWrapper.getTable(DataCatalogWrapper.scala:140)
    at com.amazonaws.services.glue.GlueContext.getCatalogSource(GlueContext.scala:199)
    at com.amazonaws.services.glue.GlueContext.getCatalogSource(GlueContext.scala:181)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: glue.cn-northwest-1.amazonaws.com
    at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
    at java.net.InetAddress.getAllByName(InetAddress.java:1193)
    at java.net.InetAddress.getAllByName(InetAddress.java:1127)
    at com.amazonaws.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:27)
    at com.amazonaws.http.DelegatingDnsResolver.resolve(DelegatingDnsResolver.java:38)
    at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:112)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:374)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
    at com.amazonaws.http.conn.$Proxy14.connect(Unknown Source)
    at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
    at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1331)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
    ... 33 more

Process finished with exit code 1

QiaoLiar commented 7 months ago

Hi @kynging,

You can use this tool (https://leonardosnt.github.io/jar-string-editor/?ref=gh) to change the string ".amazonaws.com" in AWSGlueETL-4.0.0.jar to ".amazonaws.com.cn", then replace the original jar with the patched one.
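The same edit can also be scripted. The sketch below is not how the linked tool works internally; it is a minimal Python stand-in that assumes the literal is stored in the class file as a constant-pool Utf8 entry exactly equal to ".amazonaws.com" (which is what the decompiled EndpointConfig line in this comment suggests). It walks each class file's constant pool, rewrites matching entries, and updates their length prefix so the class stays well-formed. The function names `replace_utf8_constant` and `patch_jar_bytes` are hypothetical. Unlike the interactive tool, it rewrites every matching entry in every class of the jar, so check that nothing else in the jar relies on that literal.

```python
import io
import struct
import zipfile

# Payload sizes (bytes after the 1-byte tag) of fixed-size constant-pool entries,
# per the JVM class-file format. Tag 1 (Utf8) is variable-length and handled separately.
FIXED_SIZES = {3: 4, 4: 4, 5: 8, 6: 8, 7: 2, 8: 2, 9: 4, 10: 4, 11: 4,
               12: 4, 15: 3, 16: 2, 17: 4, 18: 4, 19: 2, 20: 2}
UTF8 = 1
TWO_SLOT_TAGS = (5, 6)  # Long and Double occupy two constant-pool indices


def replace_utf8_constant(class_bytes: bytes, old: str, new: str) -> tuple[bytes, int]:
    """Return (patched class bytes, number of Utf8 entries replaced).

    Only Utf8 entries exactly equal to `old` are rewritten. Assumes ASCII
    strings, for which the JVM's modified UTF-8 encoding equals ASCII.
    """
    if class_bytes[:4] != b"\xca\xfe\xba\xbe":
        raise ValueError("not a class file")
    # Header: magic (4 bytes), minor (2), major (2), constant_pool_count (2).
    (count,) = struct.unpack_from(">H", class_bytes, 8)
    out = bytearray(class_bytes[:10])
    old_b, new_b = old.encode("ascii"), new.encode("ascii")
    pos, index, replaced = 10, 1, 0
    while index < count:
        tag = class_bytes[pos]
        if tag == UTF8:
            (length,) = struct.unpack_from(">H", class_bytes, pos + 1)
            value = class_bytes[pos + 3:pos + 3 + length]
            if value == old_b:
                value = new_b
                replaced += 1
            # Re-emit the entry with a length prefix matching the (possibly new) value.
            out += bytes([UTF8]) + struct.pack(">H", len(value)) + value
            pos += 3 + length
        elif tag in FIXED_SIZES:
            size = FIXED_SIZES[tag]
            out += class_bytes[pos:pos + 1 + size]
            pos += 1 + size
        else:
            raise ValueError(f"unknown constant-pool tag {tag} at offset {pos}")
        index += 2 if tag in TWO_SLOT_TAGS else 1
    out += class_bytes[pos:]  # everything after the constant pool is copied unchanged
    return bytes(out), replaced


def patch_jar_bytes(jar_bytes: bytes, old: str, new: str) -> tuple[bytes, int]:
    """Rewrite matching Utf8 constants in every .class entry of a jar (zip) file."""
    total = 0
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as src, zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename.endswith(".class"):
                data, n = replace_utf8_constant(data, old, new)
                total += n
            dst.writestr(info, data)  # reuses the entry's name, timestamp and compression type
    return out.getvalue(), total
```

To use it, read AWSGlueETL-4.0.0.jar as bytes, call `patch_jar_bytes(data, ".amazonaws.com", ".amazonaws.com.cn")`, confirm the returned count is what you expect, and write the patched bytes back over the original jar (keeping a backup).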

This works because AWSGlueETL-4.0.0.jar doesn't use the AWS SDK's interface to look up the service endpoint. Instead, the EndpointConfig class builds the endpoint URL directly by concatenating the region with the global domain:

String endpoint = (new StringBuilder(27)).append("https://glue.").append(region).append(".amazonaws.com").toString();
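For comparison, the sketch below builds the endpoint the same way as the decompiled line, but picks the DNS suffix from the region's partition instead of hard-coding ".amazonaws.com". It is a simplified illustration, not the library's code: the helper names are hypothetical, and it only distinguishes China Regions (names starting with "cn-", suffix "amazonaws.com.cn") from everything else, so isolated partitions with other suffixes are not handled. If boto3 is installed, `boto3.client("glue", region_name="cn-northwest-1").meta.endpoint_url` shows the endpoint the AWS SDK itself resolves, which is a useful cross-check.

```python
# Hypothetical helpers mirroring EndpointConfig's string concatenation,
# but with a partition-aware DNS suffix.
# Assumption: only China Regions ("cn-*") need a different suffix here.

CHINA_SUFFIX = "amazonaws.com.cn"
DEFAULT_SUFFIX = "amazonaws.com"


def dns_suffix(region: str) -> str:
    """Return the DNS suffix of the partition the region belongs to."""
    return CHINA_SUFFIX if region.startswith("cn-") else DEFAULT_SUFFIX


def glue_endpoint(region: str) -> str:
    """Build https://glue.<region>.<suffix> for the given region."""
    return f"https://glue.{region}.{dns_suffix(region)}"


if __name__ == "__main__":
    for region in ("us-east-1", "cn-northwest-1"):
        print(region, "->", glue_endpoint(region))
    # us-east-1 -> https://glue.us-east-1.amazonaws.com
    # cn-northwest-1 -> https://glue.cn-northwest-1.amazonaws.com.cn
```

Deriving the suffix from the partition rather than from a single constant is what avoids the unresolvable glue.cn-northwest-1.amazonaws.com hostname in the trace above.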