awslabs / aws-glue-libs

AWS Glue Libraries are additions and enhancements to Spark for ETL operations.

aws glue: snowflake connector is not downloading #145

Closed alagesann closed 2 years ago

alagesann commented 2 years ago

I am using the Snowflake connector for AWS Glue. When I run the job, it fails with an error saying the connector could not be downloaded.

I have attached the following managed policies to the Glue job's role:

- AmazonEC2ContainerRegistryFullAccess
- AmazonS3FullAccess
- AWSGlueServiceRole
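For reference, this is roughly how the policies were attached with the AWS CLI; `MyGlueJobRole` is a placeholder for the actual job role name, not the real one.

```shell
# Attach the managed policies to the Glue job's IAM role.
# "MyGlueJobRole" is a placeholder; substitute the actual role name.
ROLE_NAME=MyGlueJobRole

aws iam attach-role-policy --role-name "$ROLE_NAME" \
    --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess
aws iam attach-role-policy --role-name "$ROLE_NAME" \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
aws iam attach-role-policy --role-name "$ROLE_NAME" \
    --policy-arn arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole

# Confirm the attachments
aws iam list-attached-role-policies --role-name "$ROLE_NAME"
```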

However, while the job is running it throws the following error:

```
2022-08-02 10:40:14,425 - __main__ - INFO - Glue ETL Marketplace - Requesting ECR authorization token for registryIds=maskedid and region_name=us-east-1.

Traceback (most recent call last):
  File "/home/spark/.local/lib/python3.7/site-packages/urllib3/connection.py", line 160, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "/home/spark/.local/lib/python3.7/site-packages/urllib3/util/connection.py", line 84, in create_connection
    raise err
  File "/home/spark/.local/lib/python3.7/site-packages/urllib3/util/connection.py", line 74, in create_connection
    sock.connect(sa)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/httpsession.py", line 353, in send
    chunked=self._chunked(request.headers),
  File "/home/spark/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 727, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/home/spark/.local/lib/python3.7/site-packages/urllib3/util/retry.py", line 386, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/spark/.local/lib/python3.7/site-packages/urllib3/packages/six.py", line 735, in reraise
    raise value
  File "/home/spark/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/home/spark/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 381, in _make_request
    self._validate_conn(conn)
  File "/home/spark/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 978, in _validate_conn
    conn.connect()
  File "/home/spark/.local/lib/python3.7/site-packages/urllib3/connection.py", line 309, in connect
    conn = self._new_conn()
  File "/home/spark/.local/lib/python3.7/site-packages/urllib3/connection.py", line 167, in _new_conn
    % (self.host, self.timeout),
urllib3.exceptions.ConnectTimeoutError: (<botocore.awsrequest.AWSHTTPSConnection object at 0x7f4289911950>, 'Connection to api.ecr.us-east-1.amazonaws.com timed out. (connect timeout=60)')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib64/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 361, in <module>
    main()
  File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 351, in main
    res += download_jars_per_connection(conn, region, endpoint, proxy)
  File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 293, in download_jars_per_connection
    token = get_ecr_authorization_token(ecr_root)
  File "/tmp/aws_glue_custom_connector_python/docker/util.py", line 22, in wrapper
    return func(*args, **kwargs)
  File "/tmp/aws_glue_custom_connector_python/docker/unpack_docker_image.py", line 122, in get_ecr_authorization_token
    response = ecr.get_authorization_token(registryIds=[registry_id])
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/client.py", line 386, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/client.py", line 692, in _make_api_call
    operation_model, request_dict, request_context)
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/client.py", line 711, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/endpoint.py", line 137, in _send_request
    success_response, exception):
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/endpoint.py", line 256, in _needs_retry
    caught_exception=caught_exception, request_dict=request_dict)
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/hooks.py", line 357, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/retryhandler.py", line 183, in __call__
    if self._checker(attempts, response, caught_exception):
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/retryhandler.py", line 251, in __call__
    caught_exception)
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/retryhandler.py", line 277, in _should_retry
    return self._checker(attempt_number, response, caught_exception)
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/retryhandler.py", line 317, in __call__
    caught_exception)
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/retryhandler.py", line 223, in __call__
    attempt_number, caught_exception)
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/retryhandler.py", line 359, in _check_caught_exception
    raise caught_exception
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/endpoint.py", line 200, in _do_get_response
    http_response = self._send(request)
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/endpoint.py", line 269, in _send
    return self.http_session.send(request)
  File "/home/spark/.local/lib/python3.7/site-packages/botocore/httpsession.py", line 377, in send
    raise ConnectTimeoutError(endpoint_url=request.url, error=e)
botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL: "https://api.ecr.us-east-1.amazonaws.com/"

Glue ETL Marketplace - failed to download connector, activation script exited with code 1
LAUNCH ERROR | Glue ETL Marketplace - failed to download connector. Please refer logs for details.
Exception in thread "main" java.lang.Exception: Glue ETL Marketplace - failed to download connector.
	at com.amazonaws.services.glue.PrepareLaunch.downloadConnectorJar(PrepareLaunch.scala:876)
	at com.amazonaws.services.glue.PrepareLaunch.com$amazonaws$services$glue$PrepareLaunch$$prepareCmd(PrepareLaunch.scala:667)
	at com.amazonaws.services.glue.PrepareLaunch$.main(PrepareLaunch.scala:44)
	at com.amazonaws.services.glue.PrepareLaunch.main(PrepareLaunch.scala)
```

I followed this blog:

https://aws.amazon.com/blogs/big-data/ingest-data-from-snowflake-to-amazon-s3-using-aws-glue-marketplace-connectors/

Please help me resolve this issue.

moomindani commented 2 years ago

It seems you are getting a timeout when downloading the connector library. The most typical cause is a lack of network connectivity to the ECR repository. If your Glue job runs in private subnets, you will need a NAT gateway. For us-east-1, a VPC endpoint also works.
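As a sketch of the VPC endpoint option: the connector download needs to reach the ECR API, the ECR Docker registry, and S3 (where ECR stores image layers). The VPC, subnet, security group, and route table IDs below are placeholders to substitute with your own.

```shell
# Placeholders: replace with your own VPC, subnet, security group, and route table IDs.
VPC_ID=vpc-0123456789abcdef0
SUBNET_ID=subnet-0123456789abcdef0
SG_ID=sg-0123456789abcdef0
RTB_ID=rtb-0123456789abcdef0
REGION=us-east-1

# Interface endpoint for the ECR API (api.ecr.<region>.amazonaws.com),
# the endpoint that timed out in the log above
aws ec2 create-vpc-endpoint --region "$REGION" \
    --vpc-id "$VPC_ID" --vpc-endpoint-type Interface \
    --service-name "com.amazonaws.${REGION}.ecr.api" \
    --subnet-ids "$SUBNET_ID" --security-group-ids "$SG_ID" \
    --private-dns-enabled

# Interface endpoint for the ECR Docker registry
aws ec2 create-vpc-endpoint --region "$REGION" \
    --vpc-id "$VPC_ID" --vpc-endpoint-type Interface \
    --service-name "com.amazonaws.${REGION}.ecr.dkr" \
    --subnet-ids "$SUBNET_ID" --security-group-ids "$SG_ID" \
    --private-dns-enabled

# Gateway endpoint for S3, used by ECR for image layers
aws ec2 create-vpc-endpoint --region "$REGION" \
    --vpc-id "$VPC_ID" --vpc-endpoint-type Gateway \
    --service-name "com.amazonaws.${REGION}.s3" \
    --route-table-ids "$RTB_ID"
```

Also make sure the security group on the Glue connection allows outbound HTTPS (port 443) to the endpoints.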

This post is about a different topic, but it will help you understand the different options and prerequisites: https://aws.amazon.com/blogs/big-data/part-1-integrate-apache-hudi-delta-lake-apache-iceberg-datasets-at-scale-aws-glue-studio-notebook/

BTW, this repo is for the Glue ETL Python library. Your question is not related to this repo; it looks like a general question about Glue usage. For such cases, we recommend asking AWS Support or AWS re:Post.