awslabs / aws-glue-libs

AWS Glue Libraries are additions and enhancements to Spark for ETL operations.

Is it possible to add setup instructions for using AWS Marketplace connectors to use with these Glue images? #132

Closed JeevaTM closed 2 years ago

JeevaTM commented 2 years ago

I am using the Iceberg Connector for Glue 3.0 on AWS Glue. It works as expected there: it reads from and writes to Iceberg tables, etc.

I created a new connection using Iceberg Connector for Glue 3.0 and added it in AWS Glue Job details under connections.

How do you do that in the local Glue image?

The logs from a previous AWS Glue job run hint at what is happening in the background:

Glue ETL Marketplace - downloading jars for following connections: List(not_titanic) using command: List(python3, -u, -m, docker.unpack_docker_image, --connections, new_con, --result_path, jar_paths, --region, us-east-1, --endpoint, https://glue.us-east-1.amazonaws.com)

2022-04-19 12:06:50,562 - __main__ - INFO - Glue ETL Marketplace - Requesting ECR authorization token for registryIds=<id>and region_name=us-east-1.

2022-04-19 12:06:50,597 - __main__ - INFO - Glue ETL Marketplace - Calling ECR HTTP API to get manifest of https://<id>.dkr.ecr.us-east-1.amazonaws.com/amazon-web-services/glue/iceberg:0.12.0-glue3.0-2.

2022-04-19 12:06:50,724 - __main__ - INFO - Glue ETL Marketplace - Download/unpacking sha256:<> layer of image: https://<id>.dkr.ecr.us-east-1.amazonaws.com/amazon-web-services/glue/iceberg:0.12.0-glue3.0-2.

2022-04-19 12:06:50,724 - __main__ - INFO - Glue ETL Marketplace - Preparing layer url and gz file path to store layer <>.

2022-04-19 12:06:50,724 - __main__ - INFO - Glue ETL Marketplace - Getting the layer file <> and store it as gz.

2022-04-19 12:06:52,708 - __main__ - INFO - Glue ETL Marketplace - Unarchiving <> layer as tar file.

2022-04-19 12:06:53,008 - __main__ - INFO - Glue ETL Marketplace - run_commands output - "tar -C EVY4R/layers/tar/ -xf EVY4R/layers/tar/<>"

2022-04-19 12:06:58,568 - __main__ - INFO - Glue ETL Marketplace - Container paths are: ['/tmp/aws_glue_custom_connector_python/EVY4R/layers/tar/jars/iceberg-spark3-runtime-0.12.0.jar', '/tmp/aws_glue_custom_connector_python/EVY4R/layers/tar/jars/url-connection-client-2.15.40.jar', '/tmp/aws_glue_custom_connector_python/EVY4R/layers/tar/jars/bundle-2.15.40.jar']

2022-04-19 12:06:58,568 - __main__ - INFO - Glue ETL Marketplace - collected jar paths: ['/tmp/aws_glue_custom_connector_python/EVY4R/layers/tar/jars/iceberg-spark3-runtime-0.12.0.jar', '/tmp/aws_glue_custom_connector_python/EVY4R/layers/tar/jars/url-connection-client-2.15.40.jar', '/tmp/aws_glue_custom_connector_python/EVY4R/layers/tar/jars/bundle-2.15.40.jar'] for connection: new_con

2022-04-19 12:06:58,568 - __main__ - INFO - Glue ETL Marketplace - successfully wrote jar paths to "jar_paths"

2022-04-19 12:06:58,568 - __main__ - INFO - Glue ETL Marketplace - successfully wrote jar paths to "jar_paths"
Glue ETL Marketplace - copying /tmp/aws_glue_custom_connector_python/EVY4R/layers/tar/jars/iceberg-spark3-runtime-0.12.0.jar to /tmp/
Glue ETL Marketplace - copying /tmp/aws_glue_custom_connector_python/EVY4R/layers/tar/jars/url-connection-client-2.15.40.jar to /tmp/
Glue ETL Marketplace - copying /tmp/aws_glue_custom_connector_python/EVY4R/layers/tar/jars/bundle-2.15.40.jar to /tmp/

Glue ETL Marketplace - ETL connector activation process finished, container setup continues...
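
Reading the log above, the activation step appears to fetch an image layer as a gzipped tar, unarchive it, collect the jar paths, and copy the jars to /tmp/. A minimal sketch of those last three steps, assuming a layer archive has already been obtained some other way; `collect_connector_jars` and its parameters are hypothetical names for illustration and are not the code Glue runs:

```python
import shutil
import tarfile
from pathlib import Path


def collect_connector_jars(layer_archive, work_dir, dest_dir):
    """Unpack a layer archive and copy every .jar it contains into dest_dir.

    Hypothetical helper: it only mirrors the unarchive / collect / copy
    steps visible in the log, not Glue's actual implementation.
    """
    work_dir = Path(work_dir)
    dest_dir = Path(dest_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # "r:*" lets tarfile detect the compression (gzip or none) by itself.
    with tarfile.open(layer_archive, "r:*") as archive:
        archive.extractall(work_dir)  # only do this with archives you trust

    copied = []
    # Walk the unpacked tree and copy each jar flat into dest_dir.
    for jar in sorted(work_dir.rglob("*.jar")):
        target = dest_dir / jar.name
        shutil.copy2(jar, target)
        copied.append(str(target))
    return copied
```

The returned list plays the same role as the "collected jar paths" entry in the log: it is what a later step would hand to Spark.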

Is it possible to add some sort of docker-compose.yml with the Glue image as one service and the connector as another? That would make this much easier to work with than piecing the steps together from logs.

moomindani commented 2 years ago

Currently we do not have formal instructions to set up the marketplace connectors for the Docker image.

Here are a couple of available options:

For the second one, this blog post has an appendix on setting up extra library dependencies: https://aws.amazon.com/blogs/big-data/develop-and-test-aws-glue-version-3-0-jobs-locally-using-a-docker-container/ Hope it helps.
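
One general way to make extra jars visible to a local Spark job is spark-submit's `--jars` option, which takes a comma-separated list of jar paths. A minimal sketch, assuming the connector jars have already been placed in a directory that the container can read; `build_jars_argument` and the directory layout are hypothetical, and this is not necessarily the approach the blog post describes:

```python
from pathlib import Path


def build_jars_argument(jar_dir):
    """Return a comma-separated list of the .jar files found in jar_dir.

    Hypothetical helper: the result is meant to be passed as the value
    of spark-submit's --jars option.
    """
    jars = sorted(str(path) for path in Path(jar_dir).glob("*.jar"))
    return ",".join(jars)
```

The sorted order only keeps the output stable between runs; Spark does not require any particular ordering of the list.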