Hi everyone! Welcome to the official documentation page for terraglue, an open source Terraform module that provides an easy way to deploy a Glue job in any AWS account.
Note: The terraglue project now has official documentation on readthedocs! Visit the following link to check out usage and technical details, practical examples, and more!
When the terraglue module is called in a Terraform project, an operation mode must be chosen. There are two options: "learning" mode and "production" mode. Depending on this choice, different resources are deployed in the target AWS account.
The learning mode helps users understand more about Glue jobs on AWS by providing a complete example with all the resources needed to start exploring Glue. It works as follows:
🤖 Learning mode
- A sample PySpark application is uploaded to a given S3 bucket to serve as the main script for the Glue job
- An auxiliary Python file with useful transformation functions for the job is also uploaded to S3
- An IAM role is created with the basic permissions needed to run a Glue job
- A KMS key is created to be used in the job security configuration
- Finally, a preconfigured Glue job is deployed to give users an example of SoT table creation using Brazilian E-Commerce data from datadelivery
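As a rough illustration of the steps above, a learning mode module call might look like the sketch below. The source address is a placeholder, and the argument names `mode` and `scripts_bucket_name` are assumptions made for this example, not names confirmed by the module; check the terraglue variable reference for the real ones.

```hcl
# Hypothetical sketch: the source address and every argument name in this
# block are assumptions for illustration, not the module's confirmed inputs.
module "terraglue" {
  # Placeholder: replace with the actual terraglue module source address
  source = "<terraglue-module-source>"

  # Assumed name for the operation mode input
  mode = "learning"

  # Assumed name for the input that selects the S3 bucket receiving the
  # sample PySpark script and the auxiliary Python file
  scripts_bucket_name = "my-glue-scripts-bucket"
}
```

Under these assumptions, a single `terraform apply` would create the IAM role, the KMS key, and the preconfigured Glue job listed above, with no further inputs required.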
On the other hand, the production mode enables users to configure and deploy their own Glue jobs on AWS. What happens under the hood depends on how users configure the variables in the module call. In summary, it works as follows:
🚀 Production mode
- In this mode, users can use all the terraglue module variables to customize the deployment
- A custom Glue job is deployed in the target AWS account using the variables passed by users in the module call
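To make the contrast with learning mode concrete, a production mode call could be sketched as follows. As before, the source address is a placeholder and every argument name (`mode`, `glue_job_name`, `glue_job_role_arn`, `glue_job_script_location`) is hypothetical, chosen only to show the kind of customization this mode allows; the account ID, bucket, and role are made-up values.

```hcl
# Hypothetical sketch: argument names and values are illustrative assumptions.
# Consult the terraglue variable reference for the inputs the module accepts.
module "terraglue" {
  # Placeholder: replace with the actual terraglue module source address
  source = "<terraglue-module-source>"

  # Assumed name for the operation mode input
  mode = "production"

  # Assumed inputs describing the user's own Glue job
  glue_job_name            = "my-custom-glue-job"
  glue_job_role_arn        = "arn:aws:iam::123456789012:role/my-glue-job-role"
  glue_job_script_location = "s3://my-glue-scripts-bucket/jobs/main.py"
}
```

The point of the sketch is the division of responsibility: in production mode the user supplies the script, role, and job settings, and the module only wires them into a Glue job.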
The terraglue Terraform module isn't alone. There are other complementary open source solutions that can be combined with it to enable the full power of learning analytics on AWS. Check them out if you think they could be useful for you!
AWS Glue
Terraform
Apache Spark
GitHub
Docker
Tests
Others