ThiagoPanini / terraglue

Providing an easy way to deploy a Glue job in any AWS account using Terraform
https://terraglue.readthedocs.io/en/latest/


![GitHub release (latest by date)](https://img.shields.io/github/v/release/ThiagoPanini/terraglue?color=purple) ![GitHub Last Commit](https://img.shields.io/github/last-commit/ThiagoPanini/terraglue?color=purple) ![CI workflow](https://img.shields.io/github/actions/workflow/status/ThiagoPanini/terraglue/ci-main.yml?label=ci) [![codecov](https://codecov.io/gh/ThiagoPanini/terraglue/branch/main/graph/badge.svg?token=7HI1YGS4AA)](https://codecov.io/gh/ThiagoPanini/terraglue) [![Documentation Status](https://readthedocs.org/projects/terraglue/badge/?version=latest)](https://terraglue.readthedocs.io/pt/latest/?badge=latest)


What is terraglue?

Hi everyone! Welcome to the official documentation page for terraglue, an open source Terraform module that provides an easy way to deploy a Glue job in any AWS account.

Note: The terraglue project now has official documentation on Read the Docs! Visit https://terraglue.readthedocs.io/en/latest/ for technical usage details, practical examples and more.

Features


How Does it Work?

When the terraglue module is called in a Terraform project, an operation mode must be chosen. There are two options: "learning" mode and "production" mode. The chosen mode determines which resources are deployed in the target AWS account.

The learning mode helps users understand more about Glue jobs on AWS by providing a complete example with all the resources needed to start exploring Glue. It works as follows:

🤖 Learning mode

  1. A sample pyspark application is uploaded to a given S3 bucket to be the main script for the Glue job
  2. An auxiliary Python file with useful transformation functions for the job is also uploaded to S3
  3. An IAM role is created with basic permissions to run a Glue job
  4. A KMS key is created to be used in the job security configuration
  5. Finally, a preconfigured Glue job is deployed to give users an example of creating a SoT table from the Brazilian E-Commerce data provided by datadelivery
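
The steps above could be triggered by a module call along these lines. This is a minimal sketch, not the module's confirmed interface: the `mode` variable name and the `glue_scripts_bucket` variable are assumptions made for illustration, and the Git source address assumes the module lives at the root of the GitHub repository. Check the official documentation for the actual variable names.

```hcl
# Minimal sketch of calling terraglue in learning mode.
# ASSUMPTION: variable names are illustrative; confirm them in the module docs.
module "terraglue" {
  # Generic Git source pointing at the project's GitHub repository
  source = "git::https://github.com/ThiagoPanini/terraglue.git?ref=main"

  # Hypothetical variable selecting the operation mode
  mode = "learning"

  # Hypothetical variable: an existing S3 bucket that receives the sample scripts
  glue_scripts_bucket = "my-glue-scripts-bucket"
}
```

With a block like this in place, `terraform init` downloads the module and `terraform apply` creates the resources in the target account.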

On the other hand, the production mode enables users to configure and deploy their own Glue jobs in AWS. What happens under the hood depends on how users set the variables in the module call. In summary, it works as follows:

🚀 Production mode

  1. In this mode, users can use all the terraglue module variables to customize the deployment
  2. A custom Glue job is deployed in the target AWS account using the variables passed by users in the module call
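
As a rough illustration of production mode, a call might pass the job's own settings as variables. Every variable name below (`mode`, `glue_job_name`, `glue_job_role_arn`, `glue_script_location`) is hypothetical and only stands in for whatever the module really exposes; the values are placeholders as well.

```hcl
# Minimal sketch of calling terraglue in production mode.
# ASSUMPTION: all variable names and values are hypothetical placeholders.
module "terraglue" {
  source = "git::https://github.com/ThiagoPanini/terraglue.git?ref=main"

  # Hypothetical variable selecting the operation mode
  mode = "production"

  # Hypothetical variables describing the user's own Glue job
  glue_job_name        = "my-custom-job"
  glue_job_role_arn    = "arn:aws:iam::123456789012:role/my-glue-role"
  glue_script_location = "s3://my-glue-scripts-bucket/jobs/main.py"
}
```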

Combining Solutions

The terraglue Terraform module isn't alone. There are other complementary open source solutions that can be put together to enable the full power of learning analytics on AWS. Check them out if you think they could be useful for you!

A diagram showing how it's possible to use other solutions such as datadelivery, terraglue and sparksnake

