ExpediaGroup / apiary-data-lake

Terraform scripts for deploying Apiary Data Lake
https://github.com/ExpediaGroup/apiary
Apache License 2.0

Overview

This repo contains a Terraform module to deploy the Apiary data lake component. The module deploys various stateful components in a typical Hadoop-compatible data lake in AWS.

For more information, please refer to the main Apiary project page at https://github.com/ExpediaGroup/apiary.

Architecture

[Figure: Datalake architecture]

Key Features

Variables

Please refer to VARIABLES.md.

Usage

NB: This module currently requires that you run it from a machine with the bash, aws, mysql, and jq CLI tools installed.

Example module invocation:

module "apiary" {
  source                   = "git::https://github.com/ExpediaGroup/apiary-data-lake.git"
  aws_region               = "us-west-2"
  instance_name            = "test"
  apiary_tags              = "${var.tags}"
  apiary_extra_tags_s3     = "${var.extra_tags_s3}"
  private_subnets          = ["subnet1", "subnet2", "subnet3"]
  vpc_id                   = "vpc-123456"
  hms_docker_image         = "${var.aws_account}.dkr.ecr.${var.aws_region}.amazonaws.com/apiary-metastore"
  hms_docker_version       = "1.0.0"
  hms_ro_cpu               = "2048"
  hms_rw_cpu               = "2048"
  hms_ro_heapsize          = "8192"
  hms_rw_heapsize          = "8192"
  apiary_log_bucket        = "s3-logs-bucket"
  db_instance_class        = "db.t2.medium"
  db_backup_retention      = "7"
  apiary_managed_schemas   = [
    {
        schema_name = "db1",
        s3_lifecycle_policy_transition_period = "30"
    },
    {
        schema_name = "db_2",
        s3_storage_class = "INTELLIGENT_TIERING"
    },
    {
        schema_name = "secure_db",
        encryption   = "aws:kms" // supported values for encryption are AES256 and aws:kms
        admin_roles = "role1_arn,role2_arn" // KMS key management will be restricted to these roles
        client_roles = "role3_arn,role4_arn" // S3 bucket read/write and KMS key usage will be restricted to these roles
        customer_accounts = "account_id1,account_id2" // overrides the module-level apiary_customer_accounts
    }
  ]
  apiary_customer_accounts = ["aws_account_no_1", "aws_account_no_2"]
  # A single policy with multiple conditions combines them with the AND operator:
  # https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_multi-value-conditions.html
  # Separating conditions with ";" creates a separate policy for each condition, effectively enabling the OR operator.
  apiary_customer_condition = <<EOF
    "ForAnyValue:StringEquals": {"s3:ExistingObjectTag/security": [ "public"] };
    "StringLike": {"s3:ExistingObjectTag/type": "image*" }
  EOF
  ingress_cidr             = ["10.0.0.0/8"]
  apiary_assume_roles      = [
    {
        name = "client_name"
        principals = [ "arn:aws:iam::account_number:role/cross-account-role" ]
        schema_names = [ "dm","lz","test_1" ]
        max_role_session_duration_seconds = "7200",
        allow_cross_region_access = true 
    }
  ]
}

Notes

The Apiary metastore Docker image is not yet published to a public repository; you can build it from this repo and then publish it to your own ECR.
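As a sketch, assuming Terraform 0.12 or later and that the ECR repository you push the image to is managed in the same configuration, the module can take the image location from the repository's repository_url attribute instead of a hand-assembled registry string. The repository name apiary-metastore and the tag 1.0.0 are illustrative placeholders, and the remaining module arguments are elided.

```hcl
# Hypothetical ECR repository that holds the metastore image you build and push yourself.
resource "aws_ecr_repository" "apiary_metastore" {
  name = "apiary-metastore"
}

module "apiary" {
  source = "git::https://github.com/ExpediaGroup/apiary-data-lake.git"

  # repository_url resolves to <account_id>.dkr.ecr.<region>.amazonaws.com/apiary-metastore
  hms_docker_image   = aws_ecr_repository.apiary_metastore.repository_url
  hms_docker_version = "1.0.0" # placeholder: use the tag you actually pushed

  # ... remaining arguments as in the example invocation above
}
```

Referencing the resource also gives Terraform an implicit dependency, so the repository exists before the module tries to use the image location.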

In k8s deployment mode, IAM roles can be attached to metastore pods using either IRSA (IAM Roles for Service Accounts) or Kiam. The module uses IRSA when the oidc_provider variable is set, and Kiam when the kiam_arn variable is set.
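A minimal sketch of the IRSA option, assuming Terraform 0.12 or later and an existing EKS cluster. The cluster name my-cluster is hypothetical, and two details are assumptions to confirm against VARIABLES.md: that k8s mode is selected with hms_instance_type = "k8s", and that oidc_provider expects the cluster's OIDC issuer URL without its https:// prefix.

```hcl
# Hypothetical existing EKS cluster that will run the metastore pods.
data "aws_eks_cluster" "this" {
  name = "my-cluster"
}

module "apiary" {
  source = "git::https://github.com/ExpediaGroup/apiary-data-lake.git"

  # Assumption: variable and value used to select the k8s deployment mode.
  hms_instance_type = "k8s"

  # Setting oidc_provider makes the module attach IAM roles via IRSA.
  # Assumption: the module expects the issuer URL with "https://" stripped.
  oidc_provider = replace(data.aws_eks_cluster.this.identity[0].oidc[0].issuer, "https://", "")

  # Kiam alternative (hypothetical ARN): set kiam_arn instead of oidc_provider.
  # kiam_arn = "arn:aws:iam::123456789012:role/kiam-server"

  # ... remaining arguments as in the example invocation above
}
```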

Contact

Mailing List

If you would like to ask any questions about or discuss Apiary, please join our mailing list at

https://groups.google.com/forum/#!forum/apiary-user

Legal

This project is available under the Apache 2.0 License.

Copyright 2018-2019 Expedia, Inc.