NASA-IMPACT / Blaze-Framework

Repository to curate transfer workflows to synchronize data between ESA and NASA HLS data repositories.

Blaze: A High-Performance, Scalable, and Efficient Data Transfer Framework with Configurable and Extensible Features

Transfer Architecture

The diagram below shows the top-level transfer architecture of Blaze for moving data from an S3 bucket to any cloud resource endpoint.

(Figure: Blaze transfer architecture)

Overview

Blaze is a product that integrates several software tools to build end-to-end data transfer workflows covering data preparation, transfer, and post-processing. To accomplish this, Blaze uses the following software:

  1. Apache Airavata MFT - This is the data transfer engine of Blaze. Airavata MFT provides a highly scalable, high-performance, agent-based data transfer platform. It supports many cloud and legacy data transfer protocols as well as inter-protocol data translation, which makes it possible to transfer data seamlessly between different storage types.
  2. Apache Airflow - Blaze uses Airflow as the orchestration framework that executes the end-to-end data transfer workflow. Blaze provides example workflows that integrate MFT with the Transfer Catalog, and users are free to update the workflow code according to their requirements; a sketch of such a workflow follows this list.
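
As a rough illustration, the sketch below shows a minimal Airflow DAG with the three stages mentioned above (preparation, transfer, post-processing). It is not the shipped example workflow: the DAG id, task names, and task bodies are placeholders, and the transfer task only marks where the call into Airavata MFT would go.

# Minimal sketch of a Blaze-style transfer workflow as an Airflow DAG.
# All names and task bodies are placeholders for illustration only.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def prepare_data(**context):
    # e.g. query the Transfer Catalog for files that still need to move
    pass


def submit_mft_transfer(**context):
    # e.g. invoke Airavata MFT with the source and destination storage
    # IDs registered during deployment (placeholder step)
    pass


def post_process(**context):
    # e.g. verify the transfer and update the Transfer Catalog
    pass


with DAG(
    dag_id="blaze_example_transfer",
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    prepare = PythonOperator(task_id="prepare_data", python_callable=prepare_data)
    transfer = PythonOperator(task_id="submit_mft_transfer", python_callable=submit_mft_transfer)
    post = PythonOperator(task_id="post_process", python_callable=post_process)

    prepare >> transfer >> post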

Deployment

Deployment of the platform includes installing the Data Transfer Layer shown in the diagram above and registering the source and destination storages in the Data Transfer Layer.

Prerequisites to deploy on EC2

To have a minimal transfer layer, two EC2 VMs (one for the MFT Master and one for the MFT Agent) should be created in the us-west-2 region. These VMs should have the following features:
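
As a starting point, the two VMs can be provisioned programmatically. The sketch below uses boto3 to create both instances in us-west-2; the AMI ID, instance type, key pair, and security group are placeholder values, not requirements set by Blaze.

import boto3

# Provision the two VMs (MFT Master and MFT Agent) in us-west-2.
# All IDs and names below are placeholders for illustration only.
ec2 = boto3.resource("ec2", region_name="us-west-2")

for role in ("blaze-mft-master", "blaze-mft-agent"):
    ec2.create_instances(
        ImageId="ami-xxxxxxxxxxxxxxxxx",   # placeholder Ubuntu AMI
        InstanceType="t3.medium",          # placeholder instance size
        MinCount=1,
        MaxCount=1,
        KeyName="blaze-keypair",           # assumed existing key pair
        SecurityGroupIds=["sg-xxxxxxxx"],  # assumed existing security group
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "Name", "Value": role}],
        }],
    )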

Installation

Security Groups

# Create a Python virtual environment and install the deployment dependencies
cd scripts
python3 -m venv ENV
source ENV/bin/activate
pip install -r requirements.txt
# Install the MFT Master using the example Ansible inventory
ansible-playbook -i inventories/example install-master.yml

Example storage creation command for AWS S3 cloud storage in the us-west-2 region:

java -jar /home/ubuntu/mft_deployment/mft-client.jar s3 remote add --name=<Name for the Storage> --bucket=<Bucket Name> --key=<key> --secret=<secret> --endpoint=https://s3.us-west-2.amazonaws.com --region=us-west-2

Example storage creation command for IBM Cloud Object Storage in the us-east region:

java -jar /home/ubuntu/mft_deployment/mft-client.jar s3 remote add --name=<Name for the Storage> --bucket=<Bucket Name> --key=<key> --secret=<secret> --endpoint=https://s3.us-east.cloud-object-storage.appdomain.cloud --region=us-east-smart
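
When several storages need to be registered, the same mft-client.jar command can be scripted. The sketch below wraps it with Python's subprocess; the storage name, bucket, credentials, and endpoint are placeholder values.

# Minimal sketch: registering S3-compatible storages by scripting the
# mft-client.jar command shown above. All values are placeholders.
import subprocess

MFT_CLIENT = "/home/ubuntu/mft_deployment/mft-client.jar"

storages = [
    {
        "name": "hls-source",                              # placeholder storage name
        "bucket": "my-source-bucket",                      # placeholder bucket
        "key": "AKIA...",                                  # placeholder access key
        "secret": "...",                                   # placeholder secret key
        "endpoint": "https://s3.us-west-2.amazonaws.com",  # placeholder endpoint
        "region": "us-west-2",
    },
]

for s in storages:
    subprocess.run(
        [
            "java", "-jar", MFT_CLIENT, "s3", "remote", "add",
            f"--name={s['name']}",
            f"--bucket={s['bucket']}",
            f"--key={s['key']}",
            f"--secret={s['secret']}",
            f"--endpoint={s['endpoint']}",
            f"--region={s['region']}",
        ],
        check=True,
    )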

Transfer Workflow