HumanCellAtlas / upload-service

DCP Ingestion File Upload Service
MIT License

Data Coordination Platform, Upload Service


Overview

The DCP Upload Service provides a file staging and validation facility for the DCP. Upload Areas are created and deleted using a REST API, which is secured so that only the DCP Ingestion Service may use it. Files are staged into AWS S3 using the HCA CLI, after which the Upload Service computes checksums for them. The validation service runs Docker images against the staged files.

Components

upload-api

An AWS Lambda application, built with Chalice, Connexion and Flask, that serves the Upload Service REST API. The API is defined by an OpenAPI 2.0 (Swagger) specification in config/upload-api.yml.

upload-queue

An SQS queue that receives messages on S3 ObjectCreated events and triggers the upload-checksum-daemon Lambda function.
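
The queue-to-Lambda hand-off can be illustrated with a minimal Python sketch. It is not the service's actual code: it assumes S3 publishes notifications directly to the queue (not via SNS), so each SQS message body is a JSON-encoded S3 event notification. The function name `s3_objects_from_sqs_event` and the bucket and key in the example are hypothetical.

```python
import json
from urllib.parse import unquote_plus


def s3_objects_from_sqs_event(event):
    """Yield (bucket, key) for each S3 ObjectCreated record in an SQS-triggered Lambda event."""
    for sqs_record in event.get("Records", []):
        # Assumption: the SQS message body is a raw S3 event notification (JSON string).
        s3_event = json.loads(sqs_record["body"])
        # S3 test notifications (s3:TestEvent) carry no "Records", so they yield nothing.
        for s3_record in s3_event.get("Records", []):
            if not s3_record.get("eventName", "").startswith("ObjectCreated:"):
                continue
            bucket = s3_record["s3"]["bucket"]["name"]
            # Keys in S3 notifications are URL-encoded, with spaces sent as '+'.
            key = unquote_plus(s3_record["s3"]["object"]["key"])
            yield bucket, key


if __name__ == "__main__":
    body = json.dumps({"Records": [{
        "eventName": "ObjectCreated:Put",
        "s3": {"bucket": {"name": "example-upload-bucket"},  # hypothetical bucket
               "object": {"key": "area-1234/my+reads.fastq.gz"}},
    }]})
    print(list(s3_objects_from_sqs_event({"Records": [{"body": body}]})))
    # [('example-upload-bucket', 'area-1234/my reads.fastq.gz')]
```

Decoding the key with `unquote_plus` matters because a file name containing spaces would otherwise be looked up under the wrong S3 key.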

upload-checksum-daemon

A Lambda function, triggered by SQS events, that computes checksums for uploaded files.
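
Checksumming a large object efficiently means reading it once and feeding every hash from the same stream. The sketch below does that; it is illustrative only. Which checksums the daemon actually computes is not stated here, so the set used (SHA-1, SHA-256, zlib CRC-32 and the S3 ETag) is an assumption; note that zlib's CRC-32 is not CRC-32C. For objects without SSE-KMS encryption, the ETag of a multipart upload is the MD5 of the concatenated per-part MD5 digests followed by `-<part count>`, so it can only be reproduced when `part_size` matches the uploader's chunk size. The 64 MiB default and the name `compute_checksums` are hypothetical.

```python
import hashlib
import io
import zlib

DEFAULT_PART_SIZE = 64 * 1024 * 1024  # hypothetical: must match the uploader's multipart chunk size


def compute_checksums(stream, part_size=DEFAULT_PART_SIZE, read_size=1024 * 1024):
    """Compute several checksums of a binary stream in a single pass."""
    sha1, sha256, crc = hashlib.sha1(), hashlib.sha256(), 0
    part_md5s = []  # MD5 digest of each completed part
    current_part, current_len = hashlib.md5(), 0

    while True:
        chunk = stream.read(read_size)
        if not chunk:
            break
        sha1.update(chunk)
        sha256.update(chunk)
        crc = zlib.crc32(chunk, crc)
        # Feed the per-part MD5, splitting the chunk wherever a part boundary falls.
        view = memoryview(chunk)
        while view:
            take = min(len(view), part_size - current_len)
            current_part.update(view[:take])
            current_len += take
            view = view[take:]
            if current_len == part_size:
                part_md5s.append(current_part.digest())
                current_part, current_len = hashlib.md5(), 0

    # Close the trailing partial part (or the single empty part of an empty file).
    if current_len or not part_md5s:
        part_md5s.append(current_part.digest())

    if len(part_md5s) == 1:
        # Assumption: objects that fit in one part were uploaded with a single PUT.
        s3_etag = part_md5s[0].hex()
    else:
        s3_etag = f"{hashlib.md5(b''.join(part_md5s)).hexdigest()}-{len(part_md5s)}"

    return {
        "sha1": sha1.hexdigest(),
        "sha256": sha256.hexdigest(),
        "crc32": f"{crc:08x}",
        "s3_etag": s3_etag,
    }


if __name__ == "__main__":
    print(compute_checksums(io.BytesIO(b"hello world")))
```

In a Lambda the stream would typically be the streaming body of an S3 GET, so memory use stays bounded by `read_size` rather than the object size.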

Validation Batch Service

An AWS Batch installation that runs the validation Docker images against uploaded files.

System Architecture Diagram

<img src="images/upload_service_architecture.png" alt="Upload Service System Architecture Diagram" title="Upload Service System Architecture Diagram" width="100%" height="100%" />

Image by Sam Pierson

Development Environment Setup

Prerequisites

Check out the upload service repo:

# IMPORTANT: use --recursive so that git submodules are cloned too
git clone --recursive git@github.com:HumanCellAtlas/upload-service.git
cd upload-service

Install packages. I use virtualenv, but you don’t have to. This is what it looks like for me:

mkdir venv  # I have venv/ in my global .gitignore
virtualenv --python python3.6 venv/36
source venv/36/bin/activate
pip install -r requirements-dev.txt

Do this once:

cp config/environment.dev.example config/environment.dev

Then edit as necessary.

Running Tests

export AWS_PROFILE=hca
source config/environment
make test

Running Tests Offline

Tests may also be run offline if you have a PostgreSQL server running locally. You must first set up an upload_local PostgreSQL database, e.g.:

brew install postgres
# follow instructions to start postgres server
createuser
createdb upload_local
DEPLOYMENT_STAGE=local make db/migrate
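
Before running the offline tests, a quick way to confirm the local server is up is a TCP connection check. The helper below is a hypothetical convenience, not part of the repo: it only tests that something accepts connections on localhost at PostgreSQL's default port 5432, and it does not verify that the upload_local database exists or that migrations have run.

```python
import socket


def postgres_is_listening(host="localhost", port=5432, timeout=1.0):
    """Return True if a server accepts TCP connections on host:port (5432 is PostgreSQL's default)."""
    try:
        # A successful TCP handshake is enough; no PostgreSQL protocol messages are sent.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unresolvable host names.
        return False


if __name__ == "__main__":
    print(postgres_is_listening())
```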

To run tests offline use the local environment:

export DEPLOYMENT_STAGE=local
source config/environment
make test

Running Locally

source config/environment
scripts/upload-api

Running Functional Tests

export DEPLOYMENT_STAGE=dev
source config/environment
make functional-tests 

Deployment/Release Process

Deployment is typically performed by GitLab. Full instructions for deploying to each environment (e.g. integration, prod) can be found here.

To deploy manually, e.g. to the staging deployment:

export enc_password="<password-used-to-encrypt-deployment-secrets>"
scripts/deploy.sh staging

Validation Deployment

UNDER CONSTRUCTION - NOTHING TO SEE HERE

Prerequisites

Do It

scripts/batchctl.py staging setup

cd validation/docker-images/base-alpine-python36
make release