This epic tracks the refactoring of the `load_data` management command and its related functionality. The goal of this refactoring is to better separate concerns between metadata extraction and file computation, and to prepare us for the upcoming integration of AWS Batch (for file computation).
Goal
We want the changes implemented during this epic to result in the following:
- Code broken up along behavior
- Similar code grouped together
- Support for different workflows across environments / command arguments
Notes on Approaches
Realistically, this will be a combination of the two approaches below.
Option 1 "Inheritance":
Ex:
```python
from django.core.management.base import BaseCommand
from django.db import models


class ProjectLoader(models.Model):
    """Common attributes for Project, Sample, and ComputedFile models."""

    class Meta:
        abstract = True

    @classmethod
    def load_data(cls):
        pass


class Project(ProjectLoader):
    pass


# load_data.py
class Command(BaseCommand):
    def handle(self, *args, **options):
        Project.load_data()
```
Option 2 "Modularity":
Ex:
```python
class Loader:
    @download_if_missing
    def load_project(self):
        pass


# management command: another class that does the actual work
class Command(BaseCommand):
    def handle(self, *args, **options):
        # handle inputs, then dispatch to the loader
        loader = Loader()
        loader.reload_project()
        loader.reload_sample()
        loader.load_project()
        loader.load_sample()
        loader.purge_project()
```
Directory Structure
```
Services/
|------ FileSystem (set up work dir / clean up work dir)
|------ Downloader (download data from S3)
|------ Metadata   (take downloaded files -> create new combined metadata)
|------ Factory    (create and save projects / samples from metadata)
|------ Archiver   (create zip files from a project / sample in the database)
```
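As a hedged sketch of what one of these services might look like, here is a minimal `FileSystem` with the two responsibilities named in the tree (work-dir setup and cleanup); the layout and method names are assumptions, not settled API:

```python
import shutil
import tempfile
from pathlib import Path


class FileSystem:
    """Set up and clean up the working directory (sketch only)."""

    def __init__(self, base_dir=None):
        # Default to a fresh temp dir so runs don't collide.
        self.work_dir = Path(base_dir or tempfile.mkdtemp(prefix="load_data_"))

    def clean_inputs(self):
        """Remove previously downloaded inputs so each run starts fresh."""
        inputs = self.work_dir / "inputs"
        shutil.rmtree(inputs, ignore_errors=True)
        inputs.mkdir(parents=True)
        return inputs

    def clean_all(self):
        """Remove the entire working directory."""
        shutil.rmtree(self.work_dir, ignore_errors=True)
```

Keeping each service this narrow is what lets the two commands below compose them differently.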
Command Workflow
LoadDataCommand // runs on the API (fast); updates the database
```
Purge.delete_project_from_rds()  // remove from the database
Purge.delete_project_from_s3()   // remove from AWS
Purge.purge_project()            // remove from the database / remove computed files from AWS
FileSystem.clean_inputs()
Downloader.download_metadata()
Metadata.combine_metadata()
Factory.create_project() / Factory.create_sample()  // set status to 'pending'
FileSystem.clean_all()
```
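The LoadDataCommand steps above could be orchestrated roughly as follows. The in-memory service stand-ins are assumptions made so the sketch is self-contained; the real services would hit S3 and the database:

```python
# Minimal stand-ins for the services (assumptions, not the real code).
class FileSystem:
    def clean_inputs(self): pass
    def clean_all(self): pass


class Downloader:
    def download_metadata(self, project_id):
        # Real service would pull metadata files from S3.
        pass


class Metadata:
    def combine_metadata(self, project_id):
        return {"project_id": project_id, "samples": ["sample-1"]}


class Factory:
    def create_project(self, combined):
        return {"id": combined["project_id"], "samples": [], "status": None}

    def create_sample(self, project, sample_id):
        project["samples"].append(sample_id)


class LoadDataCommand:
    """Runs on the API: fast, metadata/database work only, no zips."""

    def __init__(self):
        self.file_system, self.downloader = FileSystem(), Downloader()
        self.metadata, self.factory = Metadata(), Factory()

    def handle(self, project_id):
        self.file_system.clean_inputs()
        self.downloader.download_metadata(project_id)
        combined = self.metadata.combine_metadata(project_id)
        project = self.factory.create_project(combined)
        for sample_id in combined["samples"]:
            self.factory.create_sample(project, sample_id)
        project["status"] = "pending"  # computed files are built later on Batch
        self.file_system.clean_all()
        return project
```

The key design point is that this command touches only metadata and the database, so it stays fast enough to run on the API.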
CreateZipCommands // runs on Batch (slow); updates S3
```
FileSystem.clean_inputs()
Downloader.download_data()
Metadata.generate_metadata_files()
Readme.generate_project_readmes()
Archiver.zip_project()            // generates the zip on local disk
Factory.create_computed_file()    // creates the database representation of the computed file
Uploader.upload_project()         // uploads the computed file to S3
```
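The Batch-side pipeline could be sketched like this, with the archive step shown concretely and the download/upload steps stubbed as comments (file names and the `handle` signature are placeholders, not settled API):

```python
import io
import zipfile


class CreateZipCommand:
    """Runs on Batch: slow file computation, results pushed to S3 (sketch)."""

    def handle(self, project_id):
        # Downloader.download_data / Metadata.generate_metadata_files /
        # Readme.generate_project_readmes would produce real files; these
        # placeholder contents stand in for them.
        files = {"metadata.tsv": b"id\tname\n", "README.md": b"# readme\n"}

        # Archiver.zip_project: generate the zip (here in memory, not on disk).
        buffer = io.BytesIO()
        with zipfile.ZipFile(buffer, "w") as zf:
            for name, data in files.items():
                zf.writestr(name, data)

        # Factory.create_computed_file: database representation of the artifact.
        computed_file = {
            "project_id": project_id,
            "size_bytes": buffer.getbuffer().nbytes,
        }
        # Uploader.upload_project would push `buffer` to S3 here.
        return computed_file
```

Separating this command from LoadDataCommand is what allows the slow zip work to move onto AWS Batch without blocking the API.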