Context
To start off the Batch epic, the first step will be to move the loading of metadata out of the load_data command and into its own management command. This is an important step towards decoupling metadata loading from the generation of computed files, which will give us more fine-grained control as we begin to integrate Batch.
Problem or Idea, and Next Steps
As a general strategy, we can enforce the decoupling of metadata loading from file computation by breaking up the current load_data command into two commands:
load_metadata
Downloads all metadata files, creates Project objects, and iterates over them, calling Project::load_metadata on each project. This management command will also perform a series of preliminary checks to verify whether a project and its metadata can be loaded successfully.
dispatch_computed_files_generation
Iterates over a list of Project objects and dispatches them to Batch one by one. Batch's exact entrypoint (a third command called generate_computed_files) will be described in a separate issue.
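A minimal sketch of the load_metadata flow, written as plain Python so it runs without Django. Everything in it is hypothetical: the in-memory `Project` class stands in for the Django model, `passes_preliminary_checks` and `REQUIRED_FIELDS` are placeholders for the preliminary checks (which are not specified here), and `records` stands in for the already-downloaded metadata files. Skipping unloadable projects rather than aborting is also an assumption. The function returns the loaded projects only so the flow can be inspected; in the real command the saved Project rows would be its only product.

```python
from dataclasses import dataclass, field

# Hypothetical fields a metadata record must contain to be considered loadable.
REQUIRED_FIELDS = ("project_id", "title")


@dataclass
class Project:
    """Hypothetical in-memory stand-in for the Django Project model."""

    project_id: str
    metadata: dict = field(default_factory=dict)
    loaded: bool = False

    def load_metadata(self) -> None:
        # Placeholder for Project::load_metadata; here it only records that loading ran.
        self.loaded = True


def passes_preliminary_checks(record: dict) -> bool:
    """Hypothetical preliminary check: every required field is present and non-empty."""
    return all(record.get(name) for name in REQUIRED_FIELDS)


def load_metadata(records: list[dict]) -> list[Project]:
    """Sketch of the load_metadata command body.

    `records` stands in for the downloaded metadata files (downloading is omitted).
    Nothing here touches computed files or Batch.
    """
    loaded = []
    for record in records:
        if not passes_preliminary_checks(record):
            # Assumed behavior: skip a project that cannot be loaded instead of aborting the run.
            continue
        project = Project(project_id=record["project_id"], metadata=record)
        project.load_metadata()
        loaded.append(project)
    return loaded
```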
To best facilitate the decoupling of load_metadata and dispatch_computed_files_generation, there should be no shared inputs or outputs between them. Instead, dispatch_computed_files_generation will determine which projects need computed files by querying the Project model for all projects that do not have any ComputedFiles associated with them. These projects can then be iterated over and dispatched to Batch one by one.
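The dispatch side can be sketched the same way. This is again plain Python under stated assumptions: `Project` is an in-memory stand-in whose `computed_files` list models the related ComputedFiles, and `submit_job` is a hypothetical callback in place of the actual Batch submission call. In Django, the filter step would presumably become a query such as `Project.objects.filter(computed_files__isnull=True)`, assuming the ComputedFile foreign key declares `related_name="computed_files"`.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Project:
    """Hypothetical stand-in for the Project model and its related ComputedFiles."""

    project_id: str
    computed_files: list = field(default_factory=list)


def projects_without_computed_files(projects: Iterable[Project]) -> list[Project]:
    # In-memory equivalent of querying the Project model for projects with no ComputedFiles.
    return [p for p in projects if not p.computed_files]


def dispatch_computed_files_generation(
    projects: Iterable[Project], submit_job: Callable[[str], None]
) -> list[str]:
    """Dispatch one Batch job per project that still lacks computed files.

    `submit_job` is a hypothetical callback standing in for the Batch submission call.
    """
    dispatched = []
    for project in projects_without_computed_files(projects):
        submit_job(project.project_id)  # one job per project, dispatched one by one
        dispatched.append(project.project_id)
    return dispatched


# Usage: only projects without computed files are dispatched.
submitted = []
projects = [Project("P1"), Project("P2", computed_files=["P2.zip"]), Project("P3")]
dispatch_computed_files_generation(projects, submitted.append)
# submitted == ["P1", "P3"]
```

Because the work list is derived from database state rather than handed over by load_metadata, neither command needs to know about the other.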