AlexsLemonade / scpca-portal

Single-cell Pediatric Cancer Atlas Portal is a growing database of uniformly processed single-cell data from pediatric cancer tumors and model systems
https://scpca.alexslemonade.org
BSD 3-Clause "New" or "Revised" License

Move loading of metadata in load_data to its own command #840

Open avrohomgottlieb opened 1 month ago

avrohomgottlieb commented 1 month ago

Context

To start off the Batch epic, the first step will be to move the loading of metadata out of the load_data command and into its own management command. This will be an important step towards decoupling the loading of metadata from the generation of computed files, which will afford us more fine-grained control as we begin to integrate Batch.

Problem or Idea, and Next Steps

As a general strategy, one way to enforce the decoupling of metadata loading from file computation would be to break up the current load_data into two commands, load_metadata and generated_computed_files.

To best facilitate the decoupling of load_metadata and generated_computed_files, there should not be any shared inputs or outputs between them. generated_computed_files will know which projects to create computed files for by querying the Project model for all projects that do not have any ComputedFiles associated with them. These projects can then be iterated over and dispatched one by one to Batch.
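A rough sketch of the selection and dispatch step described above, in plain Python so it runs without the portal's Django setup. The Project dataclass, its fields, and the submit_job callable are hypothetical stand-ins for illustration, not the portal's actual model or Batch client.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Project:
    """Hypothetical stand-in for the portal's Project model."""

    project_id: str
    # Names of computed files already generated for this project (assumed field).
    computed_files: List[str] = field(default_factory=list)


def projects_needing_computed_files(projects: Iterable[Project]) -> List[Project]:
    """Return only the projects that have no computed files yet.

    With the Django ORM this would likely be a filter on the reverse
    relation with an isnull lookup; the relation name is an assumption here.
    """
    return [project for project in projects if not project.computed_files]


def dispatch_all(
    projects: Iterable[Project], submit_job: Callable[[str], object]
) -> List[str]:
    """Submit one job per eligible project and return the ids that were submitted.

    submit_job is a hypothetical callable wrapping whatever Batch client is used.
    """
    submitted = []
    for project in projects_needing_computed_files(projects):
        submit_job(project.project_id)
        submitted.append(project.project_id)
    return submitted


if __name__ == "__main__":
    example_projects = [
        Project("PROJECT_A", ["project_a.zip"]),
        Project("PROJECT_B"),
    ]
    # print stands in for a real job submission call.
    dispatch_all(example_projects, print)
```

Because the selection depends only on what is already stored for each project, the command needs no input from load_metadata, which matches the goal of having no shared inputs or outputs between the two commands.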

avrohomgottlieb commented 4 weeks ago

A suite of integration tests called test_load_metadata should be added as well.