AllenCellModeling / long-term-eng

Long Term Engineering + Infrastructure Projects

Rewrite Single Cell Processing Pipeline #1

Closed evamaxfield closed 3 years ago

evamaxfield commented 4 years ago

Use Case

Please provide a use case to help us understand your request in context

Our tech stack has progressed considerably lately, yet the major pipeline that produces products for the Cell Feature Explorer uses none of it. Bring that pipeline up to date and, where possible, add more to it. In the coming year, more cells can be successfully segmented, and new features from these cells and their subcellular structures will be calculated. Rewriting the single cell processing pipeline will enable storage and sharing of these important metrics with the research community. See here for the current dataset.

Solution

Please describe your ideal solution

Use cookiecutter-stepworkflow to rewrite the pipeline and, as part of that, start using dask for processing instead of the oddly configured sbatch scripts we currently use.
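To illustrate the shift from per-job sbatch scripts to a mapped task model, here is a minimal sketch, not the actual pipeline. It uses the standard library's `concurrent.futures` as a stand-in so it runs without a cluster; Dask's futures interface is modeled on the same API, so `Client.map` and `Client.gather` from `dask.distributed` would fill the same role. The `process_fov` function, its fields, and `run_step` are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def process_fov(fov_id: int) -> dict:
    """Hypothetical per-FOV task.

    A real step would load the image, segment cells, and compute features.
    Here it only returns a placeholder manifest row.
    """
    return {"fov_id": fov_id, "n_cells": fov_id % 3 + 1}


def run_step(fov_ids, max_workers: int = 4) -> list:
    """Map the per-FOV task over all FOVs and collect one manifest row each.

    Assumption: with dask.distributed, the executor would be replaced by a
    Client, and executor.map by client.map followed by client.gather.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns results in input order
        return list(executor.map(process_fov, fov_ids))


manifest = run_step(range(5))
for row in manifest:
    print(row)
```

The point of the design is that each step is a plain function over one FOV, so scaling out becomes a matter of swapping the executor rather than editing job scripts.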

Alternatives

Please describe any alternatives you've considered, even if you've dismissed them

At a bare minimum, use prefect and dask, but I think this would be an ideal candidate for the first fully fledged step_workflow pipeline.

Stakeholders

Please add any individual people or teams that should be brought in for discussion on the project

Greg, Dan, Gabe

Major Components

Please add any major components that need to be done for this project

[ ] Make a data+process visual diagram of the complete single-cell processing pipeline
[ ] Rewrite using stepworkflow
[ ] Handle any new segmentation algorithm changes
[ ] Add FNET training and application task
[ ] Add web ready file processing (thumbnails, etc.)
[ ] Update produced Quilt dataset to include all new files
[ ] Potentially: Send CFE data to Firebase for new database instead of JSON
[ ] Potentially: Add Mitosis Classifier application
[ ] Potentially: Export the table of cells by features as a Google BigQuery table

Dependencies

Please add any other major or minor project dependencies here

[ ] (optional) Rewrite FNET Library
[ ] (optional) AICS S3 Storage Centralization

Other Notes

Please add any extra notes here

evamaxfield commented 4 years ago

Update: see actk, a datastep workflow that goes from raw fields of view (FOVs) to the basis for the single cell dataset, including features and single cell images.