Use Case
Please provide a use case to help us understand your request in context
Our tech stack has progressed considerably lately, but the major pipeline that produces the data products for Cell Feature Explorer (CFE) doesn't use any of these newer tools. Bring that pipeline up to date and, where possible, extend it.
In the coming year, more cells can be successfully segmented, and new features will be calculated for these cells and their subcellular structures. Rewriting the single-cell processing pipeline will let us store and share these important metrics with the research community (see here for the current dataset).
Solution
Please describe your ideal solution
Use cookiecutter-stepworkflow to rewrite the pipeline and, as part of that, start using dask for processing instead of the oddly configured sbatch scripts we currently use.
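One way to move off per-job sbatch scripts is to express the per-cell work as a plain function and fan it out over an executor. The sketch below is a minimal illustration under stated assumptions: `process_cell` and `run_all` are hypothetical names, the "features" are placeholders, and it runs locally with a thread pool. The intent is that the same `run_all` could be handed a dask-backed executor (for example, the one returned by a `dask.distributed` `Client`'s `get_executor()`), so the scheduling backend becomes a configuration choice rather than a set of shell scripts.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Any, Dict, List


def process_cell(cell_id: int) -> Dict[str, Any]:
    """Hypothetical per-cell task standing in for feature extraction.

    A real task would load the cell's image and segmentation; here the
    'volume' value is a placeholder so the sketch stays self-contained.
    """
    volume = float(cell_id * 10)
    return {"cell_id": cell_id, "volume": volume}


def run_all(cell_ids: List[int], executor: Executor) -> List[Dict[str, Any]]:
    """Submit one task per cell and collect the results.

    Executor.map yields results in the same order as the inputs, so the
    returned rows line up with cell_ids.
    """
    return list(executor.map(process_cell, cell_ids))


if __name__ == "__main__":
    # Local run with threads. On a cluster, the assumption is that a
    # dask.distributed Client's get_executor() could be passed instead.
    with ThreadPoolExecutor(max_workers=4) as pool:
        rows = run_all([1, 2, 3], pool)
    print(rows)
```

Coding against the `concurrent.futures.Executor` interface keeps the per-cell logic independent of where it runs, which makes it easy to test on a laptop before scaling out.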
Alternatives
Please describe any alternatives you've considered, even if you've dismissed them
At a bare minimum, use prefect and dask, but I think this would be an ideal candidate for the first fully fledged step_workflow pipeline.
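A step-based workflow chains stages so that each one consumes the previous stage's outputs and records its own. The sketch below illustrates that idea with CSV manifests using only the standard library. It is not the datastep or step_workflow API: `run_step`, the manifest layout (`<root>/<step>/manifest.csv`), and the two example steps are hypothetical, chosen to mirror components such as single-cell features and web-ready thumbnails.

```python
import csv
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

Row = Dict[str, str]


def write_manifest(rows: List[Row], path: Path) -> Path:
    """Write a step's outputs as a CSV manifest for the next step to read."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path


def read_manifest(path: Path) -> List[Row]:
    """Load a manifest back into a list of row dictionaries."""
    with path.open(newline="") as fh:
        return list(csv.DictReader(fh))


def run_step(name: str, fn: Callable[[List[Row]], List[Row]],
             upstream: Path, root: Path) -> Path:
    """Read the upstream manifest, transform its rows, write this step's manifest."""
    rows = fn(read_manifest(upstream))
    return write_manifest(rows, root / name / "manifest.csv")


# Hypothetical steps: each records the output file it would produce per cell.
def single_cell_features(rows: List[Row]) -> List[Row]:
    return [{**r, "features_path": f"features/{r['cell_id']}.json"} for r in rows]


def web_thumbnails(rows: List[Row]) -> List[Row]:
    return [{**r, "thumbnail_path": f"thumbnails/{r['cell_id']}.png"} for r in rows]


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    raw = write_manifest([{"cell_id": "c1"}, {"cell_id": "c2"}],
                         root / "raw" / "manifest.csv")
    features = run_step("features", single_cell_features, raw, root)
    thumbs = run_step("thumbnails", web_thumbnails, features, root)
    print(read_manifest(thumbs))
```

Because every step leaves a manifest on disk, the chain doubles as a record of the data+process diagram: each manifest names the inputs a step saw and the files it produced.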
Stakeholders
Please add any individuals or teams that should be brought in for discussion on the project
Greg, Dan, Gabe
Major Components
Please add any major components that need to be done for this project
[ ] Make a data+process visual diagram of the complete single-cell processing pipeline
[ ] Rewrite using stepworkflow
[ ] Handle any new segmentation algorithm changes
[ ] Add FNET training and application task
[ ] Add web-ready file processing (thumbnails, etc.)
[ ] Update produced Quilt dataset to include all new files
[ ] Potentially: Send CFE data to Firebase for new database instead of JSON
[ ] Potentially: Add Mitosis Classifier application
[ ] Potentially: Export the cells-by-features table as a Google BigQuery table
Dependencies
Please add any other major or minor project dependencies here
[ ] (optional) Rewrite FNET Library
[ ] (optional) AICS S3 Storage Centralization
Update: see actk, which is a datastep workflow that goes from raw FOVs to the basis of the single-cell dataset, with features and single-cell images.
Other Notes
Please add any extra notes here