Closed roshankern closed 1 year ago
Below is the pseudo-code for the current stream setup/run (DP feature extraction):
1) Create IdrStream object (with idr_id, temp_dir, final_data_dir)
2) Initialize Aspera downloader
3) Initialize BasicPy preprocessor
4) Initialize CellPose segmentor
5) Copy DP files to IDR Stream temp folder
6) Run stream (with image_metadata, batch_size, start_batch)
Below is the proposed pseudo-code for the future stream setup/run (DP/CP feature extraction):
1) Create IdrStream object (with stream_type, idr_id, temp_dir, final_data_dir)
2) Initialize Aspera downloader
3) Initialize BasicPy preprocessor
4) Initialize Feature Extractor
If stream_type == DP:
- Initialize CellPose segmentor
- Copy DP files to IDR Stream temp folder
If stream_type == CP:
- Initialize CellProfiler extractor
(Not sure what this involves)
5) Run stream (with image_metadata, batch_size, start_batch)
- Likely will involve different steps depending on stream_type
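The proposed setup could be sketched roughly as follows. This is only an illustration of branching on stream_type, not the actual IDR_stream API: the class `IdrStreamSketch`, the `StreamType` enum, the method names, and the `build_stream` helper are all hypothetical, and each step only records its name instead of doing real work.

```python
from dataclasses import dataclass, field
from enum import Enum


class StreamType(Enum):
    DP = "DP"  # DeepProfiler feature extraction
    CP = "CP"  # CellProfiler feature extraction


@dataclass
class IdrStreamSketch:
    """Hypothetical stand-in for IdrStream; it only records which setup steps ran."""

    stream_type: StreamType
    idr_id: str
    temp_dir: str
    final_data_dir: str
    steps: list = field(default_factory=list)

    def init_downloader(self):
        self.steps.append("aspera_downloader")

    def init_preprocessor(self):
        self.steps.append("basicpy_preprocessor")

    def init_feature_extractor(self):
        # The only setup step that differs between stream types.
        if self.stream_type is StreamType.DP:
            self.steps.append("cellpose_segmentor")
            self.steps.append("copy_dp_files")
        elif self.stream_type is StreamType.CP:
            self.steps.append("cellprofiler_extractor")
        else:
            raise ValueError(f"Unsupported stream type: {self.stream_type}")

    def run_stream(self, image_metadata, batch_size, start_batch=0):
        """Split metadata rows into batches and return those from start_batch onward."""
        batches = [
            image_metadata[i:i + batch_size]
            for i in range(0, len(image_metadata), batch_size)
        ]
        return [(n, batch) for n, batch in enumerate(batches) if n >= start_batch]


def build_stream(stream_type, idr_id, temp_dir, final_data_dir):
    """Run the shared setup steps, then the extractor-specific one."""
    stream = IdrStreamSketch(StreamType(stream_type), idr_id, temp_dir, final_data_dir)
    stream.init_downloader()
    stream.init_preprocessor()
    stream.init_feature_extractor()
    return stream


if __name__ == "__main__":
    # "idr0013" and the directory names are example values only.
    stream = build_stream("CP", "idr0013", "tmp/", "extracted_features/")
    print(stream.steps)
```

Keeping the branch inside a single `init_feature_extractor` step means the downloader, preprocessor, and batching logic stay identical for both stream types.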
It is important to preserve the output format when using different feature extractors. As shown in the design below, the current version of IDR_stream processes images in batches and saves each batch's features to a compressed dataframe file (.csv.gz). This format should be preserved with the CP feature output.
Example contents of each batch file are shown below. Each line of a batch dataframe is data for a single cell. This data includes metadata (coordinates, plate, well, frame, perturbation, etc.) and feature data (in the image below, efficientnet_0, efficientnet_1, etc.). Single-cell metadata should be consistent between DP and CP feature extractor runs. However, the feature data should depend on which type of stream is used to extract it.
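As a rough illustration of this requirement, the sketch below writes one DP and one CP batch file as gzip-compressed CSV and checks that the metadata columns match. It uses only the standard library rather than a dataframe library, and the metadata column names, the `cp_feature_0` column, and the helper functions are assumptions made for the example, not the real IDR_stream schema.

```python
import csv
import gzip

# Hypothetical metadata columns shared by every stream type.
METADATA_COLUMNS = ["plate", "well", "frame", "x", "y"]


def save_batch(path, rows, feature_columns):
    """Write one batch (one row per cell) to a gzip-compressed CSV file."""
    with gzip.open(path, "wt", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=METADATA_COLUMNS + feature_columns)
        writer.writeheader()
        writer.writerows(rows)


def read_header(path):
    """Return the column names of a compressed batch file."""
    with gzip.open(path, "rt", newline="") as handle:
        return next(csv.reader(handle))


def metadata_matches(dp_path, cp_path):
    """Check that both batch files start with the same metadata columns."""
    n = len(METADATA_COLUMNS)
    return read_header(dp_path)[:n] == read_header(cp_path)[:n] == METADATA_COLUMNS


if __name__ == "__main__":
    cell = {"plate": "P1", "well": "A1", "frame": 3, "x": 10.5, "y": 20.0}
    save_batch("batch_0_dp.csv.gz", [{**cell, "efficientnet_0": 0.1}], ["efficientnet_0"])
    save_batch("batch_0_cp.csv.gz", [{**cell, "cp_feature_0": 1.0}], ["cp_feature_0"])
    print(metadata_matches("batch_0_dp.csv.gz", "batch_0_cp.csv.gz"))
```

Only the trailing feature columns differ between the two files, so downstream code that reads the metadata columns would not need to know which extractor produced a batch.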
CellProfiler feature extraction has been implemented in https://github.com/WayScience/IDR_stream/commit/ee505ad033f1a59502692fb137612279a2a9fa34!
While the current version of IDR_Stream works to extract DeepProfiler (DP) features (from MitoCheck mitosis movies), it would be nice to compare the deep-learning-extracted features to those from the more traditional method of feature extraction. For this reason, the Way Lab would like to enable CellProfiler (CP) feature extraction inside IDR_Stream!
Ideally, it would be very convenient to choose between DP or CP features (or both). Because IDR_Stream is object-oriented, it should be relatively simple to implement an option for users to change their feature extractor.