WayScience / IDR_stream

Software for feature extraction from IDR image data
BSD 3-Clause "New" or "Revised" License

CellProfiler Feature Extraction #3

Closed roshankern closed 1 year ago

roshankern commented 2 years ago

While the current version of IDR_Stream can extract DeepProfiler (DP) features (from MitoCheck mitosis movies), it would be useful to compare these deep-learning features with features from a more traditional extraction method. For this reason, the Way Lab would like to enable CellProfiler (CP) feature extraction inside IDR_Stream!

Ideally, users could choose between DP features, CP features, or both. Because IDR_Stream is object-oriented, it should be relatively simple to add an option that lets users change their feature extractor.
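One way to picture the "choose DP, CP, or both" idea is a small extractor interface with interchangeable implementations. The sketch below is only an illustration: the class names, the "both" option, and the placeholder feature names are hypothetical and are not taken from the IDR_stream code base.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class FeatureExtractor(ABC):
    """Hypothetical interface; not part of the IDR_stream code base."""

    @abstractmethod
    def extract(self, image_id: str) -> dict[str, float]:
        """Return a mapping of feature name to value for one cell."""


class DPExtractor(FeatureExtractor):
    def extract(self, image_id: str) -> dict[str, float]:
        # Placeholder values standing in for DeepProfiler output.
        return {"DP_feature_0": 0.0, "DP_feature_1": 0.0}


class CPExtractor(FeatureExtractor):
    def extract(self, image_id: str) -> dict[str, float]:
        # Placeholder values standing in for CellProfiler output.
        return {"CP_feature_0": 0.0}


# Registry of available extractors, keyed by stream type.
EXTRACTORS = {"DP": DPExtractor, "CP": CPExtractor}


def build_extractors(stream_type: str) -> list[FeatureExtractor]:
    """Accept "DP", "CP", or "both" and return the matching extractors."""
    names = list(EXTRACTORS) if stream_type == "both" else [stream_type]
    unknown = [name for name in names if name not in EXTRACTORS]
    if unknown:
        raise ValueError(f"Unknown stream_type: {stream_type!r}")
    return [EXTRACTORS[name]() for name in names]


def extract_all(stream_type: str, image_id: str) -> dict[str, float]:
    """Merge the features from every selected extractor for one cell."""
    features: dict[str, float] = {}
    for extractor in build_extractors(stream_type):
        features.update(extractor.extract(image_id))
    return features
```

With this shape, adding another extractor would only require a new subclass and one registry entry; the rest of the stream would not need to know which extractor is active.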

roshankern commented 2 years ago

Below is the pseudo-code for the current stream setup/run (DP feature extraction):

1) Create IdrStream object (with idr_id, temp_dir, final_data_dir)
2) Initialize Aspera downloader
3) Initialize BasicPy preprocessor
4) Initialize CellPose segmentor
5) Copy DP files to IDR Stream temp folder
6) Run stream (with image_metadata, batch_size, start_batch)

Below is the proposed pseudo-code for the future stream setup/run (DP/CP feature extraction):

1) Create IdrStream object (with stream_type, idr_id, temp_dir, final_data_dir)
2) Initialize Aspera downloader
3) Initialize BasicPy preprocessor
4) Initialize feature extractor
   If stream_type == DP:
   - Initialize CellPose segmentor
   - Copy DP files to IDR Stream temp folder
   If stream_type == CP:
   - Initialize CellProfiler extractor (not sure what this involves)
5) Run stream (with image_metadata, batch_size, start_batch)
   - Likely will involve different steps depending on stream_type
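The proposed setup above can be sketched as a function that returns the ordered setup steps for a given stream_type. This is a rough illustration only: plan_setup and the step strings are hypothetical names that mirror the pseudo-code, and no real initialization is performed.

```python
from __future__ import annotations


def plan_setup(stream_type: str) -> list[str]:
    """Return the ordered setup steps for a hypothetical DP or CP stream."""
    # Steps shared by both stream types.
    steps = [
        "create IdrStream object",
        "initialize Aspera downloader",
        "initialize BasicPy preprocessor",
    ]
    # Feature-extractor setup depends on the stream type.
    if stream_type == "DP":
        steps += [
            "initialize CellPose segmentor",
            "copy DP files to IDR Stream temp folder",
        ]
    elif stream_type == "CP":
        steps += ["initialize CellProfiler extractor"]
    else:
        raise ValueError(f"Unknown stream_type: {stream_type!r}")
    steps.append("run stream")
    return steps


if __name__ == "__main__":
    for number, step in enumerate(plan_setup("CP"), start=1):
        print(f"{number}) {step}")
```

Keeping the shared steps in one place and branching only on the extractor setup makes it easy to see exactly where the DP and CP streams differ.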

roshankern commented 2 years ago

It is important to preserve the output format when using different feature extractors. As shown in the design below, the current version of IDR_stream processes images in batches and saves each batch's features to a compressed dataframe file (.csv.gz). The CP feature output should keep this format.

[Image: Stream_Design]

Example contents of each batch file are shown below. Each row of a batch dataframe holds the data for a single cell. This data includes metadata (coordinates, plate, well, frame, perturbation, etc.) and feature data (in the image below, efficientnet_0, efficientnet_1, etc.). Single-cell metadata should be consistent between DP and CP feature extractor runs, while the feature columns should depend on which type of stream is used to extract them.

[Image: example batch dataframe]
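To make the batch format concrete, here is a minimal sketch that writes one row per cell to a .csv.gz file, with shared metadata columns first and extractor-specific feature columns after them. It uses only the Python standard library rather than the project's own code; the metadata column names, the helper names, and the sample values are assumptions for illustration.

```python
from __future__ import annotations

import csv
import gzip
import tempfile
from pathlib import Path

# Hypothetical metadata columns shared by DP and CP runs.
METADATA_COLUMNS = ["plate", "well", "frame", "x", "y"]


def save_batch(cells: list[dict], path: Path) -> list[str]:
    """Write one row per cell to a gzip-compressed CSV; return the header."""
    # Any key that is not metadata is treated as a feature column.
    feature_columns = sorted(
        {key for cell in cells for key in cell} - set(METADATA_COLUMNS)
    )
    fieldnames = METADATA_COLUMNS + feature_columns
    with gzip.open(path, "wt", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(cells)
    return fieldnames


def load_batch(path: Path) -> list[dict]:
    """Read a batch file back; all values come back as strings."""
    with gzip.open(path, "rt", newline="") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    # Made-up sample cell, not real MitoCheck data.
    cell = {"plate": "P1", "well": "A1", "frame": 0, "x": 10, "y": 20,
            "efficientnet_0": 0.1, "efficientnet_1": 0.2}
    with tempfile.TemporaryDirectory() as folder:
        batch_path = Path(folder) / "batch_0.csv.gz"
        print(save_batch([cell], batch_path))
        print(load_batch(batch_path))
```

Because the metadata columns are fixed and only the feature columns vary, a DP batch and a CP batch written this way would share the same leading columns and could be joined on them later.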

roshankern commented 1 year ago

CellProfiler feature extraction has been implemented in https://github.com/WayScience/IDR_stream/commit/ee505ad033f1a59502692fb137612279a2a9fa34!