broadinstitute / pooled-cell-painting-profiling-recipe

:woman_cook: Recipe repository for image-based profiling of Pooled Cell Painting experiments
BSD 3-Clause "New" or "Revised" License

Hardcoding Cells and Nuclei for threshold QC #76

Open gwaybio opened 3 years ago

gwaybio commented 3 years ago

In #75, @ErinWeisbart writes about our options to handle the hardcoding issue I commented on in https://github.com/broadinstitute/pooled-cell-painting-profiling-recipe/pull/75#discussion_r647760584:

In a Cell Painting workflow, Cells and Nuclei are the two compartments that are always segmented. Segmenting fewer compartments is impossible because we need to identify individual cells and we must use Nuclei to determine Cells. It's possible to segment more compartments, but we don't currently do that (even in our workflow where we have many more labels) and are unlikely to do so because 1) it seems to be unnecessary and 2) it's prone to mistakes/variability and therefore requires significant hands-on time.

Therefore, the thresholds we want to plot here are likely to always be Cells and Nuclei. The question is whether we should remove the hardcoding in case the Cells and Nuclei compartments are ever labeled in a different way (e.g. cells, Cell, etc.). To do that, it looks like the options are:

1. Create a new entry in options.yaml to specify the segmented compartment names. Not ideal because it's one more thing to have to enter, but it's the least prone to breaking.
2. Use core: compartments: and remove Cytoplasm from the list because it's the only tertiary compartment (mathematically determined by subtracting one compartment from another) and any other compartment in the list will have been created by segmentation. Not ideal because then we have to find a way to account for labelling of Cytoplasm.
3. Use core: cell_matchcols: cytoplasm: and then strip the Parent from the front of the compartment string. Parent is hardcoded into CellProfiler, so it's fine from our workflow perspective, but would mean this wouldn't work with data coming not from CellProfiler (which I thought was a goal?).
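
To make the trade-offs concrete, here is a rough sketch of how each of the three options could derive the segmented compartment names from an already-loaded config. It is illustrative only, not the recipe's actual code. The `core: compartments:` and `core: cell_matchcols: cytoplasm:` keys come from the discussion above; the `segmented_compartments` key, the example values, the `Parent_` prefix, and all function names are assumptions made for the example.

```python
# Hypothetical config as it might look after loading options.yaml.
# All values below are assumed for illustration.
config = {
    "core": {
        "compartments": ["Cells", "Nuclei", "Cytoplasm"],
        "cell_matchcols": {
            "cytoplasm": ["Parent_Cells", "Parent_Nuclei"],
        },
    },
    # Option 1 only: a hypothetical new entry naming the segmented compartments.
    "segmented_compartments": ["Cells", "Nuclei"],
}


def from_explicit_entry(config):
    """Option 1: read the names from a dedicated (hypothetical) config entry."""
    return list(config["segmented_compartments"])


def from_compartments(config, tertiary=("Cytoplasm",)):
    """Option 2: take core: compartments: and drop the tertiary compartment(s).

    The tertiary label still has to be known somehow, which is the
    drawback noted above.
    """
    return [c for c in config["core"]["compartments"] if c not in tertiary]


def from_matchcols(config, prefix="Parent_"):
    """Option 3: strip the parent prefix from the cytoplasm match columns.

    The "Parent_" prefix is assumed here; it ties this approach to
    CellProfiler-style column naming.
    """
    cols = config["core"]["cell_matchcols"]["cytoplasm"]
    return [c[len(prefix):] if c.startswith(prefix) else c for c in cols]


if __name__ == "__main__":
    for get_names in (from_explicit_entry, from_compartments, from_matchcols):
        print(get_names.__name__, get_names(config))
```

With the assumed values, all three functions yield the same two names. They differ only in which part of the config they depend on, and therefore in how they break if labels change.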


We may decide to tackle this at a later date