broadinstitute / pooled-cell-painting-profiling-recipe

:woman_cook: Recipe repository for image-based profiling of Pooled Cell Painting experiments
BSD 3-Clause "New" or "Revised" License
6 stars 4 forks source link

Revert `site_full` back to `site` #45

Closed gwaybio closed 3 years ago

gwaybio commented 3 years ago

In #23 (specifically https://github.com/broadinstitute/pooled-cell-painting-profiling-recipe/pull/23#discussion_r456045044) I suggested a column name change within the 0.preprocess recipe module. This was a bad idea!

It was a bad idea b/c it breaks 0.merge-single-cells.py at: https://github.com/broadinstitute/pooled-cell-painting-profiling-recipe/blob/1be45bb33e1da71050dbb102352690cf8a08fb7c/1.generate-profiles/0.merge-single-cells.py#L143L146

we need to revert this change (and address some corresponding breaks after updating) before we can produce profiles.

ErinWeisbart commented 3 years ago

Will we stick with nomenclature that site is the full string and Site is parsed? e.g. site = 151_B1_12 and Site = 12 so site parses to Plate Well Site

gwaybio commented 3 years ago

We need to determine site nomenclature here and settle on what terms to use. Luckily, I think this is pretty straightforward, but it does require more thought that I had originally used.

The issue specifically is that, in #23, we updated a column name from site to site_full. After merging the CellProfiler compartments (using merge_single_cell_compartments()) we hardcode pd.assign() a metadata column called Metadata_Foci_Site here. After the update to site_full in #23, this line specifically makes it incompatible to merge the cell_ids (with metadata) to the CellProfiler features, given our current requirement for the columns to have a 1to1 mapping.


A side note that is beyond scope of this issue, is that I am noticing quite a lot of bleed through between non-core config entries in the two config files. I think streamlining these config files (maybe worth revisiting some of the decisions in #5). For example, I think we can solve the site_full issue with better config options. But this solution is beyond the scope of version 0.1 as well.

gwaybio commented 3 years ago

Will we stick with nomenclature that site is the full string and Site is parsed?

I think we should definitely stick to site being the full label (e.g. "151_B1_12"). This is consistent with our folder structure, and I think it is the path of least resistance to solving this bug. Also, I agree that site can parse to Plate, and Well, but I am not so sure about Site. The capitalization being the only difference is pretty fragile. For example, I can imagine wanting to standardize capitalization of all columns at some point, and building this in now would break that operation.

Currently, what do we use the parsed Site for now? Is it just QC visualization? If so, I propose that we keep the site to reference the full label, and then site_location to reference the physical location in the well. I think this will do for version 0.1, but we should address the issue the right way by using a config file instead of hardcoding for version 0.2. (the config files will need to be different than as they are now)

gwaybio commented 3 years ago

In fb8eb0a73af2f1756bf75382a2fb4354652282c0 I made an executive decision to do as proposed in https://github.com/broadinstitute/pooled-cell-painting-profiling-recipe/issues/45#issuecomment-670206381

in c8bbaaaba099a16f5a8e8cd9307cfda35ac116d2 i attempt to fix this in the qc step. This is still broken - will fix asap

gwaybio commented 3 years ago

in d98ded387fe376ea310ae23847ea40758de2c388 is a somewhat fragile fix... It works with the current setup, but we really need to create a much improved config system in version 0.2. I believe that an enhanced config system will deal with these inconsistencies automatically and "for free".

This fragile fix should be revisited.