Closed gwaybio closed 4 years ago
Will we stick with nomenclature that site
is the full string and Site
is parsed?
e.g. site = 151_B1_12
and Site = 12
so site
parses to Plate Well Site
We need to determine site
nomenclature here and settle on what terms to use. Luckily, I think this is pretty straightforward, but it does require more thought that I had originally used.
The issue specifically is that, in #23, we updated a column name from site
to site_full
. After merging the CellProfiler compartments (using merge_single_cell_compartments()
) we hardcode pd.assign()
a metadata column called Metadata_Foci_Site
here. After the update to site_full
in #23, this line specifically makes it incompatible to merge the cell_ids (with metadata) to the CellProfiler features, given our current requirement for the columns to have a 1to1 mapping.
A side note that is beyond scope of this issue, is that I am noticing quite a lot of bleed through between non-core config entries in the two config files. I think streamlining these config files (maybe worth revisiting some of the decisions in #5). For example, I think we can solve the site_full
issue with better config options. But this solution is beyond the scope of version 0.1 as well.
Will we stick with nomenclature that site is the full string and Site is parsed?
I think we should definitely stick to site
being the full label (e.g. "151_B1_12"). This is consistent with our folder structure, and I think it is the path of least resistance to solving this bug. Also, I agree that site
can parse to Plate
, and Well
, but I am not so sure about Site
. The capitalization being the only difference is pretty fragile. For example, I can imagine wanting to standardize capitalization of all columns at some point, and building this in now would break that operation.
Currently, what do we use the parsed Site
for now? Is it just QC visualization? If so, I propose that we keep the site
to reference the full label, and then site_location
to reference the physical location in the well. I think this will do for version 0.1, but we should address the issue the right way by using a config file instead of hardcoding for version 0.2. (the config files will need to be different than as they are now)
In fb8eb0a73af2f1756bf75382a2fb4354652282c0 I made an executive decision to do as proposed in https://github.com/broadinstitute/pooled-cell-painting-profiling-recipe/issues/45#issuecomment-670206381
in c8bbaaaba099a16f5a8e8cd9307cfda35ac116d2 i attempt to fix this in the qc step. This is still broken - will fix asap
in d98ded387fe376ea310ae23847ea40758de2c388 is a somewhat fragile fix... It works with the current setup, but we really need to create a much improved config system in version 0.2. I believe that an enhanced config system will deal with these inconsistencies automatically and "for free".
This fragile fix should be revisited.
In #23 (specifically https://github.com/broadinstitute/pooled-cell-painting-profiling-recipe/pull/23#discussion_r456045044) I suggested a column name change within the
0.preprocess
recipe module. This was a bad idea!It was a bad idea b/c it breaks
0.merge-single-cells.py
at: https://github.com/broadinstitute/pooled-cell-painting-profiling-recipe/blob/1be45bb33e1da71050dbb102352690cf8a08fb7c/1.generate-profiles/0.merge-single-cells.py#L143L146we need to revert this change (and address some corresponding breaks after updating) before we can produce profiles.