Closed gwaybio closed 4 years ago
There are two categories of images - cell painting images (that make the cp_sat_df) and barcoding images (that make the bc_sat_df). Each has a different threshold for triggering a saturated warning - CP if an image is > 1% saturated, BC if an image > .2% saturated (which is determined by the ImageQuality_PercentMaximal_
column in the Image.csv).
My goal was to output a list of sites that fail as saturated_sites.csv but to NOT save an empty .csv so that a user will know that if a .csv is created it means there is a problem.
I had also intended for no site to be listed twice (i.e. if it fails in saturation in both CP and BC then it should only be listed once), but now that I'm thinking about it, an even more helpful solution would be to have any site only ever listed once, but for there to be additional columns added of "Fails_Sat_BC" and "Fails_Sat_CP" and have those populated with True/False accordingly.
ok cool - I think I got it.
Happy to give it a shot.
we may need to slightly tinker the code to make sure we extract all saturated sites
I'm not sure I understand what you were initially concerned about? Does my suggested enhancement of
to have any site only ever listed once, but for there to be additional columns added of "Fails_Sat_BC" and "Fails_Sat_CP" and have those populated with True/False accordingly.
fix your initial concern?
yeah, I think reworking the full code block to add this enhancement will address my "tinker comment".
More specifically, the two lines of code of this form: cp_sat_df = image_df[image_df[col] > 1]
inside the for loop will be overwritten each time through the loop. Therefore, unless I am missing something, sat_df = cp_sat_df.append(bc_sat_df).drop_duplicates(subset="site")
will not contain the intended outcome.
I am going through 4.image-and-segmentation.py in much more detail in #39 - the following code block is giving me some trouble:
https://github.com/broadinstitute/pooled-cell-painting-profiling-recipe/blob/b8c9288c0ab8b04e4ce284be5ba9aab44a6ce678/0.preprocess-sites/4.image-and-segmentation-qc.py#L345-L357
I am trying to think through what this is doing (so please bear with me!)
There are two different
ImageQuality_PercentMaximal
types (spots and cells) and this column, inimage.csv
indicates whether or not that image (either spots or cells) is over saturated? We also want these sites to be output to a file namedsaturated_sites.csv
but only if any saturated sites exist. Is this correct?If so, then I think we may need to slightly tinker the code to make sure we extract all saturated sites. Sorry I didn't flag this sooner!