gwaybio closed this issue 4 years ago.
With our CellProfiler analysis pipeline as it is currently set up, the answer is yes: the output is always Plate_Well_Site (so site_full will always fit that split). One could change that part of the CellProfiler pipeline, but there would be no reason for Beth or me to do so at this point.
Image.csv has Metadata_Plate, Metadata_Well, and Metadata_Site columns, so we can pull from those columns instead. It is not true that we don't use this split elsewhere; it is also in 7.visualize-cell-summary.py.
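A minimal sketch of pulling the identifiers straight from the metadata columns instead of splitting a combined string. It uses only the standard library csv module and assumes Image.csv has the Metadata_Plate, Metadata_Well, and Metadata_Site columns described above. The sample rows, the plate name CP151A, and the helper name are hypothetical.

```python
import csv
import io

# Hypothetical two-row excerpt of an Image.csv from a CP151-style batch;
# real files contain many more columns.
SAMPLE_IMAGE_CSV = """ImageNumber,Metadata_Plate,Metadata_Well,Metadata_Site
1,CP151A,A1,1
2,CP151A,A1,2
"""


def site_ids_from_metadata(handle):
    """Return (plate, well, site) tuples read from explicit metadata columns.

    No string splitting is involved, so a change in how CellProfiler names
    its output cannot silently shift the fields.
    """
    reader = csv.DictReader(handle)
    return [
        (row["Metadata_Plate"], row["Metadata_Well"], row["Metadata_Site"])
        for row in reader
    ]


print(site_ids_from_metadata(io.StringIO(SAMPLE_IMAGE_CSV)))
```

In the actual scripts the same idea would more likely be expressed with pandas, by selecting those three columns from the loaded Image table.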
Could we move the Image DataFrame creation into 7.visualize-cell-summary, save it there, and import that file into 8.image-and-segmentation-qc instead of recreating it? Is this something you'd like me to do?
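A rough sketch of the write-once, read-later idea, assuming a plain CSV is an acceptable hand-off format between the two scripts. The file name image_table.csv, the helper names, and the sample row are hypothetical. The real scripts would probably use pandas (DataFrame.to_csv and pandas.read_csv) instead of the csv module.

```python
import csv
import tempfile
from pathlib import Path

# Columns of the hypothetical shared per-image table.
FIELDS = ["Metadata_Plate", "Metadata_Well", "Metadata_Site"]


def write_image_table(rows, path):
    """Step 7 (sketch): save the per-image table once."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def read_image_table(path):
    """Step 8 (sketch): load the saved table instead of rebuilding it."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


with tempfile.TemporaryDirectory() as tmp:
    table_path = Path(tmp) / "image_table.csv"  # hypothetical file name
    write_image_table(
        [{"Metadata_Plate": "CP151A", "Metadata_Well": "A1", "Metadata_Site": "1"}],
        table_path,
    )
    loaded = read_image_table(table_path)

print(loaded)
```

Keeping a single writer means the parsing logic lives in one place, so a fix in step 7 automatically reaches step 8.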
This is great to know. Given that I have built up some momentum, I will take care of this (I also have the example data you sent). I think the best use of your time on this project now is reviewing the PRs and checking my logic. We are close to 0.1!
I'm having trouble finding which column to use in Image.csv. Maybe I am looking at the wrong file?
The only metadata columns I see in this file are:

| Metadata_FileLocation | Metadata_Frame | Metadata_Series | Metadata_Site | Metadata_TopFolder |
|---|---|---|---|---|
| | 0 | 0 | Site_1 | CP074A_A1 |
Is this all the info I need? It still seems a bit dangerous to split Metadata_TopFolder. I will keep digging and post what I find here.
We processed CP074A differently. So yes, it used TopFolder, but CP151 and all future batches use Plate, Well, and Site. I suggest you switch to data from a CP151 plate for building and testing v.1, as it differs in many small ways from CP074 and earlier batches.
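If CP074A-style data still needs to be read while v.1 is built, one option is to prefer the explicit columns and fall back to Metadata_TopFolder only for legacy rows. This is a sketch under assumptions: each Image.csv row is available as a dict, legacy TopFolder values look like CP074A_A1 (plate, last underscore, well), and wells are a letter A-P followed by a one- or two-digit number. The helper name site_id and the well pattern are hypothetical.

```python
import re

# Assumed well naming: a row letter A-P followed by a 1-2 digit column number.
WELL_PATTERN = re.compile(r"[A-P]\d{1,2}")
EXPLICIT_COLUMNS = ("Metadata_Plate", "Metadata_Well", "Metadata_Site")


def site_id(row):
    """Return (plate, well, site) for one Image.csv row given as a dict."""
    # CP151 and later batches: explicit metadata columns, no splitting needed.
    if all(column in row for column in EXPLICIT_COLUMNS):
        return tuple(row[column] for column in EXPLICIT_COLUMNS)
    # Legacy batches such as CP074A: fall back to Metadata_TopFolder, e.g. "CP074A_A1".
    # rpartition splits on the last underscore only, so an underscore inside
    # the plate name does not shift the well.
    top_folder = row["Metadata_TopFolder"]
    plate, sep, well = top_folder.rpartition("_")
    if not sep or not plate or not WELL_PATTERN.fullmatch(well):
        raise ValueError(f"Cannot parse Metadata_TopFolder value {top_folder!r}")
    return plate, well, row["Metadata_Site"]


new_row = {"Metadata_Plate": "CP151A", "Metadata_Well": "B2", "Metadata_Site": "3"}
legacy_row = {"Metadata_TopFolder": "CP074A_A1", "Metadata_Site": "Site_1"}
print(site_id(new_row))
print(site_id(legacy_row))
```

Validating the well portion makes an unexpected TopFolder value raise immediately rather than propagate a wrong plate or well downstream.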
Cool, so the columns in Image.csv will be Metadata_Plate, Metadata_Well, and Metadata_Site?
This code block is giving me some trouble:
https://github.com/broadinstitute/pooled-cell-painting-profiling-recipe/blob/068c7eae79f56a50732fdf173902dc596c95ce3f/0.preprocess-sites/8.image-and-segmentation-qc.py#L202-L204
A couple of notes:

1. Will the split always be Plate, then Well, then Site?
2. We split on "-" (in a separate experiment, we delimit by underscore).

So, this is a very fragile way of handling this split. A couple of solutions:

1. Extract the plate, well, and site number information earlier and do not rely on this split at all. Is this info already in a separate metadata column in the Image.csv file?

@ErinWeisbart - we should either resolve this or drop this before version 0.1.
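The linked lines are not reproduced here, so this is a hypothetical illustration of the failure mode rather than the repository's code. It assumes site_full is exactly three fields (plate, well, site) joined by a single delimiter, and shows how checking the field count turns a delimiter mismatch or an extra delimiter inside a plate name into an explicit error. The function name and sample values are hypothetical.

```python
def split_site_full(site_full, delimiter):
    """Split a combined site identifier into (plate, well, site), failing loudly.

    An unchecked split can mis-assign fields when a plate name contains the
    delimiter, or fail far from the cause when another experiment uses "_"
    instead of "-". Requiring exactly three parts surfaces both cases here.
    """
    parts = site_full.split(delimiter)
    if len(parts) != 3:
        raise ValueError(
            f"Expected plate{delimiter}well{delimiter}site, got {site_full!r}"
        )
    plate, well, site = parts
    return plate, well, site


print(split_site_full("CP151A-A1-1", "-"))
try:
    # Underscore-delimited identifier, as in the separate experiment.
    split_site_full("CP151A_A1_1", "-")
except ValueError as err:
    print(err)
```

Even with this guard, the split still depends on naming conventions, which is why reading the explicit metadata columns is the sturdier fix.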