broadinstitute / pooled-cell-painting-profiling-recipe

:woman_cook: Recipe repository for image-based profiling of Pooled Cell Painting experiments
BSD 3-Clause "New" or "Revised" License

Splitting full site annotation in 4.image-and-segmentation-qc #35

Closed gwaybio closed 4 years ago

gwaybio commented 4 years ago

This code block is giving me some trouble:

https://github.com/broadinstitute/pooled-cell-painting-profiling-recipe/blob/068c7eae79f56a50732fdf173902dc596c95ce3f/0.preprocess-sites/8.image-and-segmentation-qc.py#L202-L204

A couple notes:

So, this is a very fragile way of handling this split. A couple of solutions:

@ErinWeisbart - we should either resolve this or drop this before version 0.1

ErinWeisbart commented 4 years ago

The way our CellProfiler analysis pipeline is currently set up, the answer is yes: the output is always Plate_Well_Site (so site_full will always fit that split). One could change that part of the CellProfiler pipeline, though there would be no reason for Beth or me to do so at this point.
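For illustration, here is a minimal sketch of how a Plate_Well_Site split could be made to fail loudly instead of silently mis-assigning fields if the pipeline output ever changes. The `SiteId` type, the `split_site_full` helper, and the example identifier are hypothetical and are not taken from the recipe code.

```python
from typing import NamedTuple


class SiteId(NamedTuple):
    plate: str
    well: str
    site: str


def split_site_full(site_full: str, sep: str = "_") -> SiteId:
    """Split a Plate_Well_Site string, raising on any other shape."""
    parts = site_full.split(sep)
    # Exactly three non-empty fields are expected; anything else is an error
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"Expected Plate{sep}Well{sep}Site, got {site_full!r}")
    return SiteId(*parts)


# Hypothetical identifier, for illustration only
print(split_site_full("CP151A1_A1_1"))
```

A value with an extra underscore in any field (for example a plate name that itself contains `_`) would raise here rather than produce a wrong well or site.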

In the Image.csv there are Metadata_Plate, Metadata_Well, and Metadata_Site columns so we can pull from those columns instead. It is not true that we don't use this split elsewhere - it is in 7.visualize-cell-summary.py as well.
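A rough sketch of reading the three columns directly, rather than splitting a combined string, might look like the following. It uses only the standard library to stay self-contained (the recipe itself works with pandas DataFrames), and the `read_site_annotations` helper and the example row are assumptions made for illustration.

```python
import csv
import io

REQUIRED_COLUMNS = ("Metadata_Plate", "Metadata_Well", "Metadata_Site")


def read_site_annotations(handle):
    """Return one dict per row with plate, well, and site taken from Image.csv columns."""
    reader = csv.DictReader(handle)
    fieldnames = reader.fieldnames or []
    # Fail early if the file does not carry the expected metadata columns
    missing = [col for col in REQUIRED_COLUMNS if col not in fieldnames]
    if missing:
        raise KeyError(f"Image.csv is missing columns: {missing}")
    return [
        {
            "plate": row["Metadata_Plate"],
            "well": row["Metadata_Well"],
            "site": row["Metadata_Site"],
        }
        for row in reader
    ]


# Hypothetical file contents, for illustration only
example_csv = "Metadata_Plate,Metadata_Well,Metadata_Site\nCP151A1,A1,1\n"
print(read_site_annotations(io.StringIO(example_csv)))
```

Checking for the columns up front means a file produced by a differently configured pipeline is rejected with a clear message instead of yielding partial annotations.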

We could move the Image Dataframe creation to 7.visualize-cell-summary, save it there and import that file into 8.image-and-segmentation-qc instead of recreating it? Is this something you'd like me to do?

gwaybio commented 4 years ago

> We could move the Image Dataframe creation to 7.visualize-cell-summary, save it there and import that file into 8.image-and-segmentation-qc instead of recreating it? Is this something you'd like me to do?

This is great to know - given that I have built up some momentum, I will take care of this (I also have some example data that you sent). I think the best use of your time in this project now is reviewing the PRs and checking my logic. We are close to 0.1!

gwaybio commented 4 years ago

Having trouble finding which column to use in image.csv - maybe I am looking at the wrong file?

The only metadata columns I see in this file are:

| Metadata_FileLocation | Metadata_Frame | Metadata_Series | Metadata_Site | Metadata_TopFolder |
| --- | --- | --- | --- | --- |
|  | 0 | 0 | Site_1 | CP074A_A1 |

Is this all the info I need? It still seems a bit dangerous to split off Metadata_TopFolder. I will keep digging and post what I find here.

ErinWeisbart commented 4 years ago

We processed CP074A differently, so yes, it used TopFolder, but CP151 and all future batches use Plate, Well, and Site. I suggest you switch to using data from a plate of CP151 for building/testing v.1, as it differs in many small ways from CP074 and earlier batches.

gwaybio commented 4 years ago

cool - so the columns in image.csv will be Metadata_Plate, Metadata_Well, and Metadata_Site?