broadinstitute / pooled-cell-painting-profiling-recipe

:woman_cook: Recipe repository for image-based profiling of Pooled Cell Painting experiments
BSD 3-Clause "New" or "Revised" License

Include summary step for guide + guide abundances (cell count per perturbation) #56

Open gwaybio opened 3 years ago

gwaybio commented 3 years ago

Right now, perturbation abundances are only output as one file per site. We should make per-plate perturbation abundances easier to retrieve by summarizing perturbation counts in an additional script.

@jbauman214 - unfortunately, the info you requested is not readily available. We do calculate this at a per-site level, so it is possible to retrieve. The file name you are looking for is:

EXPERIMENT_LABEL/data/0.site-qc/PLATE_NAME/spots/SITE_NAME/cell_perturbation_category_summary_counts.tsv

These files are available on GitHub, in a private repository for the EXPERIMENT_LABEL, per PLATE_NAME. I am intentionally obscuring experimental details as this issue is in a public repo.
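As a rough sketch of how the per-site files for one plate could be collected, assuming the directory layout above is checked out locally (the function name `find_site_count_files` is hypothetical and not part of the recipe):

```python
from pathlib import Path

# File name taken from the path above.
COUNT_FILE = "cell_perturbation_category_summary_counts.tsv"


def find_site_count_files(experiment_dir, plate_name):
    """Return {site_name: path} for every per-site count file of one plate.

    Assumes the layout EXPERIMENT_LABEL/data/0.site-qc/PLATE_NAME/spots/SITE_NAME/.
    """
    spots_dir = Path(experiment_dir) / "data" / "0.site-qc" / plate_name / "spots"
    # Each site folder is expected to hold exactly one count file.
    return {p.parent.name: p for p in sorted(spots_dir.glob(f"*/{COUNT_FILE}"))}
```

Site folders without a count file are simply skipped, so the result can also be used to spot sites that are missing output.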

gwaybio commented 3 years ago

@hillsbury - this sounds like a great first issue to me :) want to take a crack at it?

jbauman214 commented 3 years ago

Any update on this? If no one else has time to aggregate the files, I can try to do it.

hillsbury commented 3 years ago

was going to start taking a look through this today! how high priority is this?

jbauman214 commented 3 years ago

Pretty high, I think! We'd like to be able to present results at Friday's Calico meeting, so it would be awesome if aggregated data was available by EOD Wednesday at the latest. But just lmk if that's not possible!

On Tue, Oct 27, 2020 at 10:22 AM hillsbury notifications@github.com wrote:

was going to start taking a look through this today! how high priority is this?


-- Julia Bauman Research Associate III - Neal Lab Cancer Program Broad Institute of MIT and Harvard

hillsbury commented 3 years ago

I see. I don't want to make any promises right now so feel free to take a stab at it in the meantime as well!

jbauman214 commented 3 years ago

Update on my end: I was able to make a rough comparison to NGS by manually downloading and merging all of the spreadsheets for a single well of a single plate, but this took a lot of time and storage space on my laptop. We'd like to be able to make the comparison for the rest of the data too, so it would still be great to get aggregated barcode call spreadsheets for each plate!

On Tue, Oct 27, 2020 at 11:25 AM hillsbury notifications@github.com wrote:

I see. I don't want to make any promises right now so feel free to take a stab at it in the meantime as well!



gwaybio commented 3 years ago

@jbauman214 - Glad we were able to get some sense of this!

Unless we need these results immediately to inform a critical upcoming experiment (I don't think we do, but please LMK if I'm wrong), we will perform this analysis in a sustainable way. Concretely, this means integrating the appropriate Python code into the correct recipe file, and then updating the CP151 data weld.

@hillsbury and I are going to walk through this (and more!) tomorrow.

jt-neal commented 3 years ago

@gwaygenomics - re: urgency, this is one of the (several) key troubleshooting experiments/analyses that we'll want to have for the JSC, but not to inform any imminent experiments, if that helps with prioritization.

gwaybio commented 3 years ago

it does, thanks

Can you speak more to exactly what you're after as well? In my mind, all we need is a single per-plate .csv file with three columns 1) Gene 2) sgRNA 3) cell count. We have figures visualizing counts already, but not a step to generate this summary file.

jt-neal commented 3 years ago

@jbauman214 should probably chime in here.

jbauman214 commented 3 years ago

Yes, that would be perfect!

On Thu, Nov 5, 2020 at 1:30 PM jt-neal notifications@github.com wrote:

@jbauman214 https://github.com/jbauman214 should probably chime in here.



gwaybio commented 3 years ago

@hillsbury - let's add cell quality to the spec outlined in https://github.com/broadinstitute/pooled-cell-painting-profiling-recipe/issues/56#issuecomment-722423464

So, the single file should include four columns:

| Guide | Gene | Cell Count | Cell Quality |
| --- | --- | --- | --- |
| AACGTCG | GENE X | 53 | Perfect |
| AACGTCG | GENE X | 21 | Great |
| ATCAACG | GENE X | 67 | Perfect |

and so on...

We'll be able to extract what we need from this file.
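A minimal sketch of what the summary step could look like, using only the standard library. It assumes the per-site files are tab-separated and already carry the four columns named above; the real column names in the per-site files may differ, and `summarize_plate` is a hypothetical name, not an existing recipe function.

```python
import csv
from collections import Counter

# Assumed column names; the real per-site files may label these differently.
GUIDE, GENE, COUNT, QUALITY = "Guide", "Gene", "Cell Count", "Cell Quality"


def summarize_plate(site_files, output_csv):
    """Sum per-site cell counts into one per-plate table and write it as CSV."""
    totals = Counter()
    for site_file in site_files:
        with open(site_file, newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                # One total per (guide, gene, quality) combination across sites.
                totals[(row[GUIDE], row[GENE], row[QUALITY])] += int(row[COUNT])

    with open(output_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([GUIDE, GENE, COUNT, QUALITY])
        for (guide, gene, quality), count in sorted(totals.items()):
            writer.writerow([guide, gene, count, quality])
    return totals
```

Keeping cell quality as part of the grouping key means the same guide appears once per quality category, matching the example rows above.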