gwaybio opened 4 years ago
@hillsbury - this sounds like a great first issue to me :) Want to take a crack at it?
Any update on this? If no one else has time to aggregate the files, I can try to do it.
I was going to start taking a look at this today! How high a priority is this?
Pretty high, I think! We'd like to be able to present results at Friday's Calico meeting, so it would be awesome if aggregated data was available by EOD Wednesday at the latest. But just lmk if that's not possible!
-- Julia Bauman, Research Associate III - Neal Lab, Cancer Program, Broad Institute of MIT and Harvard
I see. I don't want to make any promises right now so feel free to take a stab at it in the meantime as well!
Update on my end: I was able to make a rough comparison to NGS by manually downloading and merging all of the spreadsheets for a single well of a single plate, but this took a lot of time and storage space on my laptop. We'd like to be able to make the comparison for the rest of the data too, so it would still be great to get aggregated barcode call spreadsheets for each plate!
@jbauman214 - Glad we were able to get some sense of this!
Unless we need these results immediately to inform a critical upcoming experiment (I don't think we do, but please LMK if I'm wrong), we will perform this analysis in a sustainable way. Concretely, this means integrating the appropriate Python code into the correct recipe file and then updating the CP151 data weld.
@hillsbury and I are going to walk through this (and more!) tomorrow.
@gwaygenomics - re: urgency, this is one of the (several) key troubleshooting experiments/analyses that we'll want to have for the JSC, but it isn't needed to inform any imminent experiments, if that helps with prioritization.
It does, thanks!
Can you also speak more to exactly what you're after? In my mind, all we need is a single per-plate .csv file with three columns: 1) Gene, 2) sgRNA, and 3) cell count. We already have figures visualizing counts, but no step that generates this summary file.
@jbauman214 should probably chime in here.
Yes, that would be perfect!
@hillsbury - let's add cell quality to the spec outlined in https://github.com/broadinstitute/pooled-cell-painting-profiling-recipe/issues/56#issuecomment-722423464
So, the single file should include four columns:
| Guide | Gene | Cell Count | Cell Quality |
|---|---|---|---|
| AACGTCG | GENE X | 53 | Perfect |
| AACGTCG | GENE X | 21 | Great |
| ATCAACG | GENE X | 67 | Perfect |

and so on...
We'll be able to extract what we need from this file.
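A minimal sketch of rendering the four-column per-plate file, assuming the per-plate counts have already been tallied into a dict keyed by (guide, gene, cell quality). The function name `format_plate_summary` and that dict layout are hypothetical, not part of the recipe; the example rows are the ones from the table.

```python
import csv
import io
from typing import Dict, Tuple

# Hypothetical key layout: (guide sequence, gene, cell quality category)
CountKey = Tuple[str, str, str]


def format_plate_summary(counts: Dict[CountKey, int]) -> str:
    """Render per-plate counts as CSV text with the four requested columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Column order follows the spec agreed on in this thread
    writer.writerow(["Guide", "Gene", "Cell Count", "Cell Quality"])
    # Sort keys so the output is stable and easy to diff between runs
    for (guide, gene, quality), n_cells in sorted(counts.items()):
        writer.writerow([guide, gene, n_cells, quality])
    return buffer.getvalue()


# Example rows taken from the table in this comment
example = {
    ("AACGTCG", "GENE X", "Perfect"): 53,
    ("AACGTCG", "GENE X", "Great"): 21,
    ("ATCAACG", "GENE X", "Perfect"): 67,
}
print(format_plate_summary(example), end="")
```

Returning a string rather than writing a file keeps the formatting step easy to test; the caller decides where the per-plate .csv lives.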
Right now, perturbation abundances are only output as one file per site. We should make per-plate perturbation abundances easier to retrieve by summarizing perturbation counts in an additional script.
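Such an additional script could be sketched as follows: read each per-site count table and sum the counts per perturbation and cell-quality category. This is a sketch under assumptions: the column names `Guide`, `Gene`, `Cell_Quality`, and `Cell_Count` are hypothetical placeholders for whatever headers the per-site file actually uses, and `sum_site_counts` / `open_each` are illustrative helpers, not recipe functions.

```python
import csv
from collections import Counter
from pathlib import Path
from typing import Iterable, Iterator, TextIO, Tuple

# Hypothetical column names: check the real per-site TSV header before relying on these
KEY_COLUMNS = ("Guide", "Gene", "Cell_Quality")
COUNT_COLUMN = "Cell_Count"


def sum_site_counts(site_tables: Iterable[Iterable[str]]) -> Counter:
    """Sum per-site cell counts into per-plate totals keyed by KEY_COLUMNS.

    Each element of site_tables is any iterable of TSV lines, e.g. an open file.
    """
    totals: Counter = Counter()
    for lines in site_tables:
        for row in csv.DictReader(lines, delimiter="\t"):
            key: Tuple[str, ...] = tuple(row[col] for col in KEY_COLUMNS)
            totals[key] += int(row[COUNT_COLUMN])
    return totals


def open_each(paths: Iterable[Path]) -> Iterator[TextIO]:
    """Open files one at a time; each closes when the consumer asks for the next."""
    for path in paths:
        with open(path, newline="") as handle:
            yield handle


# Toy input: two sites from the same plate (made-up numbers, not real data)
header = "Guide\tGene\tCell_Quality\tCell_Count"
site_a = [header, "AACGTCG\tGENE X\tPerfect\t30"]
site_b = [header, "AACGTCG\tGENE X\tPerfect\t23", "ATCAACG\tGENE X\tGreat\t4"]
plate_totals = sum_site_counts([site_a, site_b])
print(plate_totals)
```

On real data, `sum_site_counts(open_each(paths))` would stream one site file at a time, so memory use stays flat no matter how many sites a plate has.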
@jbauman214 - unfortunately, the info you're requesting is not readily available. We do calculate it at a per-site level, so it is possible to retrieve. The file you are looking for is:
EXPERIMENT_LABEL/data/0.site-qc/PLATE_NAME/spots/SITE_NAME/cell_perturbation_category_summary_counts.tsv
These files are available on GitHub in the private repository for each EXPERIMENT_LABEL, under each PLATE_NAME. I am intentionally obscuring experimental details because this issue is in a public repo.
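Locating the per-site files for one plate could look like the sketch below, which turns the path template above into a `pathlib` glob, with `*` standing in for SITE_NAME. It assumes the directory layout matches that template exactly; `site_count_pattern` and `find_site_count_files` are hypothetical helper names, and EXPERIMENT_LABEL / PLATE_NAME remain placeholders.

```python
from pathlib import Path
from typing import List

SUMMARY_FILE = "cell_perturbation_category_summary_counts.tsv"


def site_count_pattern(plate: str) -> str:
    """Glob pattern, relative to EXPERIMENT_LABEL, matching every site of one plate."""
    # SITE_NAME varies per site, so it becomes the single-segment "*" wildcard
    return f"data/0.site-qc/{plate}/spots/*/{SUMMARY_FILE}"


def find_site_count_files(experiment_dir: Path, plate: str) -> List[Path]:
    """Return the per-site summary files for one plate, sorted by path."""
    return sorted(experiment_dir.glob(site_count_pattern(plate)))
```

For example, `find_site_count_files(Path("EXPERIMENT_LABEL"), "PLATE_NAME")` would list one file per site. Plate names containing glob metacharacters such as `[` would need escaping first.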