broadinstitute / 2022_PERISCOPE

This repository contains all supporting analyses and files for Ramezani, Bauman, Singh, and Weisbart, et al. "A genome-wide atlas of human cell morphology".
BSD 3-Clause "New" or "Revised" License

Cleanup of Fig 2 #5

Closed: ErinWeisbart closed this issue 1 year ago

ErinWeisbart commented 1 year ago

Outstanding questions for @MerajRamezani:

Changes to note: Using the updated CCLE data has a small effect on our hit calling because the control groups change slightly (the lengths of zero_tpm_list and expressed_gene_list have changed from 3162/17230 to 3067/17325). This has a ripple effect, causing minor changes to many numbers and figures in the notebook. Visible changes include:
- Fig 2A/B: The number of compartment-specific and whole-cell hits has changed from 1336/2309 to 1340/2321. This also causes a minor shift in the "Number of Hit Genes Called by Compartment" pie chart.
- Fig 2C: The number of entries in the gene pair correlation dictionary has increased from 6641190 to 6699630.
- Fig 2E: The compartment ratios shift.

MerajRamezani commented 1 year ago

@ErinWeisbart The CORUM data set is from here: http://mips.helmholtz-muenchen.de/corum/#download

I have been using CORUM 3.0 (corum_2018_09_03), the 2018 release. They have since released a new version, 4.1. Should I update the data?

ErinWeisbart commented 1 year ago

Thanks Meraj. I'll update it to 4.1 in the cleanup I'm doing now. If we do decide to incorporate the PCA into the CORUM analysis as you've described in https://github.com/broadinstitute/pooled-cell-painting-data-analysis/issues/112#issuecomment-1353639813, please wait to update the code until I've merged my cleanup into main.

ErinWeisbart commented 1 year ago

@MerajRamezani I have a working first draft of this notebook. Please address the outstanding questions I have listed above as I would like to address them before merging this in. (You are welcome to provide additional feedback at this point.)

ErinWeisbart commented 1 year ago

From conversation with Meraj and some new code he sent me:

I'm going to go ahead and merge this in, as the majority of the cleanup is done. I will move open questions and known cleanup needs to #12, noting that which version of the CCLE data we will use in the final version is still an open question.