broadinstitute / lincs-profiling-complementarity

Analyzing and comparing signal found in different profiling technologies
BSD 3-Clause "New" or "Revised" License

Level-4 Exploratory analysis results - L1000 vs Cell painting Comparison #5

Open AdeboyeML opened 3 years ago

AdeboyeML commented 3 years ago

L1000 vs Cell Painting Comparison based on median correlation values from compound replicates per dose

@gwaygenomics @shntnu

- Median score scatter plot

image

- Median score distribution across doses

image

- Compounds with reproducible median correlation values (i.e. p_values below 0.05)

image

- Reproducible median scores scatterplot per dose

image
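
For readers without the notebooks at hand, the per-dose median score above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the column names (`compound`, `dose`, `f0`...), the synthetic data, and the choice of Pearson correlation are all assumptions.

```python
import numpy as np
import pandas as pd


def median_replicate_correlation(profiles: pd.DataFrame, feature_cols: list) -> pd.DataFrame:
    """Median pairwise correlation among replicates of each compound at each dose."""
    records = []
    for (compound, dose), group in profiles.groupby(["compound", "dose"]):
        if len(group) < 2:
            continue  # a single replicate has no pairwise correlation
        # np.corrcoef treats each row as one variable, so rows = replicates
        corr = np.corrcoef(group[feature_cols].to_numpy())
        upper = corr[np.triu_indices_from(corr, k=1)]  # unique replicate pairs only
        records.append({
            "compound": compound,
            "dose": dose,
            "median_score": float(np.median(upper)),
            "n_replicates": len(group),
        })
    return pd.DataFrame(records)


# Synthetic toy data (assumption): 2 compounds x 2 doses x 3 replicates x 10 features
rng = np.random.default_rng(0)
feature_cols = [f"f{i}" for i in range(10)]
rows = []
for compound in ["cpd_A", "cpd_B"]:
    for dose in [1, 2]:
        signal = rng.normal(size=10)
        for _ in range(3):
            values = signal + rng.normal(scale=0.3, size=10)
            rows.append({"compound": compound, "dose": dose, **dict(zip(feature_cols, values))})

profiles = pd.DataFrame(rows)
median_scores = median_replicate_correlation(profiles, feature_cols)
print(median_scores)
```

The same function would apply to either assay, since only the feature columns differ (landmark genes for L1000, morphological features for Cell Painting).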

gwaybio commented 3 years ago

☝️ @AdeboyeML and I discussed this in real time during our check-in :)

AdeboyeML commented 3 years ago

@gwaygenomics @shntnu

Updated figures

- L1000 vs Cell painting median score scatter plots (includes 95th percentile of the null distribution per dose)

image

- Reproducible median scores scatter plot per dose (includes 95th percentile of null distribution per dose)

image

- Cell painting -- Replicate versus Non-replicate (Null) correlation values distribution across all doses (1-6)

image

- L1000 -- Replicate versus Non-replicate (Null) correlation values distribution across all doses (1-6)

image
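
The comparison against the null (non-replicate) distribution can be sketched as below. This is an illustration under assumptions: the null scores here are simulated, and the empirical p-value definition (fraction of null scores at least as large as the observed score) is one common choice, not necessarily the one used in the notebooks.

```python
import numpy as np


def empirical_p_value(observed: float, null_scores: np.ndarray) -> float:
    """Fraction of null median scores at least as large as the observed score."""
    null_scores = np.asarray(null_scores)
    return float(np.mean(null_scores >= observed))


rng = np.random.default_rng(1)

# Hypothetical null for one dose: median correlations of randomly grouped
# non-replicate wells (simulated here, not real data)
null_scores = rng.normal(loc=0.0, scale=0.1, size=1000)

# The dashed line in the scatter plots: 95th percentile of the null for this dose
threshold_95 = float(np.percentile(null_scores, 95))

observed = 0.45  # hypothetical median score for one compound at this dose
p_value = empirical_p_value(observed, null_scores)
is_reproducible = p_value < 0.05

print(f"95th percentile of null: {threshold_95:.3f}")
print(f"p-value: {p_value:.3f}, reproducible: {is_reproducible}")
```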

AdeboyeML commented 3 years ago

Analysis based on Signature Strength and Transcriptional/Morphological activity scores of L1000 and Cell painting

@gwaygenomics @shntnu

Definitions from clue.io

Signature strength VS Replicate correlation (median scores) - L1000

- (includes 95th percentile null distribution median score)

image

TAS VS Replicate correlation (median scores) - L1000

image

Signature strength VS Replicate correlation (median scores) - Cell painting

image

MAS VS Replicate correlation (median scores) - Cell painting

image

Signature Strength comparison between L1000 landmark genes and Cell painting morphological features

image

L1000 TAS vs Cell painting MAS

image
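
A rough sketch of the two scores, for reference. The formulas are my reading of the clue.io definitions and should be treated as assumptions: signature strength as the number of features with absolute z-score of at least 2, and the activity score as the geometric mean of signature strength and replicate correlation, scaled by the number of features. The function names and example values are hypothetical.

```python
import numpy as np


def signature_strength(zscores: np.ndarray, threshold: float = 2.0) -> int:
    """Count features whose absolute z-score meets the threshold (assumed definition)."""
    return int(np.sum(np.abs(zscores) >= threshold))


def activity_score(strength: int, replicate_corr: float, n_features: int) -> float:
    """sqrt(strength * max(replicate_corr, 0) / n_features) (assumed definition).

    With L1000 landmark genes as features this corresponds to TAS;
    with Cell Painting morphological features it corresponds to MAS.
    """
    return float(np.sqrt(strength * max(replicate_corr, 0.0) / n_features))


# Hypothetical z-scored signature with 8 features
zscores = np.array([2.5, -3.1, 0.4, 1.9, -2.0, 0.0, 4.2, -0.7])
ss = signature_strength(zscores)
tas = activity_score(ss, replicate_corr=0.5, n_features=len(zscores))
print(ss, tas)
```

Clipping the replicate correlation at zero keeps the square root defined when replicates are anti-correlated.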

gwaybio commented 3 years ago

I deleted all the notes from the 1.5 hour meeting @shntnu, @AdeboyeML and I had just now....and there were a lot!

I'll try to remember the key pieces, but please add to this list:

AdeboyeML commented 3 years ago

Updated Results and Figures

@gwaygenomics @shntnu

- Calculated the null distribution per dose from 5 randomly selected plates per dose (as suggested from the last meeting)

- Median score scatter plot

image

image

- Reproducible compound median scores above 4 doses (Corrected figure)

image

image

- Number of compounds with reproducible median scores

image

image

- Cell painting -- Replicate versus Non-replicate (Null) correlation values distribution across all doses (1-6)

image

- L1000 -- Replicate versus Non-replicate (Null) correlation values distribution across all doses (1-6)

image

Signature Strength, Morphological Activity Score (MAS) and Transcriptional Activity Score (TAS) visualizations

- TAS vs MAS

image

image

- Signature Strength Comparison

image

image

- Compounds with high MAS but low TAS (i.e. TAS below 0.3 and MAS > 0.7)

image

image

- Plots for compounds with reproducible median scores

- TAS vs MAS

image

image

- Signature Strength Comparison

image

image

gwaybio commented 3 years ago

This analysis is painting quite a nice picture @AdeboyeML - great work. Three thoughts:

  1. For this figure: "Compounds with high MAS but low TAS (i.e. TAS below 0.3 and MAS > 0.7)"
    • Can you provide me with the underlying data for this? I'd like to perform one follow-up exploratory analysis.
    • I specifically want the transcriptionally active genes for each compound. Think of a tidy, long data table with the following format:

| Compound | MOA | Transcriptionally active genes | MAS | TAS |
| --- | --- | --- | --- | --- |
| X | Y | Gene A | 0.25 | 0.7 |
| X | Y | Gene B | 0.25 | 0.7 |
| X | Y | Gene C | 0.25 | 0.7 |

If it's easier, the data can be output in a different tidy format, but this one will work nicely (but it will be large!).
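
One way to produce a table in that shape, as a sketch only: it assumes a per-compound table that already has a list-valued column of transcriptionally active genes (how those genes are called is not specified here), and all column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical per-compound summary; `active_genes` is assumed to be precomputed
scores = pd.DataFrame({
    "compound": ["X", "Z"],
    "moa": ["Y", "W"],
    "MAS": [0.8, 0.4],
    "TAS": [0.2, 0.6],
    "active_genes": [["Gene A", "Gene B", "Gene C"], ["Gene D"]],
})

# Subset to the figure's selection: high MAS but low TAS
high_mas_low_tas = scores[(scores["TAS"] < 0.3) & (scores["MAS"] > 0.7)]

# One row per (compound, gene): explode the list column into long format
tidy = (
    high_mas_low_tas
    .explode("active_genes")
    .rename(columns={"active_genes": "transcriptionally_active_gene"})
    .reset_index(drop=True)
)
tidy = tidy[["compound", "moa", "transcriptionally_active_gene", "MAS", "TAS"]]
print(tidy)
```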

  2. For the figures where we subset to only looking at compounds with 6 reproducible doses, can we bump this down to having 4, 5, or 6 reproducible doses? I don't want to drop a compound if there was a single dose anomaly, or if the compounds were used at too low a dose.
  3. What do you see as the next steps?

This is exciting progress! Looking forward to discussing this further 💯
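
For the relaxed dose filter, something along these lines might work. It is a sketch that assumes a long table with one row per compound and dose and a `p_value` column; the column names and numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical per-compound, per-dose p-values (6 doses each)
results = pd.DataFrame({
    "compound": ["A"] * 6 + ["B"] * 6,
    "dose": list(range(1, 7)) * 2,
    "p_value": [0.01, 0.02, 0.20, 0.03, 0.04, 0.01,
                0.50, 0.30, 0.01, 0.60, 0.70, 0.02],
})

results["reproducible"] = results["p_value"] < 0.05

# Count reproducible doses per compound and keep those with at least 4 of 6
n_reproducible = results.groupby("compound")["reproducible"].sum()
keep = n_reproducible[n_reproducible >= 4].index
filtered = results[results["compound"].isin(keep)]

print(n_reproducible)
print(filtered)
```

Compound A survives despite one non-reproducible dose, which is the single-dose-anomaly case described above.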

AdeboyeML commented 3 years ago

Assessing quality of clustering between Cell painting and L1000

- PCA and UMAP

-- Cell painting

- Explained variance by PCs -- ~77% variance explained by 25 Principal Components (PCs)

image

image

-- L1000

- Explained variance by PCs -- ~75% variance explained by 200 Principal Components (PCs)

image

image
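
The explained-variance numbers above can be reproduced in outline with scikit-learn. This sketch runs on synthetic data, so the printed percentage will not match the ~77% and ~75% reported here; the matrix shape and number of components are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic stand-in for a profile matrix: 200 samples x 50 features
# driven by 5 latent factors plus noise (not real L1000 or Cell Painting data)
latent = rng.normal(size=(200, 5))
loadings = rng.normal(size=(5, 50))
X = latent @ loadings + rng.normal(scale=0.5, size=(200, 50))

pca = PCA(n_components=25).fit(X)

# Cumulative fraction of variance explained by the first k components
cumulative = np.cumsum(pca.explained_variance_ratio_)
print(f"Variance explained by {len(cumulative)} PCs: {cumulative[-1]:.1%}")
```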

- UMAP -- Cell Painting

image

image

- UMAP -- L1000

image

image

- Silhouette Score

- A silhouette score near 1 means the clusters are dense and well separated. A score near 0 means the clusters overlap. A score below 0 means samples may have been assigned to the wrong cluster.

image

image

- Davies-Bouldin Index

image

image
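
Both metrics can be computed over a range of cluster counts with scikit-learn. The sketch below is illustrative only: it uses KMeans on synthetic 2D blobs, and the clustering algorithm, the data, and the range of k are assumptions rather than the setup behind the figures above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score, silhouette_score

rng = np.random.default_rng(0)

# Three tight, well-separated synthetic clusters (50 points each)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

silhouette = {}
davies_bouldin = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    silhouette[k] = silhouette_score(X, labels)          # higher is better
    davies_bouldin[k] = davies_bouldin_score(X, labels)  # lower is better

best_k_silhouette = max(silhouette, key=silhouette.get)
best_k_db = min(davies_bouldin, key=davies_bouldin.get)
print(best_k_silhouette, best_k_db)
```

Note the opposite directions: silhouette is maximized, while the Davies-Bouldin index is minimized, with 0 as its lower bound.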

- Gaussian Mixture Model (GMM) - Evaluating based on BIC (Dose 3 - 6)

-- Cell painting

image

-- L1000

image
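
Model selection by BIC can be sketched with scikit-learn's `GaussianMixture`. The data, the covariance type, and the range of component counts below are assumptions for illustration; lower BIC indicates a better trade-off between fit and model complexity.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Three well-separated synthetic clusters (not real profile data)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

bics = {}
for k in range(1, 7):
    gmm = GaussianMixture(
        n_components=k, covariance_type="full", n_init=3, random_state=0
    ).fit(X)
    bics[k] = gmm.bic(X)  # Bayesian information criterion on the fitted data

best_k = min(bics, key=bics.get)
for k, bic in bics.items():
    print(k, round(bic, 1))
print("Lowest BIC at k =", best_k)
```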

AdeboyeML commented 3 years ago

image

AdeboyeML commented 3 years ago

image