broadinstitute / lincs-profiling-complementarity

Analyzing and comparing signal found in different profiling technologies
BSD 3-Clause "New" or "Revised" License

Report of MOA sizes/values per dose analysis (ranking) on consensus datasets #2

Status: Open. AdeboyeML opened this issue 3 years ago.

AdeboyeML commented 3 years ago

@gwaygenomics @shntnu

The goal here was to determine the size/value of each MOA (mechanism of action) at each dose by taking the median of the pairwise correlation values between compounds of the same MOA in the consensus datasets.

MOAs with only one compound were excluded: 369 of the 601 MOAs, leaving 232.
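A minimal sketch of the per-dose MOA score described above, assuming each dose's consensus profiles are available as a plain `{compound: feature vector}` mapping and the compound-to-MOA labels as a dict (both hypothetical layouts; the thread's own snippet works on pandas DataFrames). It uses signed Pearson correlations, takes the median over all distinct compound pairs, and skips MOAs with fewer than two compounds:

```python
from itertools import combinations
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def moa_scores_per_dose(profiles_by_dose, moa_of):
    """Return {dose: {moa: median pairwise correlation}}.

    profiles_by_dose: {dose: {compound: feature vector}}  (hypothetical layout)
    moa_of:           {compound: moa}
    MOAs with fewer than two compounds at a dose are skipped.
    """
    scores = {}
    for dose, profiles in profiles_by_dose.items():
        # Group the compounds present at this dose by MOA
        by_moa = {}
        for cpd in profiles:
            by_moa.setdefault(moa_of[cpd], []).append(cpd)
        # Median of the signed correlations over all distinct compound pairs
        scores[dose] = {
            moa: median(pearson(profiles[a], profiles[b])
                        for a, b in combinations(cpds, 2))
            for moa, cpds in by_moa.items()
            if len(cpds) >= 2
        }
    return scores
```

The resulting `{dose: {moa: score}}` table is the kind of MOA-by-dose matrix the heatmaps in this comment display.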

Results: I show the values of only the first 10 MOAs per dose for each consensus dataset, along with heatmaps of all 232 MOAs for each consensus dataset. Doses 0 (DMSO only) and 7 (only two MOAs) were excluded from the analysis.

1a. Median Aggregation Consensus dataset - consensus_median (whole plate normalization).

[Figure: cons_med]

Heatmap plots (I split the dataframe into 3 parts for easier visualization):

[Figures: median_heatmap_1, median_heatmap_2, median_heatmap_3]

1b. Median Aggregation Consensus dataset - consensus_median_dmso (dmso normalization).

[Figure: cons_med_dmso]

Heatmap plots:

[Figures: median_dmso_heatmap_1, median_dmso_heatmap_2, median_dmso_heatmap_3]

2a. Modified Z Score Aggregation (MODZ) dataset - consensus_modz (whole plate normalization).


Heatmap plots:

[Figures: modz_heatmap_1, modz_heatmap_2, modz_heatmap_3]

2b. Modified Z Score Aggregation (MODZ) dataset - consensus_modz_dmso (dmso normalization).


Heatmap plots:

[Figures: modz_dmso_heatmap_1, modz_dmso_heatmap_2, modz_dmso_heatmap_3]

AdeboyeML commented 3 years ago

@gwaygenomics

- MOAs that do not have the same number of compounds across all doses


shntnu commented 3 years ago

@AdeboyeML and I inspected one of these cases and found that the dose remapping can be flawed. For example, PYM50028 (BRD-K62277907-001-01-6) has two doses coded as level 1.
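A quick way to surface cases like this is to check, for each compound, whether more than one distinct raw dose was mapped to the same dose level. The sketch below assumes a hypothetical `(compound_id, raw_dose, dose_level)` record layout; adapt it to the real metadata columns:

```python
from collections import defaultdict


def find_dose_collisions(records):
    """Return {(compound, dose_level): sorted raw doses} for every case where
    more than one distinct raw dose was remapped to the same dose level.

    records: iterable of (compound_id, raw_dose, dose_level) tuples
             (hypothetical layout).
    """
    seen = defaultdict(set)
    for compound, raw_dose, level in records:
        seen[(compound, level)].add(raw_dose)
    # A correct remapping gives exactly one raw dose per (compound, level)
    return {key: sorted(doses) for key, doses in seen.items() if len(doses) > 1}
```

An empty result means every (compound, dose level) pair maps back to a single raw dose.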

image

shntnu commented 3 years ago

@AdeboyeML uses `corr_val = abs(df_dose_corr.loc[cpds[y], cpds[x]])`, but we shouldn't take the absolute value. Some compounds may be negatively correlated, and that should count against the MOA.

Also consider computing the correlation matrix of the subsetted dataframe corresponding to the replicates you care about, and then taking the median of the lower (or upper) triangle, excluding the diagonal. This is an implementation detail, but the logic is otherwise correct.
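A minimal sketch of the suggested approach, assuming the compound-by-compound correlation matrix for one MOA at one dose is already computed (here a list of lists with made-up values). It takes the off-diagonal upper triangle and compares the signed median with the `abs()` variant. The anti-correlated third compound pulls the signed score down, while `abs()` hides it:

```python
from statistics import median


def upper_triangle(matrix):
    """Off-diagonal upper-triangle values of a square matrix (list of lists)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def moa_score(corr_matrix, use_abs=False):
    """Median pairwise correlation among one MOA's compounds.

    use_abs=True reproduces the abs() variant discussed in this thread;
    the signed version lets anti-correlated compounds lower the score.
    """
    values = upper_triangle(corr_matrix)
    if use_abs:
        values = [abs(v) for v in values]
    return median(values)


# Hypothetical MOA with three compounds; the third is anti-correlated.
corr = [
    [1.0, 0.8, -0.7],
    [0.8, 1.0, -0.6],
    [-0.7, -0.6, 1.0],
]
print(moa_score(corr))                # -0.6
print(moa_score(corr, use_abs=True))  # 0.7
```

Excluding the diagonal matters because its self-correlations of 1.0 would otherwise inflate every MOA's median.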

AdeboyeML commented 3 years ago

Just an update to https://github.com/broadinstitute/lincs-profiling-comparison/issues/2#issuecomment-717303986

- MOAs that do not have the same number of compounds across all doses


AdeboyeML commented 3 years ago

@gwaygenomics @shntnu

- Results from Null Distribution

Major points:

- Visualization: non-parametric p-value vs median pairwise correlation score (for each MOA) per dose

- Median Consensus


- Median DMSO Consensus


- MODZ Consensus


- MODZ DMSO Consensus


- Median Consensus: MOAs with p-values < 0.05 in all doses


- Median DMSO Consensus: MOAs with p-values < 0.05 in all doses


- MODZ Consensus: MOAs with p-values < 0.05 in all doses


- MODZ DMSO Consensus: MOAs with p-values < 0.05 in all doses


shntnu commented 3 years ago

Looking forward to digging into this!!

On Tue, Nov 3, 2020 at 9:02 PM Adeniyi Adeboye notifications@github.com wrote:

  • Results from Null Distribution

    Null distribution: generated by taking the median correlation score of randomly combined compounds that do not share (come from) the same MOA.

    In our case, we generated 1000 median correlation scores from randomly combined compounds as the null distribution for each MOA.

    A p-value can then be computed non-parametrically as the probability that random compounds from different MOAs have a greater median similarity value than the compounds of the same MOA.
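The null-distribution procedure quoted above can be sketched as follows. This assumes the correlations for one dose are stored in a hypothetical nested dict (`corr[a][b]`), and it reads "randomly combined compounds that do not share the same MOAs" as drawing one compound from each of `size` distinct, randomly chosen MOAs; that reading, and all names here, are assumptions rather than the analysis's actual code. Following the wording above, the p-value counts null scores strictly greater than the observed score:

```python
import random
from itertools import combinations
from statistics import median


def median_pairwise(corr, compounds):
    """Median correlation over all distinct pairs of `compounds`.

    corr: {compound: {compound: correlation}} nested dict (hypothetical layout).
    """
    return median(corr[a][b] for a, b in combinations(compounds, 2))


def null_distribution(corr, moa_of, size, n_samples=1000, seed=0):
    """Median scores of `size` random compounds, each from a different MOA."""
    rng = random.Random(seed)
    by_moa = {}
    for cpd, moa in moa_of.items():
        by_moa.setdefault(moa, []).append(cpd)
    moas = sorted(by_moa)  # sorted so results are reproducible for a given seed
    null = []
    for _ in range(n_samples):
        picked_moas = rng.sample(moas, size)  # distinct MOAs, no shared MOA
        compounds = [rng.choice(by_moa[m]) for m in picked_moas]
        null.append(median_pairwise(corr, compounds))
    return null


def empirical_p_value(observed, null):
    """Fraction of null scores greater than the observed MOA score."""
    return sum(score > observed for score in null) / len(null)
```

With `n_samples=1000`, this matches the 1000 null medians per MOA described above.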



AdeboyeML commented 3 years ago

- In addition to the figures shown in https://github.com/broadinstitute/lincs-profiling-comparison/issues/2#issuecomment-721469990, I included the density distribution of the p-values vs median scores in the same plots:

- Median Consensus


- Median DMSO Consensus


- MODZ Consensus


- MODZ DMSO Consensus


shntnu commented 3 years ago

@AdeboyeML quick question:

In this figure (where each data point is an MOA, I assume), if two MOAs have the same number of compounds and the same x-axis value (median pairwise correlation between compounds), do they also have the same y-axis value?

Or, phrased more simply: are you computing the null distribution once for each MOA size (by size I mean the number of compounds in the MOA class), or once per MOA? Both are fine, but the former is preferred because it removes y-axis variance that's not informative.


shntnu commented 3 years ago

@AdeboyeML kudos for making the data so easy to peek into. I was curious to see if one could see a dose-response in some of the MOAs. Looks like we do in a few.

More during profiling check-in!

[Figure: moa_score_dose_response]

Code

```r
library(tidyverse)  # readr, dplyr, tidyr, stringr, ggplot2
library(magrittr)   # provides the %<>% assignment pipe

moa_consistency <- read_csv(
  "https://raw.githubusercontent.com/broadinstitute/lincs-profiling-comparison/9bc5db8167674e2c8bec5cee3fcc043117acfbf6/1.Data-exploration/moa_sizes_consensus_datasets/median_dmso_moa_median_scores.csv"
)

# The unnamed first column holds the MOA names
# (readr >= 2.0 names it `...1` instead of `X1`)
moa_consistency %<>% rename(moa = X1)

# Wide (one column per dose) -> long (one row per MOA and dose)
moa_consistency %<>% pivot_longer(-moa, names_to = "dose", values_to = "score")
moa_consistency %<>% mutate(dose = as.integer(str_remove(dose, "dose_")))

# Keep only MOAs whose median score across doses exceeds 0.30
moa_consistency %<>% inner_join(
  moa_consistency %>%
    group_by(moa) %>%
    summarize(score_median = median(score)) %>%
    filter(score_median > 0.30),
  by = "moa"
)

p <- ggplot(moa_consistency, aes(dose, score)) +
  geom_line() +
  facet_wrap(vars(round(score_median, 2), moa), ncol = 5, scales = "free_y")

ggsave("~/Desktop/moa_score_dose_response.png", plot = p, width = 10, height = 10)
```

AdeboyeML commented 3 years ago

@shntnu Regarding your question in https://github.com/broadinstitute/lincs-profiling-comparison/issues/2#issuecomment-724933716:

> Or phrased more simply, are you computing the null distribution once for each MOA size (by size I mean number of compounds in the MOA class)? Or are you doing it once per MOA? Both are fine, but the former is preferred to remove y-axis variance that's not informative.

> if two MOA's have the same number of compounds, and the same x-axis value (median pairwise correlation between compounds), do they also have the same y-axis value?

shntnu commented 3 years ago

@AdeboyeML thanks for clarifying. Everything looks good, but the one change I recommend is to use the same null distribution for all MOAs of the same size.

There is no upside to having different null distributions for each unique MOA (of the same size), while it has the downside of adding uninformative variance to the p-value estimates.

cc @gwaygenomics
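One way to implement this recommendation, as a sketch: build the null distribution lazily, once per MOA size, and reuse it for every MOA of that size. `make_null` is a hypothetical hook standing in for whatever sampling procedure produces the null medians for a given size. A direct consequence is that two MOAs with the same size and the same score always receive the same p-value:

```python
def p_values_shared_null(moa_scores, moa_sizes, make_null):
    """p-value per MOA, reusing one null distribution per MOA size.

    moa_scores: {moa: observed median pairwise correlation}
    moa_sizes:  {moa: number of compounds in the MOA}
    make_null:  callable(size) -> list of null median scores (hypothetical hook)
    """
    null_by_size = {}
    p_values = {}
    for moa, score in moa_scores.items():
        size = moa_sizes[moa]
        if size not in null_by_size:  # build each size's null only once
            null_by_size[size] = make_null(size)
        null = null_by_size[size]
        # Fraction of null scores greater than the observed score
        p_values[moa] = sum(s > score for s in null) / len(null)
    return p_values
```

Caching by size also cuts the number of null distributions from one per MOA to one per distinct MOA size.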

AdeboyeML commented 3 years ago

@shntnu @gwaygenomics

Results from the null distribution, using the same null distribution for all MOAs of the same size.

Median Consensus


Distribution of the median pairwise correlation scores



Distribution of p-values across doses

The number of MOAs with p-values below the significance level (0.05) increases as dose increases.


MOAs with p-values <0.05 in all doses


Dose responses of these MOAs:


The above results and distributions are similar for the MODZ consensus datasets.
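The two summaries in this comment (the number of MOAs below the 0.05 significance level at each dose, and the MOAs below 0.05 at every dose) can be sketched as follows, assuming the p-values are held in a hypothetical `{moa: {dose: p_value}}` dict:

```python
def significant_counts_per_dose(p_values, alpha=0.05):
    """Number of MOAs with p < alpha at each dose, ordered by dose.

    p_values: {moa: {dose: p_value}} (hypothetical layout).
    """
    counts = {}
    for per_dose in p_values.values():
        for dose, p in per_dose.items():
            counts[dose] = counts.get(dose, 0) + (p < alpha)
    return dict(sorted(counts.items()))


def significant_in_all_doses(p_values, alpha=0.05):
    """Sorted list of MOAs whose p-value is below alpha at every dose."""
    return sorted(moa for moa, per_dose in p_values.items()
                  if all(p < alpha for p in per_dose.values()))
```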

shntnu commented 3 years ago
> There seems to be no relationship between the median pairwise correlation and the p-value obtained from the null distribution.

That's really strange – sounds like a bug to me
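One way to probe the bug suspicion, as a sketch: with one shared null distribution per MOA size, the p-value is the fraction of a fixed set of null scores above the observed score. So within a size group, a higher score can never receive a higher p-value, and equal scores must receive equal p-values. The helper below (hypothetical names, with `(moa, size, score, p_value)` tuples assumed) lists pairs that break this ordering:

```python
from itertools import combinations


def monotonicity_violations(results):
    """Return (size, moa_low, moa_high) pairs that break the expected ordering.

    results: iterable of (moa, size, score, p_value) tuples (hypothetical layout).
    Assumes one shared null distribution per MOA size.
    """
    by_size = {}
    for moa, size, score, p in results:
        by_size.setdefault(size, []).append((score, p, moa))
    violations = []
    for size, rows in by_size.items():
        # Rows are sorted by score, so s1 <= s2 for every pair from combinations()
        for (s1, p1, m1), (s2, p2, m2) in combinations(sorted(rows), 2):
            if (s2 > s1 and p2 > p1) or (s2 == s1 and p2 != p1):
                violations.append((size, m1, m2))
    return violations
```

A non-empty result points to a bug in how scores and nulls are paired. No such ordering is expected across different sizes, since each size has its own null, so a scatter that pools all sizes can look weaker than the per-size relationship.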

AdeboyeML commented 3 years ago

@shntnu @gwaygenomics

Comparing the distribution of median scores between the L1000 and LINCS Cell Painting consensus datasets

- Major points

Results: MODZ consensus dataset

Scatter plot of L1000 vs. LINCS Cell Painting median scores per dose


Distribution of median scores in L1000 and Cell Painting Data per dose


Distribution on a dose-by-dose basis



- I am still trying to figure out the reason behind the relationship between the p-value and median scores (null distribution) in https://github.com/broadinstitute/lincs-profiling-comparison/issues/2#issuecomment-726123235

AdeboyeML commented 3 years ago

@shntnu @gwaygenomics

Results from the null distribution, using the same null distribution for all MOAs of the same size (L1000 & Cell Painting).

Cell painting

MODZ Consensus


L1000

MODZ Level-5 data


- The above results and distributions are similar for both the median and rank level-5 (consensus) data in Cell Painting and L1000.