Report of MOAs sizes/values per dose analysis (ranking) on consensus datasets

AdeboyeML commented 3 years ago

@gwaygenomics @shntnu

The goal here was to determine the size/value of each MOA (mechanism of action) at each dose by taking the median of the correlation values between compounds of the same MOA in the consensus datasets.

MOAs with only one compound were excluded: out of 601 MOAs, 369 were excluded, leaving 232.

Results: I show only the values of the first 10 MOAs per dose for each consensus dataset, plus heatmaps of all 232 MOAs for each consensus dataset. Doses 0 (DMSO only) and 7 (only two MOAs) were excluded from the analysis.
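A minimal pandas sketch of the singleton filter described above, assuming an annotation table with hypothetical columns `compound` and `moa` (one row per compound). The real pipeline's column names and layout may differ.

```python
import pandas as pd


def drop_singleton_moas(annotations: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows whose MOA is annotated to at least two distinct compounds."""
    n_compounds = annotations.groupby("moa")["compound"].nunique()
    keep = n_compounds[n_compounds > 1].index
    return annotations[annotations["moa"].isin(keep)]


# Toy annotation table (hypothetical column names and values).
annotations = pd.DataFrame(
    {
        "compound": ["cpd_1", "cpd_2", "cpd_3", "cpd_4", "cpd_5"],
        "moa": ["HDAC inhibitor", "HDAC inhibitor", "mTOR inhibitor",
                "mTOR inhibitor", "EGFR inhibitor"],
    }
)
filtered = drop_singleton_moas(annotations)
print(sorted(filtered["moa"].unique()))  # ['HDAC inhibitor', 'mTOR inhibitor']
```

Applied to the full annotation table, a filter of this kind is what reduces the 601 MOAs to the 232 multi-compound MOAs used here.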

1a. Median Aggregation Consensus dataset - consensus_median (whole plate normalization).

[Figure: cons_med]

Heatmap plots (I split the dataframe into 3 parts for easier visualization):

[Figures: median_heatmap_1, median_heatmap_2, median_heatmap_3]

1b. Median Aggregation Consensus dataset - consensus_median_dmso (dmso normalization).

[Figure: cons_med_dmso]

Heatmap plots:

[Figures: median_dmso_heatmap_1, median_dmso_heatmap_2, median_dmso_heatmap_3]

2a. Modified Z Score Aggregation (MODZ) dataset - consensus_modz (whole plate normalization).

Heatmap plots:

[Figures: modz_heatmap_1, modz_heatmap_2, modz_heatmap_3]

2b. Modified Z Score Aggregation (MODZ) dataset - consensus_modz_dmso (dmso normalization).

Heatmap plots:

[Figures: modz_dmso_heatmap_1, modz_dmso_heatmap_2, modz_dmso_heatmap_3]

AdeboyeML commented 3 years ago

@gwaygenomics

- MOAs that do not have the same number of compounds in all doses

shntnu commented 3 years ago

@AdeboyeML and I inspected one of the cases and found that the dose remapping can be flawed. E.g., PYM50028 (BRD-K62277907-001-01-6) has two doses coded as level 1.
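One way to catch this kind of remapping problem systematically is to count how many distinct raw doses land on each (compound, dose level) pair. The sketch below assumes hypothetical columns `compound`, `dose_uM` (raw dose) and `dose_level` (remapped dose); the values are toy numbers, not the real PYM50028 doses.

```python
import pandas as pd


def find_dose_collisions(profiles: pd.DataFrame) -> pd.DataFrame:
    """Return (compound, dose_level) pairs onto which more than one raw dose was mapped."""
    n_raw = (
        profiles.groupby(["compound", "dose_level"])["dose_uM"]
        .nunique()
        .rename("n_raw_doses")
        .reset_index()
    )
    return n_raw[n_raw["n_raw_doses"] > 1]


# Toy metadata (hypothetical): cpd_A has two different raw doses coded as level 1.
profiles = pd.DataFrame(
    {
        "compound": ["cpd_A", "cpd_A", "cpd_A", "cpd_B", "cpd_B"],
        "dose_uM": [0.04, 0.05, 0.12, 0.04, 0.12],
        "dose_level": [1, 1, 2, 1, 2],
    }
)
print(find_dose_collisions(profiles))
```

An empty result would mean every dose level of every compound corresponds to exactly one raw dose.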

shntnu commented 3 years ago

@AdeboyeML uses `corr_val = abs(df_dose_corr.loc[cpds[y], cpds[x]])`, but we shouldn't take the absolute value. We may have compounds in there that are negatively correlated, and that should count against the MOA.

Also consider computing the correlation matrix of the subsetted dataframe corresponding to the replicates you care about, and then taking the median of its lower (or upper) triangle. This is an implementation detail; the logic is otherwise correct.
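A minimal sketch of the suggested approach, assuming one row per compound (features in columns) and Pearson correlation via `numpy.corrcoef`. It takes the median over the upper triangle of the correlation matrix and keeps the sign by default; the `absolute` flag exists only to show how `abs()` hides anti-correlation.

```python
import numpy as np
import pandas as pd


def moa_median_score(profiles: pd.DataFrame, absolute: bool = False) -> float:
    """Median pairwise Pearson correlation among the compounds (rows) of one MOA.

    With absolute=False (recommended), anti-correlated compounds lower the score;
    absolute=True reproduces the abs() variant for comparison.
    """
    corr = np.corrcoef(profiles.to_numpy())          # compound-by-compound matrix
    pairs = corr[np.triu_indices_from(corr, k=1)]    # upper triangle: each pair once, no diagonal
    if absolute:
        pairs = np.abs(pairs)
    return float(np.median(pairs))


# Toy MOA with three compounds: cpd_c is perfectly anti-correlated with the other two.
moa_profiles = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]],
    index=["cpd_a", "cpd_b", "cpd_c"],
)
print(round(moa_median_score(moa_profiles), 3))                 # -1.0
print(round(moa_median_score(moa_profiles, absolute=True), 3))  # 1.0
```

In the toy MOA, two of the three pairs are anti-correlated, so the signed median is -1.0, while the absolute version reports a perfect 1.0.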

AdeboyeML commented 3 years ago

Just an update to https://github.com/broadinstitute/lincs-profiling-comparison/issues/2#issuecomment-717303986:

Dose 7 should not have been included in the figure in https://github.com/broadinstitute/lincs-profiling-comparison/issues/2#issuecomment-717303986, since it has only two MOAs; that was a bug in my code.

- MOAs that do not have the same number of compounds in all doses

AdeboyeML commented 3 years ago

@gwaygenomics @shntnu

- Results from the null distribution

Major points:

The null distribution is generated by taking the median correlation score of randomly combined compounds that do not share the same MOA.
In our case, we generated 1000 median correlation scores from randomly combined compounds as the null distribution for each MOA.
A p-value was computed nonparametrically as the probability that random compounds from different MOAs have a median similarity greater than or equal to that of the compounds sharing the MOA.
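A hedged sketch of the null distribution and nonparametric p-value described above. It assumes a precomputed compound-by-compound correlation matrix for one dose and a single MOA label per compound; each draw picks `size` distinct MOAs and one random compound from each, so no two sampled compounds share an MOA. The function names are hypothetical.

```python
import numpy as np


def null_medians(corr: np.ndarray, moa_labels: np.ndarray, size: int,
                 n_iter: int = 1000, seed: int = 0) -> np.ndarray:
    """Median pairwise correlation of `size` random compounds drawn from distinct MOAs."""
    rng = np.random.default_rng(seed)
    moas = np.unique(moa_labels)
    upper = np.triu_indices(size, k=1)                 # each sampled pair counted once
    medians = np.empty(n_iter)
    for i in range(n_iter):
        picked_moas = rng.choice(moas, size=size, replace=False)
        idx = [rng.choice(np.flatnonzero(moa_labels == m)) for m in picked_moas]
        sub = corr[np.ix_(idx, idx)]                   # size-by-size correlation block
        medians[i] = np.median(sub[upper])
    return medians


def empirical_p_value(observed: float, null: np.ndarray) -> float:
    """Fraction of null medians greater than or equal to the observed MOA median."""
    return float(np.mean(null >= observed))


# Toy data: 12 compounds in 4 MOAs with random profiles.
rng = np.random.default_rng(42)
features = rng.normal(size=(12, 20))
labels = np.repeat(["moa_a", "moa_b", "moa_c", "moa_d"], 3)
corr = np.corrcoef(features)
null = null_medians(corr, labels, size=3)
print(empirical_p_value(0.5, null))
```

A common variant reports `(k + 1) / (n_iter + 1)` instead of `k / n_iter`, so that the p-value is never exactly zero.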

- Visualization: non-parametric p-value vs median pairwise correlation score (for each MOA) per dose

- Median Consensus

- Median DMSO Consensus

- MODZ Consensus

- MODZ DMSO Consensus

We also identified the MOAs for which, in all doses (1-6), the probability of drawing a null median correlation score greater than or equal to the MOA's own median correlation score is below 5% (0.05).

We used 0.05 as the significance level.
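Given a table of p-values with one row per MOA and one column per dose, selecting the MOAs that are significant at every dose is a row-wise `all`, and counting significant MOAs per dose is a column-wise `sum`. The values below are toy numbers for illustration, not results from this analysis.

```python
import pandas as pd

# Hypothetical p-value table: one row per MOA, one column per dose.
pvals = pd.DataFrame(
    {
        "dose_1": [0.01, 0.20, 0.03],
        "dose_2": [0.02, 0.04, 0.01],
        "dose_3": [0.001, 0.01, 0.06],
    },
    index=["moa_a", "moa_b", "moa_c"],
)
alpha = 0.05                                 # significance level
significant = pvals < alpha                  # boolean MOA x dose table
all_doses = significant.all(axis=1)          # significant at every dose
print(list(pvals.index[all_doses]))          # ['moa_a']
print(significant.sum(axis=0).tolist())      # [2, 3, 2]
```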

- Median Consensus: MOAs with p-values < 0.05 in all doses

- Median DMSO Consensus: MOAs with p-values < 0.05 in all doses

- MODZ Consensus: MOAs with p-values < 0.05 in all doses

- MODZ DMSO Consensus: MOAs with p-values < 0.05 in all doses

shntnu commented 3 years ago

Looking forward to digging into this!!

(Email reply to Adeniyi Adeboye's comment of Tue, Nov 3, 2020, 9:02 PM.)

AdeboyeML commented 3 years ago

- In addition to the figures shown in https://github.com/broadinstitute/lincs-profiling-comparison/issues/2#issuecomment-721469990, I included the density distributions of the p-values vs. median scores in the same plots:

  - Median Consensus

  - Median DMSO Consensus

  - MODZ Consensus

  - MODZ DMSO Consensus

shntnu commented 3 years ago

@AdeboyeML quick question:

In this figure (where each data point is an MOA, I assume), if two MOAs have the same number of compounds and the same x-axis value (median pairwise correlation between compounds), do they also have the same y-axis value?

Or, phrased more simply: are you computing the null distribution once for each MOA size (by size I mean the number of compounds in the MOA class), or once per MOA? Both are fine, but the former is preferred because it removes y-axis variance that's not informative.

shntnu commented 3 years ago

@AdeboyeML kudos for making the data so easy to peek into. I was curious to see if one could see a dose-response in some of the MOAs. Looks like we do in a few.

More during profiling check-in!

[Figure: moa_score_dose_response]

Code

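```r
library(tidyverse)  # readr, dplyr, tidyr, stringr, ggplot2
library(magrittr)   # provides the %<>% assignment pipe

moa_consistency <- read_csv(
  "https://raw.githubusercontent.com/broadinstitute/lincs-profiling-comparison/9bc5db8167674e2c8bec5cee3fcc043117acfbf6/1.Data-exploration/moa_sizes_consensus_datasets/median_dmso_moa_median_scores.csv"
)

# The unnamed first column holds the MOA names
# (newer readr versions name it `...1` instead of `X1`)
moa_consistency %<>% rename(moa = X1)

# Wide (one column per dose) -> long (one row per MOA x dose)
moa_consistency %<>% pivot_longer(-moa, names_to = "dose", values_to = "score")
moa_consistency %<>% mutate(dose = as.integer(str_remove(dose, "dose_")))

# Keep only MOAs whose median score across doses exceeds 0.30
moa_consistency %<>% inner_join(
  moa_consistency %>%
    group_by(moa) %>%
    summarize(score_median = median(score)) %>%
    filter(score_median > 0.30),
  by = "moa"
)

# One panel per MOA, labelled with its median score
p <- ggplot(moa_consistency, aes(dose, score)) +
  geom_line() +
  facet_wrap(~ round(score_median, 2) + moa, ncol = 5, scales = "free_y")

ggsave("~/Desktop/moa_score_dose_response.png", plot = p, width = 10, height = 10)
```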

AdeboyeML commented 3 years ago

@shntnu Regarding your question in https://github.com/broadinstitute/lincs-profiling-comparison/issues/2#issuecomment-724933716:

> Or phrased more simply, are you computing the null distribution once for each MOA size (by size I mean number of compounds in the MOA class)? Or are you doing it once per MOA? Both are fine, but the former is preferred to remove y-axis variance that's not informative.

I computed the null distribution once for each MOA per dose, based on the MOA's size (the number of compounds in the MOA class). That is, I computed 1000 random median scores for each MOA per dose, from which I computed the p-value.
Concretely, for each MOA I selected as many random compounds (from different MOAs) as the MOA contains, and I repeated this 1000 times for each MOA per dose.

> if two MOAs have the same number of compounds, and the same x-axis value (median pairwise correlation between compounds), do they also have the same y-axis value?

No. The p-value (y-axis) is based on 1000 randomly selected lists of compounds (each list as long as the MOA's size), whose median scores form a null distribution that is unique to each MOA.

shntnu commented 3 years ago

@AdeboyeML thanks for clarifying. Everything looks good, but the one change I recommend is to use the same null distribution for all MOAs of the same size.

There is no upside to having different null distributions for each unique MOA (of the same size), while it has the downside of adding uninformative variance to the p-value estimates.
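A minimal sketch of the recommended change: draw one null distribution per MOA size and reuse it for every MOA of that size, so two MOAs with the same size and the same median score necessarily get the same p-value. `draw_null` is a hypothetical stand-in for whatever routine samples null medians for a given size; here it is a uniform toy sampler.

```python
from typing import Callable, Dict

import numpy as np


def p_values_shared_null(
    moa_sizes: Dict[str, int],
    moa_scores: Dict[str, float],
    draw_null: Callable[[int], np.ndarray],
) -> Dict[str, float]:
    """Empirical p-values where all MOAs of the same size share one null distribution."""
    null_by_size: Dict[int, np.ndarray] = {}
    p_values: Dict[str, float] = {}
    for moa, size in moa_sizes.items():
        if size not in null_by_size:  # draw the null only once per MOA size
            null_by_size[size] = draw_null(size)
        p_values[moa] = float(np.mean(null_by_size[size] >= moa_scores[moa]))
    return p_values


# Toy usage: a stand-in null sampler that records which sizes it was asked for.
rng = np.random.default_rng(0)
requested_sizes = []


def toy_draw_null(size: int) -> np.ndarray:
    requested_sizes.append(size)
    return rng.uniform(-1.0, 1.0, size=1000)


p = p_values_shared_null(
    {"moa_a": 3, "moa_b": 3, "moa_c": 5},
    {"moa_a": 0.4, "moa_b": 0.4, "moa_c": 0.1},
    toy_draw_null,
)
print(requested_sizes)           # [3, 5]
print(p["moa_a"] == p["moa_b"])  # True
```

Besides removing uninformative variance, caching by size also means far fewer null draws when many MOAs share a size.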

cc @gwaygenomics

AdeboyeML commented 3 years ago

@shntnu @gwaygenomics

Results from the Null distribution, based on using the same null distribution for all MOAs of the same size.

There seems to be no relationship between the median pairwise correlation and the p-value obtained from the null distribution.

Median Consensus

Distribution of the median pairwise correlation scores

P-value distribution across doses

The number of MOAs with p-values below the significance level (0.05) increases with dose

MOAs with p-values < 0.05 in all doses

Dose responses of these MOAs:

The results and distributions above are similar for the MODZ Consensus datasets.

shntnu commented 3 years ago

> There seems to be no relationship between the median pairwise correlation and the obtained p-value generated from null distribution.

That's really strange; sounds like a bug to me.
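Why a missing relationship is suspicious: with one shared null per MOA size and p defined as the fraction of null medians greater than or equal to the observed median, a higher observed median can only give an equal or smaller p-value. Within each MOA size, p-values must therefore be non-increasing in the median score. The sketch below checks this, assuming hypothetical columns `moa_size`, `median_score` and `p_value`; the data are toy values.

```python
import pandas as pd


def monotone_by_size(results: pd.DataFrame) -> pd.Series:
    """For each MOA size, check that p-values never increase as median scores increase."""
    return pd.Series(
        {
            size: group.sort_values("median_score")["p_value"].is_monotonic_decreasing
            for size, group in results.groupby("moa_size")
        }
    )


# Toy results (hypothetical): size 3 violates the expected ordering.
results = pd.DataFrame(
    {
        "moa_size": [2, 2, 2, 3, 3],
        "median_score": [0.1, 0.5, 0.3, 0.2, 0.4],
        "p_value": [0.6, 0.02, 0.2, 0.1, 0.3],
    }
)
print(monotone_by_size(results).tolist())  # [True, False]
```

Any size flagged `False` points to a bug, such as p-values and scores being misaligned across MOAs.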

AdeboyeML commented 3 years ago

@shntnu @gwaygenomics

Comparing the distribution of median scores between the L1000 and LINCS Cell Painting consensus datasets

- Major points

213 MOAs (mechanisms of action) present in both the Cell Painting and L1000 Level-5 data are compared based on the distribution of their median scores.
While aligning the MOAs in L1000 with those in Cell Painting, I realized that the same broad sample is partly annotated with differently named MOAs in L1000 and Cell Painting, i.e. the naming of the same MOAs is not consistent between the two datasets.
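A hedged sketch for spotting such naming inconsistencies: merge the two annotation tables on the broad sample ID, normalize the MOA strings (case, surrounding whitespace, order of multiple MOAs), and list the samples whose labels still differ. The `|` separator for multiple MOAs, the column names and the IDs are all assumptions.

```python
import pandas as pd


def normalize_moa(name: str) -> str:
    """Lower-case, trim, and sort '|'-separated labels so case/order differences compare equal."""
    parts = [part.strip().lower() for part in str(name).split("|")]
    return "|".join(sorted(parts))


def moa_mismatches(l1000: pd.DataFrame, cell_painting: pd.DataFrame) -> pd.DataFrame:
    """Broad samples present in both datasets whose normalized MOA labels still differ."""
    merged = l1000.merge(cell_painting, on="broad_sample", suffixes=("_l1000", "_cp"))
    same = merged["moa_l1000"].map(normalize_moa) == merged["moa_cp"].map(normalize_moa)
    return merged[~same]


# Toy annotation tables (hypothetical IDs and labels).
l1000 = pd.DataFrame({
    "broad_sample": ["cpd_1", "cpd_2", "cpd_3"],
    "moa": ["HDAC inhibitor", "EGFR inhibitor|tyrosine kinase inhibitor", "mTOR inhibitor"],
})
cell_painting = pd.DataFrame({
    "broad_sample": ["cpd_1", "cpd_2", "cpd_3"],
    "moa": ["hdac inhibitor ", "Tyrosine kinase inhibitor|EGFR inhibitor", "PI3K inhibitor"],
})
print(moa_mismatches(l1000, cell_painting)["broad_sample"].tolist())  # ['cpd_3']
```

Normalization only removes cosmetic differences; genuinely different names for the same mechanism still need a manual mapping.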

Results -- MODZ Consensus dataset

Scatter plot of L1000 vs. LINCS Cell Painting median scores per dose

Median scores in the Cell Painting data are more spread out and reach more extreme values than those in the L1000 data.

Distribution of median scores in the L1000 and Cell Painting data per dose

Distribution on a dose-by-dose basis

- I am still trying to figure out the reason behind the relationship between the p-value and median scores (null distribution) in https://github.com/broadinstitute/lincs-profiling-comparison/issues/2#issuecomment-726123235

AdeboyeML commented 3 years ago

@shntnu @gwaygenomics

Results from the Null distribution, based on using the same null distribution for all MOAs of the same size. (L1000 & Cell Painting)

I have been able to figure out what was wrong with my code, which resulted in the strange relationship between the p-value and median scores (null distribution) in https://github.com/broadinstitute/lincs-profiling-comparison/issues/2#issuecomment-726123235
Corrected Plots of the Null distribution are below:
