Open AdeboyeML opened 3 years ago
@gwaygenomics
@AdeboyeML and I inspected one of the cases and found that the dose remapping can be flawed. E.g., PYM50028 (BRD-K62277907-001-01-6) has two doses coded as level 1.
@AdeboyeML uses corr_val = abs(df_dose_corr.loc[cpds[y], cpds[x]]), but we shouldn't take the absolute value. Some compounds in an MOA may be negatively correlated, and that should count against the MOA.
Also consider computing the correlation matrix of the subsetted dataframe corresponding to the replicates you care about, and then taking the median of the lower (or upper) triangle. This is an implementation detail; the logic is otherwise correct.
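A minimal sketch of that suggestion, assuming the profiles for one MOA at one dose sit in a dataframe with one row per compound. The function name `median_pairwise_corr`, the toy data, and the choice of Pearson correlation are illustrative assumptions, not taken from the notebook:

```python
import numpy as np
import pandas as pd


def median_pairwise_corr(profiles: pd.DataFrame) -> float:
    """Median of the signed pairwise correlations between rows (compounds).

    Hypothetical helper: `profiles` has one row per compound and one
    column per feature. Pearson correlation is an assumption.
    """
    # Correlate compounds with each other (rows), not features.
    corr = profiles.T.corr(method="pearson").to_numpy()
    # Strictly lower triangle: every pair once, diagonal excluded.
    lower = corr[np.tril_indices_from(corr, k=-1)]
    # No abs(): negative correlations pull the median down.
    return float(np.median(lower))


# Toy data: cpd_a and cpd_b agree, cpd_c is anti-correlated with both.
profiles = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0],
     [2.0, 4.0, 6.0, 8.0],
     [4.0, 3.0, 2.0, 1.0]],
    index=["cpd_a", "cpd_b", "cpd_c"],
)
score = median_pairwise_corr(profiles)
```

Here the three pairwise correlations are 1, -1 and -1, so the signed median is close to -1, whereas taking absolute values first would have reported a score close to 1. The helper needs at least two compounds; with a single row the lower triangle is empty.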
- MOAs that do not have the same number of compounds in all doses
@gwaygenomics @shntnu
Null distribution: generated by taking the median correlation score of randomly combined compounds that do not share an MOA.
In our case, we generated 1000 median correlation scores from randomly combined compounds as the null distribution for each MOA.
A p-value was computed nonparametrically as the probability that random compounds from different MOAs have a greater median similarity value than compounds of the same MOA.
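The procedure described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's code: each compound is assumed to carry a single MOA label, "do not share an MOA" is read as "every sampled compound comes from a different MOA", and the names `null_medians` and `empirical_p_value` are hypothetical.

```python
import numpy as np


def null_medians(corr, moa_labels, size, n_iter=1000, rng=None):
    """Median pairwise correlation of `size` compounds drawn from distinct MOAs.

    corr: square compound-by-compound correlation matrix (numpy array).
    moa_labels: one MOA label per compound, in the same order as `corr`.
    Assumes `size` does not exceed the number of distinct MOAs.
    """
    rng = np.random.default_rng(rng)
    moa_labels = np.asarray(moa_labels)
    unique_moas = np.unique(moa_labels)
    tri = np.tril_indices(size, k=-1)
    out = np.empty(n_iter)
    for i in range(n_iter):
        # Pick `size` different MOAs, then one compound from each.
        moas = rng.choice(unique_moas, size=size, replace=False)
        idx = [rng.choice(np.flatnonzero(moa_labels == m)) for m in moas]
        sub = corr[np.ix_(idx, idx)]
        out[i] = np.median(sub[tri])
    return out


def empirical_p_value(observed, null):
    """Fraction of null medians at least as large as the observed median.

    Counting ties (>=) is a conservative choice made for this sketch.
    """
    return float(np.mean(np.asarray(null) >= observed))


# Toy data: 12 random profiles, 6 made-up MOAs with 2 compounds each.
toy_rng = np.random.default_rng(0)
toy_corr = np.corrcoef(toy_rng.normal(size=(12, 20)))
toy_labels = np.repeat([f"moa_{i}" for i in range(6)], 2)
null = null_medians(toy_corr, toy_labels, size=3, n_iter=1000, rng=0)
```

With this definition, a larger observed median can never produce a larger p-value against the same null, which is a useful sanity check on the plots.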
Looking forward to digging into this!!
On Tue, Nov 3, 2020 at 9:02 PM Adeniyi Adeboye notifications@github.com wrote:
Results from Null Distribution
-- -Shantanu
@AdeboyeML quick question:
In this figure (where each data point is an MOA, I assume), if two MOAs have the same number of compounds and the same x-axis value (median pairwise correlation between compounds), do they also have the same y-axis value?
Or, phrased more simply: are you computing the null distribution once for each MOA size (by size I mean the number of compounds in the MOA class), or once per MOA? Both are fine, but the former is preferred because it removes y-axis variance that's not informative.
@AdeboyeML kudos for making the data so easy to peek into. I was curious to see if one could see a dose-response in some of the MOAs. Looks like we do in a few.
More during profiling check-in!
@shntnu Regarding the question in https://github.com/broadinstitute/lincs-profiling-comparison/issues/2#issuecomment-724933716
Or phrased more simply, are you computing the null distribution once for each MOA size (by size I mean number of compounds in the MOA class)? Or are you doing it once per MOA? Both are fine, but the former is preferred to remove y-axis variance that's not informative.
I computed the null distribution once for each MOA, based on its size (the number of compounds in the MOA class), per dose. That is, I computed 1000 random median scores for each MOA per dose and computed the p-value from them.
For each MOA, the null distribution was built by selecting random compounds from different MOAs, with the number of selected compounds equal to the size of the MOA. I did this 1000 times for each MOA per dose.
if two MOAs have the same number of compounds, and the same x-axis value (median pairwise correlation between compounds), do they also have the same y-axis value?
@AdeboyeML thanks for clarifying. Everything looks good, but the one change I recommend is to use the same null distribution for all MOAs of the same size.
There is no upside to having different null distributions for each unique MOA (of the same size), while it has the downside of adding uninformative variance to the p-value estimates.
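One way to apply this recommendation is to cache the null distribution by MOA size, so every MOA of a given size is scored against the same 1000 null medians. The sketch below is illustrative only: `p_values_with_shared_null` and `toy_sampler` are hypothetical names, and the toy sampler stands in for whatever routine actually draws random compounds from different MOAs.

```python
import numpy as np


def p_values_with_shared_null(moa_sizes, observed, sample_null, n_iter=1000):
    """Score each MOA against one null distribution per MOA size.

    moa_sizes: dict mapping MOA name -> number of compounds.
    observed: dict mapping MOA name -> observed median pairwise correlation.
    sample_null: callable (size, n_iter) -> array of null medians.
    """
    null_by_size = {}
    p_values = {}
    for moa, size in moa_sizes.items():
        # Draw the null only the first time a given size is seen.
        if size not in null_by_size:
            null_by_size[size] = np.asarray(sample_null(size, n_iter))
        p_values[moa] = float(np.mean(null_by_size[size] >= observed[moa]))
    return p_values


# Stand-in sampler for demonstration; it records how often it is called.
rng = np.random.default_rng(0)
calls = []


def toy_sampler(size, n_iter):
    calls.append(size)
    return rng.normal(0.0, 0.2, n_iter)


moa_sizes = {"moa_a": 3, "moa_b": 3, "moa_c": 3, "moa_d": 5}
observed = {"moa_a": 0.1, "moa_b": 0.1, "moa_c": 0.4, "moa_d": 0.1}
p_values = p_values_with_shared_null(moa_sizes, observed, toy_sampler)
```

With a shared null, two MOAs of the same size and the same median correlation get exactly the same p-value, and within a size the p-value can only decrease as the median increases. If the analysis is run per dose, the cache key would presumably become the (dose, size) pair.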
cc @gwaygenomics
@shntnu @gwaygenomics
- There seems to be no relationship between the median pairwise correlation and the p-value obtained from the null distribution.
That's really strange – sounds like a bug to me
@shntnu @gwaygenomics
213 MOAs (mechanisms of action) present in both the Cell Painting and L1000 Level-5 data are compared based on the distribution of their median scores.
While aligning the MOAs in L1000 with the MOAs in Cell Painting, I realized that the MOAs listed for the same broad sample are partly named differently in the two datasets, i.e. the MOA naming is not consistent between them.
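A quick way to surface such naming mismatches is to join the two annotation tables on the broad sample ID and compare the MOA strings before and after a light normalization. Everything in this sketch is an assumption made for illustration: the column names `broad_id` and `moa`, the placeholder IDs and MOA strings, the `|` separator for multi-MOA entries, and the idea that the differences are limited to case, whitespace, or ordering.

```python
import pandas as pd


def normalize_moa(name: str) -> str:
    """Lowercase, trim, and sort the parts of a '|'-separated MOA string."""
    parts = [part.strip().lower() for part in name.split("|")]
    return "|".join(sorted(parts))


# Placeholder annotations; IDs and MOA strings are made up.
cell_painting = pd.DataFrame({
    "broad_id": ["BRD-A", "BRD-B"],
    "moa": ["CDK inhibitor", "EGFR inhibitor|HDAC inhibitor"],
})
l1000 = pd.DataFrame({
    "broad_id": ["BRD-A", "BRD-B"],
    "moa": ["cdk inhibitor ", "HDAC inhibitor|EGFR inhibitor"],
})

merged = cell_painting.merge(l1000, on="broad_id", suffixes=("_cp", "_l1000"))
merged["raw_match"] = merged["moa_cp"] == merged["moa_l1000"]
merged["normalized_match"] = (
    merged["moa_cp"].map(normalize_moa) == merged["moa_l1000"].map(normalize_moa)
)
# Rows that still disagree after normalization need manual review.
needs_review = merged[~merged["normalized_match"]]
```

Rows where `raw_match` is false but `normalized_match` is true are purely cosmetic differences; anything left in `needs_review` is a real disagreement in the annotation.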
@shntnu @gwaygenomics
I figured out what was wrong with my code, which caused the strange relationship between the p-values and the median scores (null distribution) in https://github.com/broadinstitute/lincs-profiling-comparison/issues/2#issuecomment-726123235
Corrected Plots of the Null distribution are below:
@gwaygenomics @shntnu
MOAs with only one compound were excluded. Out of 601 MOAs, 369 were excluded.
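The exclusion step can be sketched with a group-by on the MOA column. The column names `compound` and `moa` and the toy rows are hypothetical; the real annotation table may be laid out differently.

```python
import pandas as pd

# Toy annotations: moa_x has two compounds, moa_y and moa_z have one each.
annotations = pd.DataFrame({
    "compound": ["cpd_1", "cpd_2", "cpd_3", "cpd_4"],
    "moa": ["moa_x", "moa_x", "moa_y", "moa_z"],
})

# Count distinct compounds per MOA, then keep MOAs with more than one.
compounds_per_moa = annotations.groupby("moa")["compound"].nunique()
multi_compound_moas = compounds_per_moa[compounds_per_moa > 1].index
filtered = annotations[annotations["moa"].isin(multi_compound_moas)]
n_excluded = (compounds_per_moa <= 1).sum()
```

A pairwise correlation needs at least two compounds, which is why single-compound MOAs cannot be scored.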
Results: I will show only the first 10 MOAs' values per dose for each consensus dataset, plus heatmap plots of the 232 MOAs for each consensus dataset. Doses 0 (DMSO only) and 7 (only two MOAs) were excluded from the analysis.
1a. Median Aggregation Consensus dataset - consensus_median (whole plate normalization).
Heatmap plots -- I split the dataframe into 3 parts for easier visualization:
1b. Median Aggregation Consensus dataset - consensus_median_dmso (dmso normalization).
heatmap plots
2a. Modified Z Score Aggregation (MODZ) dataset - consensus_modz (whole plate normalization).
heatmap plots
2b. Modified Z Score Aggregation (MODZ) dataset - consensus_modz_dmso (dmso normalization).
heatmap plots