broadinstitute / grit-benchmark

Benchmarking a metric used to evaluate a perturbation strength
BSD 3-Clause "New" or "Revised" License

MP-Value vs. Grit #8

Open gwaybio opened 3 years ago

gwaybio commented 3 years ago

In #6 I compare Grit to mp-value. Here are the results:

[Figure: cell_health_grit_mpvalue_comparison]

The patterns are quite interesting! It is especially interesting that many perturbations have a high mp-value but very low grit (cc @shntnu). I am not sure how to rationalize this. I was thinking that these high-mp, low-grit samples could be ones with relatively low median replicate correlation. Having low median replicate correlation might not be reflected well in Mahalanobis distance calculations.

I visualized this relationship (using 5,000 permutations), and it looks like it is indeed playing a role, but it probably doesn't paint the full picture.

[Figure: cell_health_mpvalue_replicatereproducibility_comparison]

Perhaps another reason for this is that 5,000 permutations are too few.
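To make concrete what the permutation count can and cannot change: an empirical p-value built from N permutations cannot go much below 1/N, so N mainly sets the ceiling of -log10(p). A minimal sketch, assuming the common (k + 1) / (N + 1) estimator (the mp-value code used in this benchmark may follow a different convention); `empirical_p_value` and `max_neg_log10_p` are hypothetical helper names:

```python
import math

def empirical_p_value(observed, null_stats):
    """Fraction of permuted statistics at least as extreme as the observed one.

    Uses the (k + 1) / (N + 1) convention (an assumption), which never returns 0.
    """
    k = sum(1 for s in null_stats if s >= observed)
    return (k + 1) / (len(null_stats) + 1)

def max_neg_log10_p(n_permutations):
    """Largest -log10(p) attainable with the given number of permutations."""
    return -math.log10(1 / (n_permutations + 1))

# Compare the ceiling for the two permutation counts mentioned in this thread.
for n in (1000, 5000):
    print(n, round(max_neg_log10_p(n), 2))
```

Under this convention, going from 1,000 to 5,000 permutations raises the ceiling from about 3.0 to about 3.7 on the -log10 scale. It only affects samples that already sit at the ceiling.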

@koalive, as an mp-value expert, do you have any thoughts on why this might be happening?

koalive commented 3 years ago

Nice to see this comparison, congrats! I'm not too worried about the number of permutations for the mp-value, as the results already seem fairly stable compared to the 1,000-permutation run. Adding more permutations would add some definition in the range of high log-mp values, but the samples with high log-mp values and low grit would stay in this region of the plot anyway, so the question remains open.

For sure, if a sample has a low replicate correlation, this wouldn't be well captured by the mp-value, which takes the mean of the Mahalanobis distance to controls (i.e. the dispersion of the controls matters, but the dispersion of the replicates not so much).

Another difference I can think of comes from the use of a distance for the mp-values versus a correlation for grit values. The metrics might behave differently under different data normalizations. If we take a CRISPR screen as an example, some replicates could have lower or higher efficiency, meaning that the direction of the changes would be the same but the strength different. In this case, if you center and scale your data to get them as changes compared to the mean of your negative control, the replicates would correlate well despite being dispersed, which would lead to lower grit values whereas the mp-values would not consider this alignment.

Edit: I think the last point is a bit confused. A (hopefully) clearer example: if you have three profiles A, B and C such that A = 2B = 4C, then PCC(A,B) = PCC(A,C) = 1, yet the distance(A,B) < distance(A,C). In this case the grit and mp-values would be significantly different.
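The A = 2B = 4C example can be checked numerically. This is a minimal sketch with made-up profile values, using plain Euclidean distance as a stand-in for the Mahalanobis distance behind mp-values; `pearson` is a helper written for this illustration:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Made-up profile values; B and C are rescaled copies so that A = 2B = 4C.
A = [4.0, -8.0, 2.0, 6.0, -1.0]
B = [a / 2 for a in A]
C = [a / 4 for a in A]

pcc_ab, pcc_ac = pearson(A, B), pearson(A, C)
dist_ab, dist_ac = math.dist(A, B), math.dist(A, C)

print(f"PCC(A,B)={pcc_ab:.3f}  PCC(A,C)={pcc_ac:.3f}")
print(f"distance(A,B)={dist_ab:.3f}  distance(A,C)={dist_ac:.3f}")
```

Both correlations come out as 1 (up to floating-point error), while distance(A,C) is 1.5 times distance(A,B), since |A - C| = 3|A|/4 and |A - B| = |A|/2.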

gwaybio commented 3 years ago

> In this case, if you center and scale your data to get them as changes compared to the mean of your negative control, the replicates would correlate well despite being dispersed, which would lead to lower grit values whereas the mp-values would not consider this alignment.

Interesting... indeed, in #11, we observed that negative-control based grit was lower than whole-plate grit: see https://raw.githubusercontent.com/broadinstitute/grit-benchmark/main/2.compare-metrics/cell-health/figures/plate_normalization/cell_health_grit_platenormalization_comparison.png

Your hypothesis is that mp-value will be unchanged with different normalization methods?

> if you have three profiles A, B and C such that A = 2B = 4C, then PCC(A,B) = PCC(A,C) = 1, yet the distance(A,B) < distance(A,C). In this case the grit and mp-values would be significantly different.

Good point! This supports our original thought that "Having low median replicate correlation might not be reflected well in Mahalanobis distance calculations," right? In other words, the high MP-value/low Grit perturbations could mean substantially far-away profiles, but with high correlation to controls? Not really sure what that would mean biologically...

koalive commented 3 years ago

> Your hypothesis is that mp-value will be unchanged with different normalization methods?

It would be unchanged for any centering / scaling. Other transformations (e.g. log-transform) might change things a lot.
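A small numerical check of the distance part of this claim: the Mahalanobis distance to controls is unchanged by a per-feature shift and scale, but not by a log-transform. This is only a sketch on made-up two-feature data, with the covariance estimated from the controls. It does not reproduce the full mp-value pipeline, and `mahalanobis_to_controls`, `center_scale` and `log_transform` are hypothetical helpers:

```python
import math
import random

def mahalanobis_to_controls(profile, controls):
    """Mahalanobis distance from a 2-feature profile to the control mean,
    using the sample covariance of the controls."""
    n = len(controls)
    mx = sum(c[0] for c in controls) / n
    my = sum(c[1] for c in controls) / n
    sxx = sum((c[0] - mx) ** 2 for c in controls) / (n - 1)
    syy = sum((c[1] - my) ** 2 for c in controls) / (n - 1)
    sxy = sum((c[0] - mx) * (c[1] - my) for c in controls) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = profile[0] - mx, profile[1] - my
    # d^T S^-1 d, with the 2x2 inverse written out explicitly
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)

def center_scale(p):
    # Arbitrary per-feature shift and scale (made-up constants).
    return ((p[0] - 2.0) / 0.5, (p[1] - 20.0) / 7.0)

def log_transform(p):
    return (math.log(p[0]), math.log(p[1]))

rng = random.Random(0)
controls = [(rng.uniform(1, 3), rng.uniform(10, 30)) for _ in range(50)]
treated = (5.0, 12.0)

d_raw = mahalanobis_to_controls(treated, controls)
d_scaled = mahalanobis_to_controls(center_scale(treated), [center_scale(c) for c in controls])
d_log = mahalanobis_to_controls(log_transform(treated), [log_transform(c) for c in controls])

print(f"raw={d_raw:.4f}  centered/scaled={d_scaled:.4f}  log={d_log:.4f}")
```

The first two values agree to floating-point precision because the covariance rescales together with the data; the log-transformed value differs because the transform is nonlinear.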

> In other words, the high MP-value/low Grit perturbations could mean substantially far-away profiles, but with high correlation to controls?

I think so. Either a high correlation to control or a low correlation to replicates. "Close profiles" would not correlate well between replicates and lead to low Grit and log-mp-values while "far-away profiles" would lead to high log-mp-values but not necessarily to high Grit scores, and it is actually matching what you observe!

Biologically, I feel that mp-values would be appropriate when dose-dependent effects are of interest (for instance drug concentration or editing efficiency) while grit would be better suited when you want to pool these effects together ("is a drug inducing changes?" rather than "is this drug concentration enough to induce changes?").

gwaybio commented 3 years ago

> "far-away profiles" would lead to high log-mp-values but not necessarily to high Grit scores, and it is actually matching what you observe!

> I feel that mp-values would be appropriate when dose-dependent effects are of interest (for instance drug concentration or editing efficiency) while grit would be better suited when you want to pool these effects together ("is a drug inducing changes?"

Interesting! Do you think that calculating grit with respect to distance instead of Pearson correlation would be better? One reason Pearson correlation is preferred is that the normalization (as long as it's consistent) doesn't matter as much, and comparing grit scores across datasets is easier.

For the dose-dependent/edit efficiency effects, we do observe that grit handles this well: https://raw.githubusercontent.com/broadinstitute/grit-benchmark/main/2.compare-metrics/perturb-seq/figures/GSE132080_crispri_grit_relative_activity_comparison.png

☝️ in a CRISPRi dataset, grit tracks nicely with a measure of gene expression knockdown (relative efficiency)

koalive commented 3 years ago

I feel there's no perfect general solution, it all depends on the specific goal of the experiment. Offering the option of a correlation-based metric is probably better to diversify the options one might have compared to what already exists with mp-values and statistical distances and...

> comparing grit scores across datasets is easier

In my experience, this part is particularly tricky with statistical distances. If you want to compare two perturbations, either you compare the Mahalanobis distances, which only consider the dispersion of the control, or you compare mp-values, which are empirical p-values and don't tell you much about the effect size. Other distances still don't have the nice interpretation you mention in the README:

> "on average, compound X is 5 standard deviations more similar to replicates than to DMSO controls"
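To illustrate where that interpretation comes from, here is a rough sketch of a grit-like score built directly from the README sentence: the correlations of a profile to its replicates are z-scored against the distribution of its correlations to controls, then averaged. It is a simplified illustration on simulated data, not the benchmark's implementation; `grit_like`, `pearson` and `noisy` are hypothetical helpers:

```python
import math
import random
from statistics import mean, stdev

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def grit_like(profile, replicates, controls):
    """Mean z-score of replicate correlations, scaled by the distribution
    of the profile's correlations to controls."""
    control_corrs = [pearson(profile, c) for c in controls]
    replicate_corrs = [pearson(profile, r) for r in replicates]
    mu, sigma = mean(control_corrs), stdev(control_corrs)
    return mean([(r - mu) / sigma for r in replicate_corrs])

rng = random.Random(0)
n_features = 20

# Simulated data: replicates share one signal, controls are pure noise.
signal = [rng.gauss(0, 2) for _ in range(n_features)]

def noisy(sig, sd):
    return [s + rng.gauss(0, sd) for s in sig]

replicates = [noisy(signal, 0.5) for _ in range(4)]
controls = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(20)]

profile, others = replicates[0], replicates[1:]
grit = grit_like(profile, others, controls)
print(f"grit-like score: {grit:.2f}")
```

The resulting number reads as "this profile is, on average, that many control-correlation standard deviations more similar to its replicates than to controls", which is the effect-size interpretation that a distance or an empirical p-value does not directly give.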

> ☝️ in a CRISPRi dataset, grit tracks nicely with a measure of gene expression knockdown (relative efficiency)

That's interesting! Does it hold if you center your data? I can probably look into the code myself anyway...