KrishnaswamyLab / scprep

A collection of scripts and tools for loading, processing, and handling single cell data.
MIT License

Document sign change of EMD in `scprep.stats.differential_expression` #130

Closed: CRISTIANJULIOCESAR closed this issue 2 years ago

CRISTIANJULIOCESAR commented 2 years ago

@scottgigante-immunai

Hi, I made this great heatmap using your function:

x = scprep.stats.differential_expression_by_cluster(EXP, cluster, measure='emd', direction='up', gene_names=None, n_jobs=-2)

However, EMD is a distance metric. I understand that each cluster is compared against all the other clusters, but I do not understand why the heatmap shows both positive and negative values, since SciPy's Earth Mover's Distance is just a distance.

[Image: heatmap]

For example, I got these ranked genes per cluster:

{'0':
                      emd   rank
MT-RNR2          5.668488      0
MT-ND4           5.305992      1
MT-CO2           4.492841      2
MT-CO1           4.006456      3
MT-CYB           3.903149      4
...                   ...    ...
RP11-993B23.3    0.000000  23917
OTUD7A           0.000000  23918
DPY19L2          0.000000  23919
C10ORF67         0.000000  23920
RP13-228J13.6    0.000000  23921
........
[23922 rows x 2 columns],
'9':
                      emd   rank
FTH1             2.317543      0
TFF1             1.485828      1
S100A9           1.229603      2
TPT1             1.152300      3
CSTB             1.137404      4
...                   ...    ...
MT-CO1          -1.030647  23917
TUBA1B          -1.052380  23918
MT-CO2          -1.165543  23919
MT-ND4          -1.378224  23920
MT-RNR2         -1.655259  23921

[23922 rows x 2 columns]}

Why do I have negative values? Also, I do not understand what the parameter `direction` does. I hope you can help me. Thank you.

For example, if I compute the EMD directly with SciPy, it is not possible to get negative values:

from scipy.stats import wasserstein_distance

# exp_diff: expression DataFrame with one column per gene plus a 'clusters' column
c1 = exp_diff[exp_diff['clusters'] == "9"]['MT-ND4']
c2 = exp_diff[exp_diff['clusters'] == "0"]['MT-ND4']
wasserstein_distance(c1, c2)  # always >= 0

or
from scipy.stats import wasserstein_distance

c1 = exp_diff[exp_diff['clusters'] == "1"]['MT-ND4']
# Pool clusters 0 and 2 (Series.append was removed in pandas 2.0; isin avoids it)
c2 = exp_diff[exp_diff['clusters'].isin(["0", "2"])]['MT-ND4']
wasserstein_distance(c1, c2)
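
Generalizing the manual comparison above, the sketch below compares each cluster against all remaining cells using SciPy's `wasserstein_distance`. It assumes a DataFrame shaped like `exp_diff` (one column per gene plus a `clusters` column); the helper `one_vs_rest_emd` and the toy data are hypothetical, and the one-vs-rest setup is only an assumption about what `differential_expression_by_cluster` does. Because SciPy's distance is symmetric and never negative, every value it returns is at least 0.

```python
import numpy as np
import pandas as pd
from scipy.stats import wasserstein_distance


def one_vs_rest_emd(df, gene, cluster_col="clusters"):
    """Hypothetical helper: unsigned EMD of one gene, each cluster vs. all other cells."""
    result = {}
    for cluster in df[cluster_col].unique():
        in_cluster = df[cluster_col] == cluster
        # Cells in this cluster vs. every cell outside it
        result[cluster] = wasserstein_distance(
            df.loc[in_cluster, gene], df.loc[~in_cluster, gene]
        )
    return pd.Series(result, name=gene)


# Toy data standing in for exp_diff (values are made up)
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "MT-ND4": np.concatenate([
        rng.normal(5, 1, 50), rng.normal(2, 1, 50), rng.normal(3, 1, 50)
    ]),
    "clusters": ["0"] * 50 + ["1"] * 50 + ["2"] * 50,
})

emd = one_vs_rest_emd(toy, "MT-ND4")
print(emd)
```

Whichever cluster has higher expression, the result is non-negative, so this unsigned distance alone cannot tell an up-shift from a down-shift.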

scottgigante-immunai commented 2 years ago

Hi @CRISTIANJULIOCESAR, the EMD is multiplied by the sign of the difference between the two groups being compared, to denote the overall direction of the shift. If you want the genes ranked by absolute value, you can use `direction='both'`.
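
A minimal sketch of the behaviour described above. It assumes the sign is taken from the difference in mean expression (the comment only says "the difference"), and that `direction` only controls sorting: `'up'` largest first (consistent with the output in the question), `'down'` smallest first (an assumption), and `'both'` by absolute value. The functions `signed_emd` and `rank_genes` are hypothetical and are not scprep's API.

```python
import numpy as np
import pandas as pd
from scipy.stats import wasserstein_distance


def signed_emd(x, y):
    """Hypothetical: EMD between x and y, signed by the difference in means (assumed)."""
    distance = wasserstein_distance(x, y)
    # Positive if x is shifted up relative to y, negative if shifted down, 0 if equal means
    return distance * np.sign(np.mean(x) - np.mean(y))


def rank_genes(scores, direction="up"):
    """Hypothetical ranking with assumed semantics:
    'up' = largest first, 'down' = smallest first, 'both' = largest |score| first."""
    if direction == "up":
        order = scores.sort_values(ascending=False)
    elif direction == "down":
        order = scores.sort_values(ascending=True)
    elif direction == "both":
        order = scores.loc[scores.abs().sort_values(ascending=False).index]
    else:
        raise ValueError(f"unknown direction: {direction!r}")
    # Same two columns as the output in the question
    return pd.DataFrame({"emd": order, "rank": np.arange(len(order))})


# Toy data: gene_a is higher in the cluster, gene_b lower, gene_c unchanged
cluster_expr = {
    "gene_a": np.array([5.0, 6.0, 7.0]),
    "gene_b": np.array([0.0, 1.0, 0.5]),
    "gene_c": np.array([1.0, 2.0, 3.0]),
}
rest_expr = {
    "gene_a": np.array([1.0, 2.0, 1.5]),
    "gene_b": np.array([8.0, 9.0, 10.0]),
    "gene_c": np.array([1.0, 2.0, 3.0]),
}

scores = pd.Series({g: signed_emd(cluster_expr[g], rest_expr[g]) for g in cluster_expr})
print(rank_genes(scores, "up"))
print(rank_genes(scores, "both"))
```

With `'up'`, the strongly down-shifted `gene_b` lands at the bottom with a negative score, much like `MT-RNR2` in cluster 9 above; with `'both'` it moves to the top because its absolute shift is the largest.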

CRISTIANJULIOCESAR commented 2 years ago

Thanks @scottgigante, that answers my questions.

scottgigante commented 2 years ago

I'll leave this open until I document this properly. Thanks for reporting!