AutomatedProcessImprovement / log-distance-measures

Python package with event log distance and similarity metrics
Apache License 2.0

Implement EMD instead of Wasserstein Distance #11

Closed david-chapela closed 1 year ago

david-chapela commented 1 year ago

EMD compares two histograms and computes the minimum amount of work needed to transform one of them into the other, where each movement of mass is penalized by the distance it travels.
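As a minimal sketch of this definition, assuming both histograms share the same equally spaced bins and have the same total mass, the 1D EMD can be computed by sweeping left to right and accumulating the surplus that has to be carried past each bin. The function name `emd_equal_mass` and the `bin_width` parameter are illustrative and not part of this package.

```python
def emd_equal_mass(hist_a, hist_b, bin_width=1.0):
    """EMD between two 1D histograms over the same bins with equal total mass."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    if sum(hist_a) != sum(hist_b):
        raise ValueError("this sketch only covers histograms with equal total mass")

    work = 0.0
    surplus = 0.0  # mass still to be carried to the right (negative: to the left)
    for a, b in zip(hist_a, hist_b):
        surplus += a - b
        # Whatever is left over after this bin must travel one more bin width.
        work += abs(surplus) * bin_width
    return work


print(emd_equal_mass([3, 0, 0], [0, 0, 3]))  # 3 units moved across 2 bins
print(emd_equal_mass([2, 1, 0], [1, 1, 1]))  # 1 unit moved across 2 bins
```

The sketch deliberately rejects histograms with different totals, since how to penalize the unmatched mass is a separate design decision.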

Wasserstein Distance computes this between two probability distributions (as the area between their Cumulative Distribution Functions), not between two histograms. Hence, when the two samples differ in total mass, the results can differ (and there may be further differences beyond this one).
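The mass problem can be illustrated with a small sketch. It assumes both histograms use the same equally spaced bins and normalizes each one to sum to 1 before taking the area between the CDFs, which is what treating them as probability distributions amounts to. The name `wasserstein_normalized` is hypothetical and only serves this example.

```python
from itertools import accumulate


def wasserstein_normalized(hist_a, hist_b, bin_width=1.0):
    """Area between the CDFs of two histograms, each normalized to total mass 1."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    total_a, total_b = sum(hist_a), sum(hist_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each histogram needs a positive total mass")

    # Normalizing discards the absolute counts: only the shape survives.
    cdf_a = accumulate(x / total_a for x in hist_a)
    cdf_b = accumulate(x / total_b for x in hist_b)
    return sum(abs(ca - cb) for ca, cb in zip(cdf_a, cdf_b)) * bin_width


# Same shape, but the second histogram holds twice as many events.
print(wasserstein_normalized([1, 2, 1], [2, 4, 2]))
# Different shapes are still told apart.
print(wasserstein_normalized([1, 2, 1], [0, 0, 4]))
```

The first call returns zero even though the second histogram contains four more events than the first, which is exactly the difference a histogram-based EMD is expected to capture.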

Find or implement the EMD as a comparison between two histograms, not between two probability distributions.