Add MAD measure

holtgrewe commented 6 years ago

The MAD is a robust alternative to the standard deviation; it would be nice to have alongside stddev.

https://en.wikipedia.org/wiki/Median_absolute_deviation

BurntSushi commented 6 years ago

Sorry, but this feature request is incomplete. Please:

Assume that folks reading your comment have never heard of MAD before.
Explain how you would use MAD in a real example.
Explain how the presence of MAD in addition to standard deviation would lend extra insight in a real example.

xsv cannot be in the business of adding every statistical measure, so each one needs to be vetted individually. The stats computed today are ubiquitous. MAD is not.

holtgrewe commented 6 years ago

The median absolute deviation (cf. Wikipedia) is a robust alternative to the standard deviation for measuring the variability of a sample. In spirit, it is comparable to the median.

Where the arithmetic mean is the sum of the sample values divided by the sample count, the median is the value with the "center rank". As a result, the median is more robust to outliers (the typical example here is the mean net worth of a room of 100 people when one of them is Bill Gates).

Similarly, the standard deviation is based on the differences between the sample values and the mean (again, outliers such as Bill Gates' net worth will greatly skew the value). In comparison, the median absolute deviation is computed by taking the list of absolute differences between the median and the sample values, sorting them, and then taking the center rank value.
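The computation just described can be sketched in a few lines. This is only an illustration in Python, not a proposal for how xsv should implement it; the function name `mad` and the sample values are made up for the example.

```python
from statistics import median


def mad(values):
    """Median absolute deviation: the median of |x - median(values)|."""
    center = median(values)
    # Absolute differences between each sample value and the median.
    deviations = [abs(x - center) for x in values]
    # statistics.median sorts internally and returns the center-rank value.
    return median(deviations)


sample = [1, 1, 2, 2, 4, 6, 9]
print(mad(sample))  # median is 2, deviations are [1, 1, 0, 0, 2, 4, 7] -> 1
```

Note that this sketch returns the raw MAD, without the scaling constant some tools apply to make it comparable to a standard deviation under normality.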

For example, in quantitative biology one use would be robustness against outliers in microarray analysis, e.g., outliers stemming from artifacts. One might want a measure of the variability of the intensity measurements. You can think of this as a grayscale picture, each pixel having an intensity between 0.0 and 1.0. Some pixels might be set close to 1.0 and are technical artifacts, while the overall level might be at 0.1. Here, the MAD would describe the variability of the "majority" of the pixels, similar to the median robustly describing "an average pixel".
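To make the pixel scenario concrete, here is a small sketch comparing the two measures. The intensity values are invented for illustration (most near 0.1, two artifacts near 1.0), and `mad` is a hypothetical helper, not something xsv provides.

```python
from statistics import median, pstdev


def mad(values):
    """Median absolute deviation: the median of |x - median(values)|."""
    center = median(values)
    return median(abs(x - center) for x in values)


# Made-up intensities: seven "normal" pixels and two bright artifacts.
pixels = [0.08, 0.09, 0.10, 0.10, 0.10, 0.11, 0.12, 0.98, 0.99]

print("median:", median(pixels))  # stays at the level of the normal pixels
print("stddev:", pstdev(pixels))  # inflated by the two artifacts
print("MAD:   ", mad(pixels))     # reflects the spread of the normal pixels
```

With these values the standard deviation comes out more than ten times larger than the MAD, even though seven of the nine pixels lie within 0.02 of the median.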

Of course, one alternative would be to trim the data by cutting away the top and bottom 10%, but the same argument could be made against the median.

As for being ubiquitous, I would offer:

mad is part of base R, Excel implements it...
a search for distribution medium-absolute-deviation yields 538k hits on Google, in the same ballpark as the 777k hits for the already implemented distribution mode

What do you think?

BurntSushi / xsv

Add MAD measure #116