Open holtgrewe opened 6 years ago
Sorry, but this feature request is incomplete. Please:
xsv cannot be in the business of adding every statistical measure, so each one needs to be vetted individually. The stats computed today are ubiquitous. MAD is not.
The median absolute deviation (cf. Wikipedia) is a robust alternative to the standard deviation for measuring the variability of a sample. In spirit, it is comparable to the median.
Where the arithmetic mean is the sum of the sample values divided by the sample count, the median is the value at the "center rank". This makes the median more robust to outliers (the typical example here is the mean net worth of a room of 100 people when one of them is Bill Gates).
Similarly, the standard deviation is based on the differences between the sample values and the mean (again, outliers such as Bill Gates' net worth will greatly skew the value). In comparison, the median absolute deviation is computed by taking the list of absolute differences between the median and the sample values, sorting them, and then taking the center rank value.
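To make the computation concrete, here is a minimal Python sketch of the steps just described. The function name `median_absolute_deviation` and the sample values are made up for illustration; this is not how xsv would implement it, only a picture of the definition.

```python
import statistics


def median_absolute_deviation(values):
    """Return the median of the absolute differences to the sample median."""
    # Step 1: the median of the sample itself.
    center = statistics.median(values)
    # Step 2: absolute differences between the median and each sample value,
    # sorted here only to mirror the description (median() sorts on its own).
    abs_diffs = sorted(abs(v - center) for v in values)
    # Step 3: the center rank value of those differences.
    return statistics.median(abs_diffs)


# Made-up sample with a single extreme outlier.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]

mad = median_absolute_deviation(sample)
stddev = statistics.pstdev(sample)  # population standard deviation
print(f"MAD = {mad}, stddev = {stddev:.1f}")
```

For this sample the MAD is 2.5, while the standard deviation is dominated by the single value 1000.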
For example, in quantitative biology one use case is robustness against outliers in microarray analysis, e.g., outliers stemming from artifacts. One might want a measure of the variability of the intensity measurements. You can think of this as a grayscale picture, with each pixel having an intensity between 0.0 and 1.0. Some pixels might be set close to 1.0 and are technical artifacts, while the overall level might be around 0.1. Here, the MAD would describe the variability of the "majority" of the pixels, similar to the median robustly describing "an average pixel".
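As a rough illustration of the pixel example, the following Python sketch compares mean and standard deviation against median and MAD. The intensity values are invented (most near 0.1, two artifacts near 1.0), and `median_absolute_deviation` is a hypothetical helper, not an existing xsv or library function.

```python
import statistics


def median_absolute_deviation(values):
    """Median of the absolute differences between each value and the median."""
    center = statistics.median(values)
    return statistics.median([abs(v - center) for v in values])


# Invented grayscale intensities: nine "normal" pixels and two artifacts.
pixels = [0.08, 0.09, 0.10, 0.10, 0.11, 0.12, 0.10, 0.09, 0.11, 0.99, 0.98]

mean = statistics.mean(pixels)
stddev = statistics.pstdev(pixels)
median = statistics.median(pixels)
mad = median_absolute_deviation(pixels)

# Mean and stddev are pulled up by the two artifacts;
# median and MAD stay close to the bulk of the pixels.
print(f"mean   = {mean:.3f}, stddev = {stddev:.3f}")
print(f"median = {median:.3f}, MAD    = {mad:.3f}")
```

With these made-up values the median stays at 0.1 and the MAD at roughly 0.01, whereas the mean and standard deviation are both several times larger because of the two bright pixels.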
Of course, one alternative would be to trim the data by cutting away the top and bottom 10%, but that argument could also be made against the median.
In terms of being ubiquitous, I would offer:
- mad being part of base R, Excel implements it...
- distribution medium-absolute-deviation yielding 538k hits on Google, in the same ballpark as the already implemented distribution mode with 777k hits

What do you think?
The MAD is a robust alternative to the standard deviation; it would be nice to have alongside stddev.