djnavarro / rbook

Source files for "Learning Statistics with R"
https://learningstatisticswithr.com

section on quantiles needs some nuance #34

Open djnavarro opened 3 years ago

djnavarro commented 3 years ago

(from my correspondence with Sabine Schulte im Walde)

There are multiple ways of defining sample quantiles, and SAS, SPSS and R all use different defaults. The quantile() function has a type argument that lets you choose among nine definitions. Perhaps unfortunately, afl.margins is a data set for which the 0.25 quantile comes out differently under each of the three defaults (12.5 in SAS, 12.25 in SPSS, 12.75 in R):

> quantile(afl.margins, type = 1)
  0%  25%  50%  75% 100%
   0   12   30   50  116

> quantile(afl.margins, type = 2) # SAS default
   0%   25%   50%   75%  100%
  0.0  12.5  30.5  51.0 116.0

> quantile(afl.margins, type = 3)
  0%  25%  50%  75% 100%
   0   12   30   50  116

> quantile(afl.margins, type = 4)
  0%  25%  50%  75% 100%
   0   12   30   50  116

> quantile(afl.margins, type = 5)
   0%   25%   50%   75%  100%
  0.0  12.5  30.5  51.0 116.0

> quantile(afl.margins, type = 6) # SPSS default
    0%    25%    50%    75%   100%
  0.00  12.25  30.50  51.50 116.00

> quantile(afl.margins, type = 7) # R default
    0%    25%    50%    75%   100%
  0.00  12.75  30.50  50.50 116.00

> quantile(afl.margins, type = 8) 
       0%       25%       50%       75%      100%
  0.00000  12.41667  30.50000  51.16667 116.00000

> quantile(afl.margins, type = 9) 
      0%      25%      50%      75%     100%
  0.0000  12.4375  30.5000  51.1250 116.0000 
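
To see where the differences come from, here is a minimal sketch on made-up data (the vector x is hypothetical, not afl.margins). It tabulates the 0.25 quantile under all nine type values, then reproduces types 6 and 7 by hand, assuming the Hyndman and Fan position formulas h = (n + 1)p and h = (n - 1)p + 1 with linear interpolation between the neighbouring order statistics. The helper interp_at() is an illustrative name, not part of R.

```r
# Hypothetical toy data, used only for illustration (not afl.margins)
x <- c(2, 3, 5, 8, 9, 11, 14, 15, 20, 21)
p <- 0.25

# 0.25 quantile under each of the nine definitions offered by quantile()
q25 <- sapply(1:9, function(t) quantile(x, probs = p, type = t, names = FALSE))
names(q25) <- paste0("type", 1:9)
print(q25)

# Illustrative helper (not part of R): linear interpolation between the
# order statistics on either side of a fractional position h
interp_at <- function(sorted, h) {
  h <- min(max(h, 1), length(sorted))  # keep the position inside 1..n
  lo <- floor(h)
  hi <- ceiling(h)
  sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo])
}

xs <- sort(x)
n <- length(xs)
manual6 <- interp_at(xs, (n + 1) * p)      # type 6 position (SPSS default)
manual7 <- interp_at(xs, (n - 1) * p + 1)  # type 7 position (R default)

# Both comparisons should be TRUE
print(all.equal(manual6, q25[["type6"]]))
print(all.equal(manual7, q25[["type7"]]))
```

The discontinuous definitions (types 1 to 3) pick or average order statistics rather than interpolating, so the helper does not cover them.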

relevant paper:

Hyndman, R. J. and Fan, Y. (1996). Sample quantiles in statistical packages. The American Statistician, 50(4), 361–365. doi:10.2307/2684934