IoSR-Surrey / MatlabToolbox

General purpose Matlab toolbox
MIT License

Wrong median in boxPlot #13

Open simonkahl opened 4 years ago

simonkahl commented 4 years ago

Hello, this is a great toolbox and I like the styling and customization of the boxplots. However, I stumbled upon a bug in the calculation of the median. This script and the corresponding figure should clarify the issue:

data = [1 1 1 2 4 6 7];

figure
subplot( 1, 2, 1 )
b = iosr.statistics.boxPlot( data' );
title( {'IoSR-Surrey'; ...
    'Matlab Toolbox'; ...
    ['Median = ' num2str(b.statistics.median)]} )

subplot( 1, 2, 2 )
boxplot( data' );
title( {'MatLab R2018b'; ...
    'Statistics and Machine Learning Toolbox'; ...
    ['Median = ' num2str(median(data))]} )

[Image: iosrBug]

Hopefully this bug can be easily fixed.

chummersone commented 4 years ago

The toolbox and MATLAB use different methods to calculate the median. See the ‘method’ property of iosr.statistics.boxPlot. The same methods are provided by the underlying function: iosr.statistics.quantile.
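
For anyone who wants to try this, here is a minimal sketch of selecting the method both on the plot and on the quantile function. It requires the toolbox on the path. The exact call syntax (the 'method' name/value pair in the constructor and the (X, p, dim, method) argument order) is an assumption based on the toolbox's usual conventions, so check the help text before relying on it.

```matlab
% Data from the original report, as a column vector.
data = [1 1 1 2 4 6 7]';

% Assumed signature: quantile(X, p, dim, method); p = 0.5 is the median.
q5 = iosr.statistics.quantile(data, 0.5, 1, 'R-5');
q8 = iosr.statistics.quantile(data, 0.5, 1, 'R-8');

% Assumed syntax: the same choice passed to the plot as a name/value pair.
figure
b = iosr.statistics.boxPlot(data, 'method', 'R-5');

% Compare against MATLAB's built-in median.
fprintf('quantile R-5: %g, R-8: %g, boxPlot: %g, median(): %g\n', ...
    q5, q8, b.statistics.median, median(data));
```

Printing the four values side by side makes it easy to see whether the method setting has any effect on the reported median.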

DominikSchmidbauer commented 2 years ago

I've got the same problem!

This is my data:

[1.501618122977346;0.498381877022654;0.460992907801418;0.375886524822695;1.080378250591016;1.724586288416076;1;0.258227848101266;1.741772151898734]

Median is 1 (9 numbers, 5th number in the sorted list) but the plot shows the median at 0.749190938511327.

Changing the method (to either R-5 or R-8) makes no difference, as it only determines how the quantiles are calculated.

Apparently, the median calculated by boxPlot is the average of the 4th and 5th element in the list.

It happens only with these particular numbers. Other vectors, even of the same length, are plotted correctly.
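
To illustrate why the method should not matter for the median, here is a rough sketch that applies the standard R-5 and R-8 position formulas (as listed on the Wikipedia Quantile page) by hand to the data above. This is not the toolbox's code; the variable names and the use of interp1 are just for illustration.

```matlab
% Data from this comment (9 values, so the median is the 5th sorted value).
x = [1.501618122977346; 0.498381877022654; 0.460992907801418; ...
     0.375886524822695; 1.080378250591016; 1.724586288416076; ...
     1; 0.258227848101266; 1.741772151898734];
xs = sort(x);
n  = numel(xs);
p  = 0.5;

% Fractional 1-based positions in the sorted data.
h5 = n*p + 1/2;            % R-5
h8 = (n + 1/3)*p + 1/3;    % R-8

% Linear interpolation between neighbouring order statistics.
% interp1 is used for brevity; it is not how the toolbox does it.
q5 = interp1(1:n, xs, h5);
q8 = interp1(1:n, xs, h8);

fprintf('R-5: h = %.4f, median = %.4f\n', h5, q5);
fprintf('R-8: h = %.4f, median = %.4f\n', h8, q8);
fprintf('median(x)            = %.4f\n', median(x));
```

Algebraically, both positions reduce to (n + 1)/2 at p = 0.5, so for n = 9 both methods pick the 5th sorted value, which is 1 here. A result of 0.749 therefore cannot be explained by the choice between R-5 and R-8.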

prash-p commented 1 year ago

The issue is that the median is estimated as a sample quantile, because the data is treated as a sample. Most users just want the median as calculated by MATLAB's median().

This line: https://github.com/IoSR-Surrey/MatlabToolbox/blob/master/%2Biosr/%2Bstatistics/statsPlot.m#L170 should be: obj.statistics.median = median(obj.y)

Changing to R-5 still does not give the correct median value. For example, in this box plot of 3 points, the horizontal line does not pass through the middle point with either 'R-5' or 'R-8':

[image]

But changing the line as above calculates the 'correct' median (what most users expect):

[image]
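
As a quick sanity check of what median() returns, here is a small sketch using only built-in MATLAB. The 3-point vector is hypothetical (the values in the image are not known), and whether obj.y can contain NaN is an assumption that has not been checked against the toolbox.

```matlab
% Data from the original report: the 4th of 7 sorted values is the median.
a = [1 1 1 2 4 6 7];
m_a = median(a);                        % 2

% Hypothetical 3-point example: the median is the middle point.
c = [3 10 4];
m_c = median(c);                        % 4

% For a matrix, median works down each column, one value per column.
Y = [a' flipud(a')];
m_cols = median(Y);                     % [2 2]

% Caveat: if the data can contain NaN, plain median returns NaN,
% so the 'omitnan' flag may be needed.
m_nan  = median([1 NaN 3]);             % NaN
m_omit = median([1 NaN 3], 'omitnan');  % 2

fprintf('median(a) = %g, median(c) = %g\n', m_a, m_c);
```

If the toolbox pads groups of unequal size with NaN, the one-line change above would probably need the 'omitnan' variant to avoid a NaN median.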