circstat / circstat-matlab

Matlab Circular Statistics Toolbox
79 stars 41 forks source link

circ_median bug produces incorrect results #29

Open da5nsy opened 2 years ago

da5nsy commented 2 years ago

I think @hmselwyn and myself have found a bug while working with the code. For two almost identical vectors, one returns a median value 90 degrees offset from the other (with one being visually much more believable).

500 501

Here is code to reproduce:

vec = [0    0.0933783592688661  0.0218203059475119  0   0.130866888847798   0   0   0.0928679842597452  0   0   0   0.184183675599567   0   0   0   0   0.0725115681425198  0   0.213419184978058   0   0   0.303981509156697   0.322274328443051   0.335135637838978   0.783702848417281   0.595042979147076   0.806262484491184   0.871468273818819   0.842091680208975   1   0.819227266475634   1   0.860628145958628   0.898968521555846   0.808241220534033   1   0.689390470479743   0.922373901149063   0.599312624837704   0.782832201879935   0.242554220292710   0.663264227214449   1   0.823051841826076   0.240867602784924   0.308050331026260   0.325710990181524   0.289983070196255   0   0   0   0.180144989488012   0   0   0   0.132179899406128   0   0   0.143007453646333   0.0711507311403174  0   0   0   0];

%%

ap = 501; %!!!!!!!!!!!!!!!!!

bigN = 64;
interval = 360/bigN;

points = -180+interval:interval:180;

bin_starts = points - interval/2;
bin_ends = points + interval/2;

t = [];
for j = 1:bigN
    t = [t,deg2rad(linspace(bin_starts(j),bin_ends(j),round(vec(j)*ap)))];
end

%%
bias_rad = circ_median(t,2);
bias_deg = rad2deg(bias_rad);

%%

figure,
circ_plot(t,'hist',[],[],false,false);
title(['ap = ',num2str(ap),newline,'median = ',num2str(bias_deg),char(176)])
huangziwei commented 2 years ago

@da5nsy thanks for raising the issue.

First, a simple fix: If you remove one data point from t when ap=500, it will returns the correct result.

It's indeed a bug. The current implementation doesn't cover all the edge cases for even number of samples.

When alpha has an even number of samples, circstat returns the mean of two potential medians. For well-behaved data (mostly concentrated data), these two potential medians were likely nearby, thus this simplistic approach works well. But for not-so-well-bahaved data (e.g. too spread out, but there are many others...), such as yours, it returns the median and the anti-median, thus the mean of them would be 90 degrees off.