Median calculation in Threshold

ChrisVeigl / BrainBay

Open Source Biofeedback Software

http://brainbay.lo-res.org



Median calculation in Threshold #21

Status: Open. ElliotMebane opened this issue 5 years ago.

ElliotMebane commented 5 years ago

I noticed that the median option seemed to favor the last value of the range instead of the whole range of values in the interval. I checked the calculation, and the median result depends only on the final value of the loop iterator:

    for (i = 0, sum = 0; (i <= 1024) && (sum < numtrue); i++) { sum += buckets[i]; }
    to_input = size_value(0.0f,1024.0f,(float)i,in_ports[0].in_min,in_ports[0].in_max,0);

It looks like the buckets need to be sorted, and then the middle value in the list should be chosen.
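For reference, the conventional median this suggestion describes (sort, then take the middle) can be sketched in C as below. This is a minimal sketch, assuming the interval's raw samples are available as a float array; median_sorted and cmp_float are hypothetical names, not BrainBay code, and averaging the two middle values for an even count is one common convention among several.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator: ascending order of floats. */
static int cmp_float(const void *a, const void *b)
{
    float fa = *(const float *)a;
    float fb = *(const float *)b;
    return (fa > fb) - (fa < fb);
}

/* Hypothetical helper: median of n samples, computed by sorting a copy so the
   caller's buffer is left untouched. For even n the two middle values are
   averaged. Returns 0.0f for n == 0 or on allocation failure (an assumption). */
float median_sorted(const float *samples, size_t n)
{
    float *copy;
    float result;

    if (n == 0)
        return 0.0f;
    copy = malloc(n * sizeof *copy);
    if (copy == NULL)
        return 0.0f;
    memcpy(copy, samples, n * sizeof *copy);
    qsort(copy, n, sizeof *copy, cmp_float); /* O(n log n) per interval */

    if (n % 2 == 1)
        result = copy[n / 2];
    else
        result = 0.5f * (copy[n / 2 - 1] + copy[n / 2]);

    free(copy);
    return result;
}
```

The cost is a copy plus a sort of every sample in the interval each time the median is requested, which grows with the interval length.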

ChrisVeigl commented 5 years ago

Oops!

That bug has been in there for a long time! The median option was actually added by a contributor, and I did not really check its functionality well enough. I'll have a look when time permits.

ChrisVeigl commented 5 years ago

> It looks like the buckets need to be sorted, and then the middle value in the list should be chosen.

The code for the median was contributed before BrainBay was under version control. After having a look, I'm not so sure that the calculation is wrong:

The buckets are used to avoid sorting the incoming values (to save computation, particularly for larger intervals). They divide the whole signal range into 1024 "bins" of equal size, sacrificing some precision. For each incoming value, its associated bin is incremented by one. So IMO the for loop above makes sense for finding the bottom x% or top x% of the values (represented by the bottom or top numtrue bin entries, where numtrue is the number of samples in the interval * x/100).
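The bucket approach can be sketched in C roughly as follows. This is a minimal sketch under stated assumptions: histogram, value_to_bin, hist_add, hist_percentile_bin and bin_to_value are hypothetical names rather than BrainBay's code, and numtrue is rounded up here, which the thread does not specify. Asking for 50% gives the bucket median. One detail worth noting about the quoted loop: its i++ runs before the condition is re-checked, so as far as I can tell i ends one bin past the bin where sum reached numtrue; this sketch returns that bin itself.

```c
#define NUM_BINS 1024 /* bins span 0..1024, as in the size_value() call quoted above */

/* Hypothetical histogram: NUM_BINS + 1 counters so that index NUM_BINS is valid. */
typedef struct {
    unsigned int buckets[NUM_BINS + 1];
    unsigned int total; /* number of samples added */
} histogram;

/* Map a value in [in_min, in_max] to a bin index; out-of-range values are
   clipped to the lowest or highest bin. */
static int value_to_bin(float v, float in_min, float in_max)
{
    float pos;
    if (in_max <= in_min)
        return 0; /* degenerate range: everything goes to bin 0 */
    pos = (v - in_min) / (in_max - in_min) * NUM_BINS;
    if (!(pos > 0.0f))
        return 0; /* below in_min (also catches NaN) */
    if (pos >= (float)NUM_BINS)
        return NUM_BINS;
    return (int)pos;
}

/* Count one sample: a single increment, no sorting. */
void hist_add(histogram *h, float v, float in_min, float in_max)
{
    h->buckets[value_to_bin(v, in_min, in_max)]++;
    h->total++;
}

/* Lowest bin whose cumulative count reaches percent% of the samples
   (numtrue rounded up). percent = 50 gives the bucket median. */
int hist_percentile_bin(const histogram *h, unsigned int percent)
{
    unsigned int numtrue = (h->total * percent + 99) / 100;
    unsigned int sum = 0;
    int i;

    for (i = 0; i <= NUM_BINS; i++) {
        sum += h->buckets[i];
        if (sum >= numtrue)
            return i; /* the bin that reached numtrue */
    }
    return NUM_BINS; /* only reached if percent > 100 */
}

/* Convert a bin index back to a value in the input range (lower edge of the bin). */
float bin_to_value(int bin, float in_min, float in_max)
{
    return in_min + (in_max - in_min) * (float)bin / NUM_BINS;
}
```

With the values 10, 20, 30, 40 and 50 added over a 0 to 1024 input range, hist_percentile_bin(&h, 50) returns bin 30, the conventional median. The per-sample cost is one increment, and each query is one pass over 1025 bins regardless of the interval length.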

ElliotMebane commented 5 years ago

OK, I thought "median" was meant in the traditional sense (the middle bucket).

The fan on my VR-ready laptop engages when BrainBay runs, so I suppose all the optimization that can be done is worth it.

-- New values outside the min/max settings get clipped to the lowest/highest bucket in the incoming_data method. I'm not sure what impact that may have.
-- There are 1025 entries in the bucket array, FYI. The big/small adapt blocks use for loops that seem consistent with that length (one counts up and the other counts down), but be careful not to assume the length is 1024.
-- I'm not sure whether the for loops in the bigadapt/smalladapt blocks count in the correct directions. The bigadapt block counts backwards until the percentage has been met, so a high percentage would trim off the bulk of the top values and return a number on the low side of the range.
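To make the direction question in the last point concrete, here is a hypothetical downward scan in C. scan_down_bin is an illustrative name, not the actual bigadapt code (which isn't quoted in this thread); it assumes a 1025-entry bucket array and a rounded-up target count. Because it stops once percent% of the samples have been counted from the top, it returns roughly the (100 - percent)th percentile, so a high percentage lands on the low side of the range, as described above.

```c
#define NUM_BINS 1024

/* Hypothetical downward scan over NUM_BINS + 1 buckets: start at the top bin
   and stop once percent% of all samples have been counted. About percent% of
   the samples lie at or above the returned bin. */
int scan_down_bin(const unsigned int buckets[NUM_BINS + 1],
                  unsigned int total, unsigned int percent)
{
    unsigned int target = (total * percent + 99) / 100; /* rounded up */
    unsigned int sum = 0;
    int i;

    for (i = NUM_BINS; i >= 0; i--) {
        sum += buckets[i];
        if (sum >= target)
            return i;
    }
    return 0; /* only reached if the target exceeds the bucket counts */
}
```

With one sample in each of bins 10, 20, ..., 100, scan_down_bin(b, 10, 90) returns 20 and scan_down_bin(b, 10, 10) returns 100. In other words, scanning down for x% yields a top-x% threshold, while a bottom-x% threshold would need the upward scan (or a downward scan for 100 - x%).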