RobTillaart / RunningMedian

Arduino library to determine the running median by means of a circular buffer.
MIT License
46 stars 9 forks source link

Peter Kowald #22

Closed pkowald closed 1 year ago

pkowald commented 1 year ago

I changed this routine because it allows bias in to the calculation as the data set grows.

// this was called getAverage(uint8_t nMedians), but I changed it for clarity 
float RunningMedian::getMedianAverage(uint8_t nMedians) //nMedians is teh spread 
{
  if ((_count == 0) || (nMedians == 0)) return NAN;

//  when filling the array for first time
   if (_count < nMedians)
  {
     nMedians = _count;
     if(_count == 1) // <<<<<<< bail early as just return the ONLY value there is //
     {
        return _values[0]; 
     }
  }

  // this is designed to eliminate the bias when the nMedians could fall to the left or right of the centre //
  // so if the count and nMedians are not either BOTH odd value or Both even value we reduce the spread by 1 to make them the  same both odds or both evens //
  if((_count & 0x01) != (nMedians & 0x01)) //    
  {
     --nMedians;  
  }

  uint8_t start = ((_count - nMedians) / 2);
  uint8_t stop = start + nMedians;

  if (_sorted == false) sort();

  float sum = 0;
  for (uint8_t i = start; i < stop; i++)
  {
    sum += _values[_sortIdx[i]];
  }
  return sum / nMedians;
}

// updated code tags for readability

RobTillaart commented 1 year ago

Thanks for the issue. Will look into it asap and do some tests. There is another special case when count ==2 Then the median is the average of them. No need to sort then either.

RobTillaart commented 1 year ago

Consider the scenario when

Start and stop would be same so sum == 0 Returns 0

Todo: test sketch

RobTillaart commented 1 year ago

test sketch

//
//    FILE: RunningMedian_test.ino
//  AUTHOR: Rob Tillaart
// PURPOSE: test sketch
//     URL: https://github.com/RobTillaart/RunningMedian

#include <RunningMedian.h>

RunningMedian samples = RunningMedian(11);

void setup()
{
  Serial.begin(115200);
  Serial.print("Running Median Version: ");
  Serial.println(RUNNING_MEDIAN_VERSION);

  test1();
}

void loop()
{
}

void test1()
{
  for (int x = 0; x < 7; x++)
  {
    for (int y = 0; y <= x; y++)
    {
      Serial.print(x);
      Serial.print('\t');
      Serial.print(y);
      Serial.print('\t');
      Serial.println(samples.getMedianAverage(y));
    }
    Serial.println();
    samples.add(x);
  }
}

// -- END OF FILE --

output

Running Median Version: 0.3.7
0   0   nan

1   0   nan
1   1   0.00

2   0   nan
2   1   nan
2   2   0.50

3   0   nan
3   1   1.00
3   2   1.00
3   3   1.00

4   0   nan
4   1   nan
4   2   1.50
4   3   1.50
4   4   1.50

5   0   nan
5   1   2.00
5   2   2.00
5   3   2.00
5   4   2.00
5   5   2.00

6   0   nan
6   1   nan
6   2   2.50
6   3   2.50
6   4   2.50
6   5   2.50
6   6   2.50

when nMedian == 0 ==> NAN is correct when nMedian == 1 and count is even there is an incorrect NAN.. other values seem OK.

RobTillaart commented 1 year ago

Please try this version, and let me know your observations

float getMedianAverage(uint8_t nMedians) //nMedians is teh spread 
{
  //  handle special cases quickly.
  if ((_count == 0) || (nMedians == 0)) return NAN;
  if (_count == 1) return _values[0]; 
  if (_count == 2) return (_values[0] + _values[1]) * 0.5; 

  //  count is at least 3 from here
  //  when filling the array for first time
  if (_count < nMedians) nMedians = _count;

  // eliminate the bias when the nMedians could fall to the left or right of the centre
  // if count and nMedians are not both odd or both even reduce the spread by 1 
  // to make them the same. If nMedians becomes 0 correct this. to 2.
  if ((_count & 0x01) != (nMedians & 0x01))
  {
     --nMedians;
     if (nMedians == 0) nMedians = 2;     
  }

  uint8_t start = (_count - nMedians) / 2;
  uint8_t stop = start + nMedians;

  if (_sorted == false) sort();

  float sum = 0;
  for (uint8_t i = start; i < stop; i++)
  {
    sum += _values[_sortIdx[i]];
  }
  return sum / nMedians;
}
RobTillaart commented 1 year ago

BTW I think it is a good enhancement of the library.

RobTillaart commented 1 year ago

minor optimization

if (_count < nMedians) return getAverage();
RobTillaart commented 1 year ago

@pkowald Created develop branch with your fix => see pull request https://github.com/RobTillaart/RunningMedian/pull/23

RobTillaart commented 1 year ago

Created a simple sketch that shows that getAverage(N) and getMedianAverage(N) differ. (derived from sketch above, so it tests getMedianAverage(N) too)

pkowald commented 1 year ago

Works great Thanks

pkowald commented 1 year ago

Thanks Rob Sorry for not interacting, I have been away traveling.

Cheers

Peter

From: Rob Tillaart @.> Sent: Thursday, 13 July 2023 9:42 PM To: RobTillaart/RunningMedian @.> Cc: pkowald @.>; Mention @.> Subject: Re: [RobTillaart/RunningMedian] Peter Kowald (Issue #22)

Closed #22 https://github.com/RobTillaart/RunningMedian/issues/22 as completed via https://github.com/RobTillaart/RunningMedian/commit/9906a81a866e3ce3c5eb452a2d89a0a63a8b94fe 9906a81.

— Reply to this email directly, view it on GitHub https://github.com/RobTillaart/RunningMedian/issues/22#event-9810389399 , or unsubscribe https://github.com/notifications/unsubscribe-auth/AIAY7LVNZIDWDUCFHPM6KJLXP7Q2ZANCNFSM6AAAAAA2CONKFQ . You are receiving this because you were mentioned.Message ID: @.***>

RobTillaart commented 1 year ago

Good to hear it meets your needs

RobTillaart commented 1 year ago

There might be a change in the future in the function.

Now to restore the bias one element less is used. What would be more correct is to have two elements at 50 percent. This is just like the median of an even array, the average of the middle two elements.

The advantages are (imho)

Disadvantages

In code this could look like

float getMedianAverage(uint8_t nMedians) //nMedians is teh spread 
{
  //  handle special case quickly.
  if ((_count == 0) || (nMedians == 0)) return NAN;
  //  test most often occurring case first
  if (_count > 2)
  {
    if (_sorted == false) sort();

    if (_count < nMedians) nMedians = _count;

    uint8_t start = (_count - nMedians) / 2;
    uint8_t stop = start + nMedians;

    float sum = 0;
    //  eliminate the bias by taking two values for 50%
    if ((_count & 0x01) != (nMedians & 0x01))
    {
      //  add two edge elements for 50%
      sum = _values[_sortIdx[start]] + _values[_sortIdx[stop]];
      sum *= 0.5;
      //  remove biased element
      start++;
    }

    for (uint8_t i = start; i < stop; i++)
    {
      sum += _values[_sortIdx[i]];
    }
    return sum / nMedians;
  }
  //  less often occuring special cases
  if (_count == 2) return (_values[0] + _values[1]) * 0.5; 
  //  _count == 1;
  return _values[0];
}