Improve `eRCaGuy_Timer2_Counter::get_count()`

ElectricRCAircraftGuy / eRCaGuy_TimerCounter

An Arduino micros()-like function (encapsulated in a library) with 0.5us precision (since the built-in Arduino micros() function has only 4us precision)

http://www.electricrcaircraftguy.com/2014/02/Timer2Counter-more-precise-Arduino-micros-function.html

31 stars 13 forks source link

Improve `eRCaGuy_Timer2_Counter::get_count()` #12

Open ElectricRCAircraftGuy opened 9 months ago

ElectricRCAircraftGuy commented 9 months ago

Current code:

//get total count for Timer2
unsigned long eRCaGuy_Timer2_Counter::get_count()
{
  uint8_t SREG_old = SREG; //back up the AVR Status Register; see example in datasheet on pg. 14, as well as Nick Gammon's "Interrupts" article - http://www.gammon.com.au/forum/?id=11488
  noInterrupts(); //prepare for critical section of code
  uint8_t tcnt2_save = TCNT2; //grab the counter value from Timer2
  boolean flag_save = bitRead(TIFR2,0); //grab the timer2 overflow flag value; see datasheet pg. 160
  if (flag_save) { //if the overflow flag is set
    tcnt2_save = TCNT2; //update variable just saved since the overflow flag could have just tripped between previously saving the TCNT2 value and reading bit 0 of TIFR2.  
                        //If this is the case, TCNT2 might have just changed from 255 to 0, and so we need to grab the new value of TCNT2 to prevent an error of up 
                        //to 127.5us in any time obtained using the T2 counter (ex: T2_micros). (Note: 255 counts / 2 counts/us = 127.5us)
                        //Note: this line of code DID in fact fix the error just described, in which I periodically saw an error of ~127.5us in some values read in
                        //by some PWM read code I wrote.
    _overflow_count++; //force the overflow count to increment
    TIFR2 |= 0b00000001; //reset Timer2 overflow flag since we just manually incremented above; see datasheet pg. 160; this prevents execution of Timer2's overflow ISR
  }
  _total_count = _overflow_count*256 + tcnt2_save; //get total Timer2 count
  //interrupts(); //allow interrupts again; [updated 20140709] <--WARNING, DO ****NOT**** USE THIS METHOD AFTERALL, OR ELSE IT WILL RE-ENABLE GLOBAL INTERRUPTS IF YOU CALL THIS FUNCTION
                  //DURING AN INTERRUPT SERVICE ROUTINE, THEREBY CAUSING NESTED INTERRUPTS, WHICH CAN REALLY SCREW THINGS UP.
  SREG = SREG_old; //use this method instead, to re-enable interrupts if they were enabled before, or to leave them disabled if they were disabled before
  return _total_count;
}

Potential new code (should result in less jitter due to a shorter period of time where interrupts are off):

unsigned long eRCaGuy_Timer2_Counter::get_count()
{
  // back up the AVR Status Register; see example in datasheet on pg. 14, as well as Nick Gammon's 
  // "Interrupts" article - http://www.gammon.com.au/forum/?id=11488
  uint8_t SREG_old = SREG;
  noInterrupts();
  uint8_t tcnt2_save = TCNT2; // grab the counter value from Timer2
  unsigned long overflow_count = _overflow_count;
  SREG = SREG_old; // restore interrupt state

  _total_count = overflow_count*256 + tcnt2_save; // get total Timer2 count
  // OR (same thing)
  // _total_count = (overflow_count << 8) | tcnt2_save;

  return _total_count;
}

I need to dig back into the git history though and see if that was a prior, broken solution I already had before.

Update: I don't have the git history to know. I'll have to dig up the copy/pasted dirs in my old computer's files instead. The first commit I have with the code is d6987d8, and it already had the bulk of the function content above in it.

The new code will produce a slightly "older" timestamp, but who cares. It should have less jitter, shorter time with interrupts off, and probably even less latency due to being faster.

ElectricRCAircraftGuy commented 9 months ago

Also look into how 64-bit timestamps are obtained on ChibiOS (Ardupilot).

ElectricRCAircraftGuy commented 9 months ago

UPDATE: I remember now why I did this in 2015 or so!

I did the more-complicated solution above so that the get_count() function would work properly, with counting interrupts and all, inside ISRs!

I don't want to break that. So, what I should probably do instead is make two functions:

get_count() - normal, fast (use my new approach above)
get_count_from_isr() - ISR-safe; uses my original more-complicated solution above to maintain counting overflows even when called inside an ISR and interrupts are disabled!

ElectricRCAircraftGuy commented 9 months ago

Ardupilot/ChibiOS uses the _from_ISR() type functions too. See: https://github.com/ArduPilot/ardupilot/blob/master/libraries/AP_HAL_ChibiOS/hwdef/common/hrt.h#L14:

void hrt_init(void);
uint64_t hrt_micros64(void);
uint64_t hrt_micros64_from_ISR(void); // from an ISR
uint32_t hrt_micros32(void);
uint32_t hrt_millis32(void);
uint32_t hrt_millis32I(void); // from locked context
uint32_t hrt_millis32_from_ISR(void); // from an ISR
uint64_t hrt_millis64(void);

Function definitions are here: https://github.com/ArduPilot/ardupilot/blob/master/libraries/AP_HAL_ChibiOS/hwdef/common/hrt.c

Here's the commit from Andrew Tridgell that speeds up these timestamp calls: https://github.com/ArduPilot/ardupilot/commit/6dbc3b6a70ad15a5d0d80411f408d1fb0e7d6ea9

Here's the pull request: https://github.com/ArduPilot/ardupilot/pull/25790

edgar-bonet commented 9 months ago

Your “potential new code” suffers from a race condition, highlighted below:

unsigned long eRCaGuy_Timer2_Counter::get_count()
{
    uint8_t SREG_old = SREG;
    noInterrupts();
    // <-- race condition here! -->
    uint8_t tcnt2_save = TCNT2;
    unsigned long overflow_count = _overflow_count;
    SREG = SREG_old;
    _total_count = overflow_count*256 + tcnt2_save;
    return _total_count;
}

Consider what happens if the timer overflows right after executing noInterrupts(). The timer overflow ISR will not run (it will be pending until the end of this critical section), then:

tcnt2_save will store the post-overflow value of TCNT2 (zero or very close to zero)
overflow_count will still have the pre-overflow count of overflows
_total_count will be off by 256.

The vulnerability window would be tiny, maybe even a single CPU cycle. This means the bug would manifest itself very rarely, and would be practically impossible to find in testing.

ElectricRCAircraftGuy commented 9 months ago

@edgar-bonet, you are right! It's been 10 years since I first wrote this code, and I think that was part of my original logic in how I arrived at the bug-free and ISR-safe version in-place now and as shown in the top of the first post above.

I am so glad you are watching this! How did you find this post anyway? How long have you been a "watcher" of this repo?

What do you think about this fix for the non-ISR-safe version of get_count()? I think it would work.

unsigned long eRCaGuy_Timer2_Counter::get_count()
{
    uint8_t tcnt2_save1 = TCNT2; // grab the counter value from Timer2

    uint8_t SREG_old = SREG;
    noInterrupts();
    unsigned long overflow_count = _overflow_count; // critical section: 4-byte read on 8-bit processor
    // NB: keep this next line inside the interrupt-guarded critical section, not because it needs it 
    // by itself, but because if someone writes an awful ISR that takes so long *another* rollover could 
    // occur right after restoring the interrupt state below, I want to not have to worry about counting
    // that rollover manually too. 2^8 = 256 counts * 0.5us/count = 128us per overflow.
    uint8_t tcnt2_save2 = TCNT2;
    SREG = SREG_old; // restore interrupt state

    if (tcnt2_save2 < tcnt2_save1) 
    {
        // An overflow just occurred, so manually count it for this calculation.
        // The overflow ISR will run and handle it for future calculations.
        overflow_count++;
    }

    // get total Timer2 count
    _total_count = (overflow_count << 8) | tcnt2_save2;

    return _total_count;
}

ElectricRCAircraftGuy commented 9 months ago

Note to self:

For a 32-bit mcu, such as PIC32, counting core timer overflows, the following code might work.

32-bit reads and writes are already atomic, and a 32-bit 100 MHZ core timer overflow occurs only every 2^32 ticks / 100000000 ticks/sec = 42.9 seconds, which is long enough that no interrupt would possibly take that long (unlike my 8-bit timer above where rollover is every 128us).

Potential example code:

// For 32-bit mcus such as PIC32
uint64_t get_counts()
{
    uint32_t count1 = get_core_timer();

    // No critical section needed because we are reading a 32-bit value, and
    // turning off interrupts doesn't stop the core timer from counting
    // anyway.
    uint32_t rollover_count = get_rollover_count();

    uint32_t count2 = get_core_timer();

    if (count2 < count1)
    {
        // rollover just occurred between reading count1 and count2, so manually
        // account for it
        rollover_count++;
    }

    uint64_t total_count = ((uint64_t)rollover_count << 32) | count2;

    return total_count;
}

ElectricRCAircraftGuy commented 9 months ago

My mind is frazzled trying to process this. Here's a question I asked: Stack Overflow: How to properly count timer overflows to convert a 32-bit timer into a 64-bit timer

ElectricRCAircraftGuy commented 9 months ago

For low-tick-rate ISRs called every tick

Note: If there is a tick ISR called at a low tick rate, such as 1ms, this would work too. From my friend JaMc:

uint32_t timer_hi:
uint32_t timer_lo;

// ISR called at every 1ms tick
void timer_isr(void) 
{
    if (++timer_lo == 0) 
    {
        timer_hi++;
    }
}

uint64_t timer_counts(void)
{
    uint64_t counts;

    __disable_irqs();
    counts = timer_lo | ((uint64_t)timer_hi << 32);
    __enable_irqs();

    return count;
}

ElectricRCAircraftGuy commented 9 months ago

On AVR, remove interrupt guards and use my doAtomicRead() function instead. See: https://stackoverflow.com/a/71625693/4561887

Speed test all this stuff and come up with some automated bench tests for it to look for corrupted timestamps.

I should be able to catch problems within seconds to a few minutes, and a successful overnight run should be conclusive.

edgar-bonet commented 9 months ago

How long have you been a "watcher" of this repo?

Probably since I proposed some pull requests.

What do you think about this fix for the non-ISR-safe version of get_count()?

This one still has a race condition:

unsigned long eRCaGuy_Timer2_Counter::get_count()
{
    uint8_t tcnt2_save1 = TCNT2;
    /*--- race here... ---*/
    uint8_t SREG_old = SREG;
    /*--- ...and here ---*/
    noInterrupts();
    unsigned long overflow_count = _overflow_count;
    uint8_t tcnt2_save2 = TCNT2;
    SREG = SREG_old;
    if (tcnt2_save2 < tcnt2_save1)
        overflow_count++;
    _total_count = (overflow_count << 8) | tcnt2_save2;
    return _total_count;
}

If the timer overflows either right before or right after you read SREG, then tcnt2_save1 will be big (255 or close to that), and _overflow_count will be incremented by the timer overflow ISR. Then you will add an extra increment to overflow_count.

Yes, this is more tricky that it looks at first sight... :-/

I suggest taking inspiration on the implementation of micros, which seems pretty solid to me. Based on it, I would suggest this:

unsigned long eRCaGuy_Timer2_Counter::get_count()
{
    unsigned long overflow_count;
    uint8_t tcnt2_save, tifr2_save;
    ATOMIC_BLOCK(ATOMIC_RESTORESTATE) {
        overflow_count = _overflow_count;
        tcnt2_save = TCNT2;
        tifr2_save = TIFR2;
    }
    if ((tifr2_save & _BV(TOV2)) && (tcnt2_save < 128))
        overflow_count++;
    return (overflow_count << 8) | tcnt2_save;
}

Note that I have adapted that a little bit:

I like the ATOMIC_BLOCK() macro from <util/atomic.h>. It does the same SREG and interrupt stuff you are already doing, but it makes the programmer's intent clearer.
I removed non-time critical things from the critical section, and only kept the memory and I/O reads.
I replaced the condition t < 255 by a less stringent test, which will likely be optimized into testing a single bit.

come up with some automated bench tests for it to look for corrupted timestamps.

That would be far from trivial. Since a race condition can have a vulnerability window as small as a single CPU cycle, it can easily be missed by any systematic test. If the test calls get_count() periodically, you must ensure that the period of that call and the timer overflow period are relatively prime. This means that the test period should be an odd number of CPU cycles. Beware that other interrupts may throw these timings off.

ElectricRCAircraftGuy commented 9 months ago

Note to self about a possible optimization to reduce interrupts:

Check for rollovers during the reading of the count. Remove the overflow interrupt. Push back an interrupt to perform a count read as far back as possible. Each time a read is performed, push back the interrupt again. This way, if reads are performed frequently enough, no overflow-type interrupt ever needs to run, and if they are not, fewer interrupts occur because they only occur as often as is necessary to not miss overflow counts.

The read function will need to be protected by atomic access guards now because the same overflow count and count_old variables will be written to both inside and outside of an ISR. So, we need to ensure they are atomically updated both inside and outside ISRs to avoid corruption.

Removing the 5us or so overflow ISR which runs every 128us currently would be a win.

ElectricRCAircraftGuy commented 9 months ago

[ ] Investigation to perform:

Write an empty sketch. Import my library and configure the timer/counter to count at 0.5us/tick.

Write a basic function that manually tracks the overflow and reads the 8-bit counter as fast as it can, printing the result to the serial terminal. Look for patterns of missing timestamps, or gaps in timestamps.
Modify the sketch to collect samples into an array, and only print the array when it is full, to get tighter timestamps closer together.
Add an ISR that calls the timestamp at a fixed interval. Look for race conditions between the main loop code above, and the ISR. There will be no overflow ISR. The overflows will be detected by the count read function.
See if there's some way to overcome the race condition using a "double read" type strategy, rather than disabling interrupts.

See my doAtomicRead() function here: https://stackoverflow.com/a/71625693/4561887.

Note: for a single producer single consumer, it would work. But, for a multi-producer multi-consumer, which is our case now that both the ISR and main loop context are updating and reading the same variables, I think this will not work, but I want to play with it and prove that to myself anyway.

edgar-bonet commented 9 months ago

Addendum: I suggested

// ...
tcnt2_save = TCNT2;
tifr2_save = TIFR2;
// ...

Note that the order of these statements is important. If they get switched, and the timer overflows in-between them... you get the picture.