Undefined behavior on timer rollover

edgar-bonet commented 6 years ago

The method FilterOnePole::input(float) starts with these statements:

long time = micros();
ElapsedUS = float(time - LastUS);   // cast to float here, for math

When micros() reaches 2³¹, which happens roughly 35.8 minutes after the program starts, time becomes a large negative number, and the subtraction time - LastUS overflows.

In C and C++, arithmetic overflow of signed integral types is undefined behavior. This means the program is incorrect, and anything can happen, including (but not limited to) the subtraction giving the expected result. Just for illustration, here is a recent example of the kind of surprises you can get with integer overflows.

The same bug is present in FilterTwoPole.cpp and in FilterDerivative.cpp.

The fix is very simple: all variables holding timestamps (LastUS, LastTimeUS, thisUS, time and now) should be declared unsigned long. This is the type returned by micros(), and it is not a coincidence that it is the right type for timing calculations. Unlike signed integers, the arithmetics on unsigned integers are specified by the C and C++ standards to be done modulo MAX(type)+1. This guarantees that the subtraction will yield the correct result even across a micros() rollover.

JonHub commented 6 years ago

Hi Edgar,

Thanks for the note, and for checking on this - I hope the software is finding some use!

I believe I checked the wrap behaviour on the ARM and AtMEGA architectures, and the code is correct as written on those archetectures. Both have clocks that wrap (standard behavior), and then subtraction cancels out the wrap.

If you have a chance, it would be great if you could try this on your end. You can either just set a time and leave, with some print statements, OR start with a number close to the wrap point, and check through what happens in a loop, something like this

long int x = (max of long int) - 10

loop through, printing out:

print x print "x - (x-1)"

I think you will find that the delta is always one.

You are correct, though, that in general, this is not always true ... it would be super awesome if could check, though, since it's "trick" that I used, and it's not obvious that it is correct.

And, if someone ports to a different archetecture, they need to be made aware of this potential behavior.

To fix, properly, it would make the most sense to make something like a "clock" object, that could wrap, or not wrap. Then, you could use the fix you mention, or something similar, could be transparently be switched in our out.

(In my case, I needed the speed, so checked the failure mode ... still worked!)

I haven't touched that code in a number of years though.

Best, J

On Mon, Feb 12, 2018 at 6:02 AM, Edgar Bonet notifications@github.com wrote:

The method FilterOnePole::input(float) starts with these statements https://github.com/JonHub/Filters/blob/aeb294b4a66cf5dc30d5998cc221115dec95eb80/FilterOnePole.cpp#L20 :

long time = micros(); ElapsedUS = float(time - LastUS); // cast to float here, for math

When micros() reaches 231, which happens roughly 35.8 minutes after the program starts, time becomes a large negative number, and the subtraction time

LastUS overflows.

In C and C++, arithmetic overflow of signed integral types is undefined behavior https://www.nayuki.io/page/undefined-behavior-in-c-and-cplusplus-programs. This means the program is incorrect, and anything can happen, including (but not limited to) the subtraction giving the expected result. Just for illustration, here is a recent example of the kind of surprises you can get with integer overflows https://stackoverflow.com/questions/48731306/program-behaving-strangely-on-online-ides .

The same bug is present in FilterTwoPole.cpp https://github.com/JonHub/Filters/blob/aeb294b4a66cf5dc30d5998cc221115dec95eb80/FilterTwoPole.cpp#L123 and in FilterDerivative.cpp https://github.com/JonHub/Filters/blob/aeb294b4a66cf5dc30d5998cc221115dec95eb80/FilterDerivative.cpp#L5 .

The fix is very simple: all variables holding timestamps (LastUS, LastTimeUS, thisUS, time and now) should be declared unsigned long. This is the type returned by micros(), and it is not a coincidence that it is the right type for timing calculations. Unlike signed integers, the arithmetics on unsigned integers are specified by the C and C++ standards to be done modulo MAX(type)+1. This guarantees that the subtraction will yield the correct result even across a micros() rollover.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/JonHub/Filters/issues/3, or mute the thread https://github.com/notifications/unsubscribe-auth/AFue1lGwWzxThtB4xZFdOIOopB3Iy6YNks5tUESSgaJpZM4SCNp2 .

edgar-bonet commented 6 years ago

the code is correct as written on those archetectures.

No, it is not. And you did not understand my point. Undefined behaviour is not architecture dependent behavior. It doesn't just depend on the architecture: it also depends on the compiler's version, on the optimization level, on the surrounding code, and on the phase of the moon. A piece of code that contains undefined behavior is never correct, you cannot guarantee anything about it's behavior, and no amount of testing can mitigate such a fundamental flaw.

You really, really, really should read the link I provided for undefined behavior. And also the blog post What Every C Programmer Should Know About Undefined Behavior, from the LLVM project, which applies equally to C++.

Let me quote an excerpt from the former:

If you’re lucky, the program triggering UB will show an appropriate error message and/or crash [...]. If you’re unlucky, the program will quietly mangle data [...]. And if you’re very unlucky, the program will do exactly what you hoped it should do [...].

You are very unlucky, and you should read that article to understand why. And please, refrain from sharing code with the community until you understand what undefined behavior really means.

it would be great if you could try this
print "x - (x-1)"
I think you will find that the delta is always one.

Of course it is always one: the compiler optimizes this into print 1. It does not care about overflows because it knows that you, as a programmer, are not allowed to trigger signed overflows. And if you do nonetheless, then it, as a compiler, has no obligation whatsoever to generate correct code. So it works under the assumption that signed overflows don't happen, and it doesn't care the least about what could happen if that assumption turns out to be wrong.

Now, I suggest you try this program:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    // Loop until `x' overflows, at which point it becomes negative.
    for (long int x = LONG_MAX - 10; x >= 0; x++)
        printf("%ld\n", x);
}

Compile it with no optimization. Then run it. Then progressively increase the optimization level until you get the surprise.

To fix, properly, it would make the most sense to make something like a "clock" object, that could wrap, or not wrap.

That would be a bad idea, for two reasons:

You need it to wrap. It must wrap in order to give the correct result across a micros() rollover. Implementing a non-wrapping version would be pointless.
The C++ language already has a built-in type that is guaranteed to always wrap the way you want. It's called unsigned long. There is no point in reinventing the wheel.

In my case, I needed the speed

Your signed long is not faster than an unsigned long. Actually, the subtraction is likely to generate the exact same machine code whether the numbers are signed or not. To be more precise, I would say that if you are so unlucky that your code does exactly what you hoped it should do, then it is more than likely the compiler generated the same machine code it would have generated with an unsigned long. Doing the right thing carries no speed penalty.

OK, this comment is getting too long. I did not expect such a simple bug report to degenerate into me educating you about undefined behavior. The takeaways are:

this is a genuine bug
the fix is trivial and it carries no performance penalty.

Do you want me to submit a pull request for this?

Regards,

Edgar.

JonHub commented 6 years ago

Hi Edgar,

I too responded quickly to your email yesterday, and did not mean to upset you.

I'll take a look through your emails more thoroughly, my response was off the cuff. (I've been working crazy long hours, and wanted to at least get back to you ... otherwise, email tends to disappear in my in box.)

I don't actually do micro-controller code right now, and don't have this setup. (Probably in storage somewhere.) If you could help checking the fix, that would be great, but lemme look into it in more detail and get back to you.

Thanks, J

edgar-bonet commented 6 years ago

Hi!

I'm not upset. Sorry if I sounded like so. And I must admit that the whole issue of signed overflow being undefined behavior is not that obvious to everybody. Actually, I just found found research on that specific subject that concludes that “there is widespread misunderstanding of the (highly complex) language rules for integer operations in C/C++, even among expert programmers.”

Take your time. I will post a pull request, as I expect it should make things easier for you.

Best regards,

Edgar.

JonHub commented 6 years ago

Thank you!

Yes, communicating over email can be weird - from my end, I certainly didn't mean to be defensive, and appreciate the help! [It was for a project, where I did, well, basically EVERYTHING, including run the company, design the electronics, and write the code ... I confirmed the rollover was well-behaved on that platform, and moved on. If the code is actually useful to people, and only has this one issue, that makes me happy!]

Work is calming down some (crazy presentation last Monday, which is why you got the quick email), and I'll read your emails more thoroughly ... sounds like an easy fix, and at the least, should be documented.

Hopefully I can take a look this weekend.

Are you using the filters, and finding them useful? I wrote the code to solve some practical problems, and really like the "easy, real-time DSP" ... you lights that fade are SO much nicer than lights that blink.

The real world is not discontinuous, and I find it jarring how many electronics blink / flash. I especially like the two-pole cascaded RC, that has a really smooth ramp up ... you can use it, like, when a light goes on or off.

Also ... I don't think it is in the current code, but I was taking the linear expansion of

exp(-x) ≈ -x

which works for small time updates, and is MUCH less processor intensive, especially on an 8-bit system.

It's that's useful, we / I could put it back into the code ...

Best, J

J

On Wed, Feb 14, 2018 at 5:31 AM, Edgar Bonet notifications@github.com wrote:

Hi!

I'm not upset. Sorry if I sounded like so. And I must admit that the whole issue of signed overflow being undefined behavior is not that obvious to everybody. Actually, I just found found research http://www.cs.utah.edu/%7Eregehr/papers/tosem15.pdf#page=26 on that specific subject that concludes that “there is widespread misunderstanding of the (highly complex) language rules for integer operations in C/C++, even among expert programmers.”

Take your time. I will post a pull request, as I expect it should make things easier for you.

Best regards,

Edgar.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/JonHub/Filters/issues/3#issuecomment-365607638, or mute the thread https://github.com/notifications/unsubscribe-auth/AFue1gRlo46gbqrQlQLhDhUiSfqC5ZdNks5tUuBNgaJpZM4SCNp2 .

edgar-bonet commented 6 years ago

Hi!

Are you using the filters, and finding them useful?

No, I landed here while trying to help someone who was having timing issues with FilterOnePole in HIGHPASS mode 1. C.f. issue #4.

exp(-x) ≈ -x

You mean exp(−x) ≈ 1 − x.

Yes, that seems to me a very natural thing to do. Here, x = δt/τ, where δt is the time elapsed since the last step and τ is the filter's time constant. When using the filter in the “normal” way, δt is significantly smaller than τ, and this first order approximation is perfectly reasonable.

However, I don't know whether all your users are aware that choosing τ smaller than δt doesn't make much sense. You may want to add a safety like

exp(−δt/τ) ≈ max(1 − δt/τ, 0)

to ensure that the filter is stable even if misused. Another option could be

exp(−δt/τ) ≈ τ/(τ+δt)

This is also a first order approximation (the error scales like (δt/τ)²), but it has the correct limit on δt/τ → ∞.

There is another approximation that seems to be popular for transforming continuous-time filters to discrete time: the bilinear transform 2. It is based on an approximation of the exponential that can be written

exp(−δt/τ) ≈ (2τ−δt)/(2τ+δt)

This one is second order (the error is O((δt/τ)³)). It has the wrong limit on δt/τ → ∞, and this will make the filter oscillate if δt > 2τ. However, the oscillations are always damped, and the filter is unconditionally stable.

Note also that on AVR the division is not cheap. It is therefore more efficient to write

ampFactor = (2*TauUS - ElapsedUS)/(2*TauUS + ElapsedUS);

which has a single division, rather than

TauSamps = TauUS / ElapsedUS;
ampFactor = (2 - 1/TauSamps)/(2 + 1/TauSamps);

which has three divisions. That's the reason why I wrote the approximations above in terms of δt and τ, rather than in terms of x = δt/τ.

Regards,

Edgar.

ElectricRCAircraftGuy commented 6 years ago

+1 for bilinear transform and other optimizations, such as the division one

ElectricRCAircraftGuy commented 6 years ago

+1 on the pull request. It's an easy fix and a definite bug otherwise.

JonHub commented 6 years ago

Just wanted to say, thanks for the recent work, it looks like there is some activity, and others getting involved as well ...

Haven't had time to look through your emails (etc.) yet, will do that soon.

J

On Wed, Feb 21, 2018 at 6:49 PM, Gabriel Staples notifications@github.com wrote:

+1 on the pull request https://github.com/JonHub/Filters/pull/5. It's an easy fix and a definite bug otherwise.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/JonHub/Filters/issues/3#issuecomment-367550230, or mute the thread https://github.com/notifications/unsubscribe-auth/AFue1jATdDPaKxxtxxJQDpSKdshFVNBfks5tXNWzgaJpZM4SCNp2 .

edgar-bonet commented 6 years ago

Hi!

Any progress on this issue?

Regards,

Edgar.

Shafaeit commented 4 years ago

Working well but TM1637 Module not working with this Library because TM1637 Some Delay on it, How can it adjust time on filter library

edgar-bonet commented 4 years ago

@Shafaeit: Your comment seems out of topic. It has nothing to do with the present issue.

JonHub / Filters

Undefined behavior on timer rollover #3

loop through, printing out: