Systematically avoiding overflow for integer operations

cole-miller commented 1 year ago

@MathieuBordere suggested that we should have some code (probably macros?) in dqlite/raft to perform arithmetic without silent wrapping/overflow, and I agree. One of the crashes I found with afl (#465) can be triggered by an overflow in the multiplication here, e.g.:

https://github.com/canonical/dqlite/blob/e618e62985de2f8d584ef2902977b2ed00ce9d19/src/conn.c#L165

freeekanayaka commented 1 year ago

In the specific case that is mentioned, I think that the value of c->request.words should be checked at the time the request received by the server from the client over the wire gets decoded, basically right after this line:

https://github.com/canonical/dqlite/blob/e618e62985de2f8d584ef2902977b2ed00ce9d19/src/conn.c#L194

The other fields of c->request that are decoded by message__decode should be checked too in that spot, so that subsequent code and rely on them to be sane.

Validating the values that the client sends us over the wire is something we probably don't always do, so it'd be nice to check that too.

I'm not entirely sure what macros can do in a case like the mentioned one, what would the macro do?

cole-miller commented 1 year ago

My thinking was that we could have a set of macros like

#define CHECKED_ADD(TY, A, B) \
    { \
        /* If the (infinite-precision) result of adding A to B fits in the type TY, return that value; otherwise, abort/return an error code. */ \
    }

(Or, more likely, with an int return value and out parameter for the result to allow more flexibility for what happens on overflow/wrapping.) It would be a function rather than a macro to handle a variety of input/output types, but we could also just write monomorphic functions for particular sets of types as needed. Then in a case like the one I linked we'd use CHECKED_MUL instead of a regular multiplication, erroring out if overflow/wrapping is detected.

Rather than reinventing the wheel, we could use the GCC/Clang builtins that exist for this purpose:

https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html https://clang.llvm.org/docs/LanguageExtensions.html#checked-arithmetic-builtins

(There was some kind of proposal to standardize these in C23, but it seems not to have made it into the final cut.)

freeekanayaka commented 1 year ago

The drawback I see is that it's difficult to understand what went wrong, because the check does not happen when the user provides the input, but at some point later down the road, so it'd be difficult to produce a meaningful error message (e.g. "invalid message words count: %d", to stick with the mentioned example).

Given that input should be ideally validated, I'd suggest to do check all our assumptions as soon as user input is received. After that point we can safely assume that those assumptions are met (e.g. the words count is not greater than X). Note that there might be other types of issues related to invalid user input that don't fall under the overflow category, so it feels to me that anyway the generic solution would be to validate the input. That should cover the overflow problem and any other ones where we make some sort of assumption on the input.

cole-miller commented 1 year ago

Validating user input makes sense to me, but I worry about cases where we can't foresee exactly what validity properties are needed at the point of receiving input. For example, it wasn't immediately obvious to me when reviewing #459 that some multiplications at type size_t couldn't overflowm, where the quantities being multiplied are not related in any remotely transparent way to user input. Mathieu convinced me that those particular multiplications are fine, but if we could reach for __builtin_mul_overflow it would be clear from the code that if overflow did occur we wouldn't continue operating in an unsafe state.

In general, I agree with doing validation up-front where that's practical, but I'm strongly in favor of making liberal use of checked arithmetic for (1) cases where early validation just isn't practical, and (2) even where we can and do validate the inputs, to provide a second line of defense in the event of faulty reasoning on our part, refactorings that introduce unanticipated states, etc. With unchecked arithmetic, the worst case for those kinds of subtle bugs is that we introduce UB; with checked arithmetic, the worst case is that we get a predictable crash that at least points to a call site.

freeekanayaka commented 1 year ago

Validating user input makes sense to me, but I worry about cases where we can't foresee exactly what validity properties are needed at the point of receiving input. For example, it wasn't immediately obvious to me when reviewing #459 that some multiplications at type size_t couldn't overflowm, where the quantities being multiplied are not related in any remotely transparent way to user input. Mathieu convinced me that those particular multiplications are fine, but if we could reach for __builtin_mul_overflow it would be clear from the code that if overflow did occur we wouldn't continue operating in an unsafe state.

In general, I agree with doing validation up-front where that's practical, but I'm strongly in favor of making liberal use of checked arithmetic for (1) cases where early validation just isn't practical, and (2) even where we can and do validate the inputs, to provide a second line of defense in the event of faulty reasoning on our part, refactorings that introduce unanticipated states, etc. With unchecked arithmetic, the worst case for those kinds of subtle bugs is that we introduce UB; with checked arithmetic, the worst case is that we get a predictable crash that at least points to a call site.

I see your point. To me it sounds like the check should be some kind of assertion (possibly with a macro if that's more convenient), which makes the program abort, in the same vein of all the assertions we have all around the code. Especially at the beginning of functions we place a number of assertions to make sure that the assumptions we make locally about current state or function arguments actually hold, precisely because the reasons you mention, it's hard sometimes to reason through all code and it helps against regressions in refactorings. It could be as easy as placing a couple of assertions before a multiplication, to validate that the terms of the multiplications meet the expectations and assumptions.

Our testing code (unit tests all the way up to jenkins) should be good enough to trigger those assertions in case we screw up something.

Note that in some sense these are not genuine failure modes for which we should really return errors, they are more "bugs" in the code, e.g. we didn't perform validation or we broke some assumptions or similar things. Those are really software defects that should be fixed rather than bumped to the user.

FWIW SQLite uses this kind of technique very extensively (via assertions indeed). There's also an interesting take about assertions here: https://www.sqlite.org/assert.html, which I find quite sensible.

freeekanayaka commented 1 year ago

Note also that placing individual assertions about the individual arguments of a multiplication just before the multiplication takes place helps in identifying what exactly is wrong, as opposed to just checking the result of the multiplication, in which case you don't know what factor is to blame. This is probably irrelevant for the specific case that was outlined above, for the words count (since one of the two factors is a constant), but it might be true in general.

canonical / dqlite

Systematically avoiding overflow for integer operations #467