embeddedartistry / libc

libc targeted for embedded systems usage. Reduced set of functionality (due to embedded nature). Chosen for portability and quick bringup.
MIT License
524 stars 67 forks source link

strcspn: Undefined behavior due to invalid pointer arithmetic #177

Closed smcpeak closed 2 years ago

smcpeak commented 2 years ago

The strcspn function in src/string/strcspn.c contains a line of code (introduced in commit eb00919a88) that has undefined behavior due to pointer arithmetic going out of bounds:

  return (uintptr_t)(__strchrnul(s, *c) - (uintptr_t)a);

I believe the intent here is to convert both pointers to uintptr_t and then subtract the resulting integers. But the way the parentheses are placed, it instead converts only a to an integer, then subtracts that from the pointer returned by __strchrnul. Since (uintptr_t)a will in general be large, the resulting pointer is well outside the range of the object pointed to by s, which results in undefined behavior (per C11 6.5.6/8).

In practice, this has no ill effect on any C implementation I'm familiar with, but should be fixed anyway. I think the following is what was intended:

  return (uintptr_t)__strchrnul(s, *c) - (uintptr_t)a;
phillipjohnston commented 2 years ago

I agree that your change is more readable, but I do not think it actually has any real impact on the operation at all (and Clang agrees). I also don't see where the UB is at in this operation.

Code:

return __strchrnul(s, *c) - (uintptr_t)a;

Warning:

[1/8] Compiling C object src/libc_native.a.p/string_strcspn.c.o
../src/string/strcspn.c:15:10: warning: incompatible pointer to integer conversion returning 'char *' from a function with result type 'size_t' (aka 'unsigned long') [-Wint-conversion]
                return __strchrnul(s, *c) - (uintptr_t)a;
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
[2/8] Compiling C object src/libc.a.p/string_strcspn.c.o
../src/string/strcspn.c:15:10: warning: incompatible pointer to integer conversion returning 'char *' from a function with result type 'size_t' (aka 'unsigned long') [-Wint-conversion]
                return __strchrnul(s, *c) - (uintptr_t)a;
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
[3/8] Compiling C object src/libc_hosted.a.p/string_strcspn.c.o
../src/string/strcspn.c:15:10: warning: incompatible pointer to integer conversion returning 'char *' from a function with result type 'size_t' (aka 'unsigned long') [-Wint-conversion]
                return __strchrnul(s, *c) - (uintptr_t)a;
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
[8/8] Linking target test/libc_test
smcpeak commented 2 years ago

The standard states that the result of pointer math with an integer type is a pointer, and since the value returned by __strchrnul is a char*, it will be an equivalent mathematical operation as a cast to uintptr_t and then performing the mathematical operation (would not be true of other pointer types).

The numeric result will be the same, yes, but if the result has the type of a pointer, then it must point within the same object as the original pointer (before the arithmetic). Quoting C11 6.5.6/8:

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. [...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. [...]

Note the final "behavior is undefined" in the case that the pointer operand (original) and result do not both point to the same array object (or one-past-the-end).

If the resulting value is out of range, that will still be true with a uintptr_t conversion.

True, but uintptr_t is allowed to store arbitrary (representable) integers, while pointers are technically not.


I agree that the line:

return __strchrnul(s, *c) - (uintptr_t)a;

is incorrect, as it has both the UB due to out of bounds arithmetic and then implicitly casts that pointer to size_t (the return type). What I suggested above is:

return (uintptr_t)__strchrnul(s, *c) - (uintptr_t)a;

which performs the arithmetic on two uintptr_t quantities, then implicitly converts that to size_t.

However, that's based on my speculation about the original intent. If this were my code, I would actually write:

return (size_t)(__strchrnul(s, *c) - a);

That way, the arithmetic is performed on the original pointers (which both point to the same array, so this complies with C11 6.5.6/9), resulting in a ptrdiff_t, which is then explicitly converted to size_t (justified since the result will always be non-negative). I consider this slightly better than doing the arithmetic on uintptr_t since there are C implementations where uintptr_t is larger than size_t and ptrdiff_t, thus making arithmetic on the former potentially more expensive.

phillipjohnston commented 2 years ago

Thank you for taking the time to provide the detailed explanation, I agree with your conclusion.

smcpeak commented 2 years ago

The latest PR looks good to me, thanks!