strcspn: Undefined behavior due to invalid pointer arithmetic

smcpeak commented 2 years ago

The strcspn function in src/string/strcspn.c contains a line of code (introduced in commit eb00919a88) that has undefined behavior due to pointer arithmetic going out of bounds:

  return (uintptr_t)(__strchrnul(s, *c) - (uintptr_t)a);

I believe the intent here is to convert both pointers to uintptr_t and then subtract the resulting integers. But the way the parentheses are placed, it instead converts only a to an integer, then subtracts that from the pointer returned by __strchrnul. Since (uintptr_t)a will in general be large, the resulting pointer is well outside the range of the object pointed to by s, which results in undefined behavior (per C11 6.5.6/8).

In practice, this has no ill effect on any C implementation I'm familiar with, but should be fixed anyway. I think the following is what was intended:

  return (uintptr_t)__strchrnul(s, *c) - (uintptr_t)a;

phillipjohnston commented 2 years ago

I agree that your change is more readable, but I do not think it actually has any real impact on the operation at all (and Clang agrees). I also don't see where the UB is at in this operation.

A uintptr_t is sized to be an integer that can hold the value of a pointer, and the value held by a will not be any larger than what can be held in a char *.
Since a is simply the start address of s, and the value returned by __strchrnul is an address that is an offset of s (that is greater than or equal to s/a), the resulting offset value is not outside the range of the string object pointed to by s.
The standard states that the result of pointer math with an integer type is a pointer, and since the value returned by __strchrnul is a char*, it will be an equivalent mathematical operation as a cast to uintptr_t and then performing the mathematical operation (would not be true of other pointer types). If the resulting value is out of range, that will still be true with a uintptr_t conversion.
The reason for the cast placement as it was is the original clang warning (reported below for reference) is actually complaining about the implicit conversion from a pointer (char *) to an integral type (size_t here... who knows why I cast to a uintptr_t instead of a size_t back then, but that particular implicit conversion works out fine).

Code:

return __strchrnul(s, *c) - (uintptr_t)a;

Warning:

[1/8] Compiling C object src/libc_native.a.p/string_strcspn.c.o
../src/string/strcspn.c:15:10: warning: incompatible pointer to integer conversion returning 'char *' from a function with result type 'size_t' (aka 'unsigned long') [-Wint-conversion]
                return __strchrnul(s, *c) - (uintptr_t)a;
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
[2/8] Compiling C object src/libc.a.p/string_strcspn.c.o
../src/string/strcspn.c:15:10: warning: incompatible pointer to integer conversion returning 'char *' from a function with result type 'size_t' (aka 'unsigned long') [-Wint-conversion]
                return __strchrnul(s, *c) - (uintptr_t)a;
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
[3/8] Compiling C object src/libc_hosted.a.p/string_strcspn.c.o
../src/string/strcspn.c:15:10: warning: incompatible pointer to integer conversion returning 'char *' from a function with result type 'size_t' (aka 'unsigned long') [-Wint-conversion]
                return __strchrnul(s, *c) - (uintptr_t)a;
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
[8/8] Linking target test/libc_test

smcpeak commented 2 years ago

The standard states that the result of pointer math with an integer type is a pointer, and since the value returned by __strchrnul is a char*, it will be an equivalent mathematical operation as a cast to uintptr_t and then performing the mathematical operation (would not be true of other pointer types).

The numeric result will be the same, yes, but if the result has the type of a pointer, then it must point within the same object as the original pointer (before the arithmetic). Quoting C11 6.5.6/8:

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. [...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. [...]

Note the final "behavior is undefined" in the case that the pointer operand (original) and result do not both point to the same array object (or one-past-the-end).

If the resulting value is out of range, that will still be true with a uintptr_t conversion.

True, but uintptr_t is allowed to store arbitrary (representable) integers, while pointers are technically not.

I agree that the line:

return __strchrnul(s, *c) - (uintptr_t)a;

is incorrect, as it has both the UB due to out of bounds arithmetic and then implicitly casts that pointer to size_t (the return type). What I suggested above is:

return (uintptr_t)__strchrnul(s, *c) - (uintptr_t)a;

which performs the arithmetic on two uintptr_t quantities, then implicitly converts that to size_t.

However, that's based on my speculation about the original intent. If this were my code, I would actually write:

return (size_t)(__strchrnul(s, *c) - a);

That way, the arithmetic is performed on the original pointers (which both point to the same array, so this complies with C11 6.5.6/9), resulting in a ptrdiff_t, which is then explicitly converted to size_t (justified since the result will always be non-negative). I consider this slightly better than doing the arithmetic on uintptr_t since there are C implementations where uintptr_t is larger than size_t and ptrdiff_t, thus making arithmetic on the former potentially more expensive.

phillipjohnston commented 2 years ago

Thank you for taking the time to provide the detailed explanation, I agree with your conclusion.

smcpeak commented 2 years ago

The latest PR looks good to me, thanks!

embeddedartistry / libc

strcspn: Undefined behavior due to invalid pointer arithmetic #177