Closed smcpeak closed 2 years ago
I agree that your change is more readable, but I do not think it actually has any real impact on the operation at all (and Clang agrees). I also don't see where the UB is at in this operation.
uintptr_t
is sized to be an integer that can hold the value of a pointer, and the value held by a
will not be any larger than what can be held in a char *
. a
is simply the start address of s
, and the value returned by __strchrnul
is an address that is an offset of s
(that is greater than or equal to s
/a
), the resulting offset value is not outside the range of the string object pointed to by s
.__strchrnul
is a char*
, it will be an equivalent mathematical operation as a cast to uintptr_t
and then performing the mathematical operation (would not be true of other pointer types). If the resulting value is out of range, that will still be true with a uintptr_t
conversion.char *
) to an integral type (size_t
here... who knows why I cast to a uintptr_t
instead of a size_t
back then, but that particular implicit conversion works out fine).Code:
return __strchrnul(s, *c) - (uintptr_t)a;
Warning:
[1/8] Compiling C object src/libc_native.a.p/string_strcspn.c.o
../src/string/strcspn.c:15:10: warning: incompatible pointer to integer conversion returning 'char *' from a function with result type 'size_t' (aka 'unsigned long') [-Wint-conversion]
return __strchrnul(s, *c) - (uintptr_t)a;
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
[2/8] Compiling C object src/libc.a.p/string_strcspn.c.o
../src/string/strcspn.c:15:10: warning: incompatible pointer to integer conversion returning 'char *' from a function with result type 'size_t' (aka 'unsigned long') [-Wint-conversion]
return __strchrnul(s, *c) - (uintptr_t)a;
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
[3/8] Compiling C object src/libc_hosted.a.p/string_strcspn.c.o
../src/string/strcspn.c:15:10: warning: incompatible pointer to integer conversion returning 'char *' from a function with result type 'size_t' (aka 'unsigned long') [-Wint-conversion]
return __strchrnul(s, *c) - (uintptr_t)a;
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
[8/8] Linking target test/libc_test
The standard states that the result of pointer math with an integer type is a pointer, and since the value returned by
__strchrnul
is achar*
, it will be an equivalent mathematical operation as a cast touintptr_t
and then performing the mathematical operation (would not be true of other pointer types).
The numeric result will be the same, yes, but if the result has the type of a pointer, then it must point within the same object as the original pointer (before the arithmetic). Quoting C11 6.5.6/8:
When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. [...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. [...]
Note the final "behavior is undefined" in the case that the pointer operand (original) and result do not both point to the same array object (or one-past-the-end).
If the resulting value is out of range, that will still be true with a
uintptr_t
conversion.
True, but uintptr_t
is allowed to store arbitrary (representable) integers, while pointers are technically not.
I agree that the line:
return __strchrnul(s, *c) - (uintptr_t)a;
is incorrect, as it has both the UB due to out of bounds arithmetic and then implicitly casts that pointer to size_t
(the return type). What I suggested above is:
return (uintptr_t)__strchrnul(s, *c) - (uintptr_t)a;
which performs the arithmetic on two uintptr_t
quantities, then implicitly converts that to size_t
.
However, that's based on my speculation about the original intent. If this were my code, I would actually write:
return (size_t)(__strchrnul(s, *c) - a);
That way, the arithmetic is performed on the original pointers (which both point to the same array, so this complies with C11 6.5.6/9), resulting in a ptrdiff_t
, which is then explicitly converted to size_t
(justified since the result will always be non-negative). I consider this slightly better than doing the arithmetic on uintptr_t
since there are C implementations where uintptr_t
is larger than size_t
and ptrdiff_t
, thus making arithmetic on the former potentially more expensive.
Thank you for taking the time to provide the detailed explanation, I agree with your conclusion.
The latest PR looks good to me, thanks!
The
strcspn
function insrc/string/strcspn.c
contains a line of code (introduced in commit eb00919a88) that has undefined behavior due to pointer arithmetic going out of bounds:I believe the intent here is to convert both pointers to
uintptr_t
and then subtract the resulting integers. But the way the parentheses are placed, it instead converts onlya
to an integer, then subtracts that from the pointer returned by__strchrnul
. Since(uintptr_t)a
will in general be large, the resulting pointer is well outside the range of the object pointed to bys
, which results in undefined behavior (per C11 6.5.6/8).In practice, this has no ill effect on any C implementation I'm familiar with, but should be fixed anyway. I think the following is what was intended: