florian-pe opened 2 years ago
Here is some additional numerology.
We get the entire string as long as the most significant bit (MSB) of a pointer-sized value is not set:
$ perl -E 'use bignum; say substr("hello", 0, 2**63-3)'
hello
$ perl -E 'use bignum; say substr("hello", 0, 2**63-2)'
hello
$ perl -E 'use bignum; say substr("hello", 0, 2**63-1)'
hello
$ perl -E 'use bignum; say substr("hello", 0, 2**63)'
$ perl -E 'use bignum; say substr("hello", 0, 2**63+1)'
Then nothing is printed, as long as the MSB is set:
$ perl -E 'use bignum; say substr("hello", 0, 2**63)'
$ perl -E 'use bignum; say substr("hello", 0, 2**63+1)'
$ perl -E 'use bignum; say substr("hello", 0, 2**63+2**32)'
$ perl -E 'use bignum; say substr("hello", 0, 2**63+2**62)'
Then we start getting part of the string again, probably because the value overflows, which should clear the MSB:
$ perl -E 'use bignum; say substr("hello", 0, 2**63+2**63)'
hell
$ perl -E 'use bignum; say substr("hello", 0, 2**64)'
hell
$ perl -E 'use bignum; say substr("hello", 0, 2**64+1)'
hell
$ perl -E 'use bignum; say substr("hello", 0, 2**64+2)'
hell
But then it behaves as if the sign bit (whichever one it is) is set...
$ perl -E 'use bignum; say substr("hello", 0, 2**64)'
hell
$ perl -E 'use bignum; say substr("hello", 0, 2**64-1)'
hell
$ perl -E 'use bignum; say substr("hello", 0, 2**64-2)'
hel
$ perl -E 'use bignum; say substr("hello", 0, 2**64-3)'
he
$ perl -E 'use bignum; say substr("hello", 0, 2**64-4)'
h
$ perl -E 'use bignum; say substr("hello", 0, 2**64-5)'
$ perl -E 'use bignum; say substr("hello", 0, 2**64-6)'
...because it behaves the same way as this sequence of negative lengths:
$ perl -E 'use bignum; say substr("hello", 0, -0)'
$ perl -E 'use bignum; say substr("hello", 0, -1)'
hell
$ perl -E 'use bignum; say substr("hello", 0, -2)'
hel
$ perl -E 'use bignum; say substr("hello", 0, -3)'
he
$ perl -E 'use bignum; say substr("hello", 0, -4)'
h
$ perl -E 'use bignum; say substr("hello", 0, -5)'
$ perl -E 'use bignum; say substr("hello", 0, -6)'
Personally I don't really think this is a bug in substr().
This is an example where perl's flexibility with numeric types produces surprising results. Let's look at what perl thinks of that first number:
$ perl -MDevel::Peek -le'Dump(18446744073709551615)'
SV = IV(0x55c42a6896d0) at 0x55c42a6896e0
  REFCNT = 1
  FLAGS = (IOK,READONLY,PROTECT,pIOK,IsUV)
  UV = 18446744073709551615
This value happens to be the decimal representation of UV_MAX, i.e., 2**64-1. It is the highest value perl alone can represent as a true integer type. Add one and you get this:
$ perl -MDevel::Peek -le'Dump(18446744073709551616)'
SV = NV(0x559585d816c8) at 0x559585d816e0
  REFCNT = 1
  FLAGS = (NOK,READONLY,PROTECT,pNOK)
  NV = 1.84467440737096e+19
That is, it has been converted to an NV (a double, i.e., floating point), the value that best represents the original integer.
The internal logic uses the SvIV() macro to get a signed representation of the argument, then checks whether this IV is actually a UV. As the first Dump() shows, that value is a UV, so substr does not treat it as a negative number. In the NV case, the NV is converted to an IV without the "is UV" flag being set, and it turns into -1. If there is a bug, I guess it would be here.
The second set of output you provided relates to bignum, and I believe you see a similar set of effects. I am not sure of the exact details, but it wouldn't surprise me if a bignum turns into a float, which is then converted to an IV, and we see the same issue as above.
This would occur in just about any part of our APIs where we internally cast data into a UV/IV, so if we need to address it (it feels like a "well don't do that"), then we should address it at the numeric layer, not the substr() layer. I don't know the details of NV -> IV/UV conversion, but it feels wrong that the sign changes in the above cases.
Yves
(it feels like a "well dont do that")
but then what should someone with a string of 16 exbibytes (2 ** 64 bytes) do?! :-)
I can see consistency in the non-bignum examples.
The behaviour is consistent with the OFFSET/LENGTH argument being evaluated as -1 if it exceeds UV_MAX.
This is reproducible in the following Inline::C script:
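use strict;
use warnings;
use Config;
use Inline C => Config =>
    BUILD_NOISY => 1, # else any compilation warnings are hidden
    ;
use Inline C => <<'EOC';
/* Return the argument as a UV if perl flags it as unsigned,
   otherwise as whatever SvIV() makes of it. */
SV * foo(SV * x) {
    if (SvUOK(x)) return newSVuv(SvUV(x));
    return newSViv(SvIV(x));
}
EOC

# UV_MAX, a value that fits in an IV, then three values above UV_MAX
for (18446744073709551615, 3 . "0" x 18, 18446744073709551616, 3 . "0" x 19, 3 . "0" x 200) {
    print foo($_), "\n";
}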
__END__
Outputs:
18446744073709551615
3000000000000000000
-1
-1
-1
To me, that lends credence to the behaviour. I have less than zero interest in what happens when bignum gets involved. (I don't mean to denigrate "bignum" ... this is just something I do in an attempt to preserve my sanity.)
Cheers, Rob
Above a certain value, the output of
substr EXPR,OFFSET
is the last character of the string. And above the same value, the output of
substr EXPR,OFFSET,LENGTH
chops off the last character of the string.