Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.93k stars 551 forks source link

grok_atou() for regexp quantifiers #14498

Closed p5pRT closed 9 years ago

p5pRT commented 9 years ago

Migrated from rt.perl.org#123814 (status was 'resolved')

Searchable as RT123814$

p5pRT commented 9 years ago

From @hvds

AFL finds this:

% ./miniperl -e '"z" =~ /x|y{01,}/'
miniperl: regexec.c:6443: S_regmatch: Assertion `st->u.curly.min <= st->u.curly.max' failed.
Aborted (core dumped)
% 

This is a symptom of a bigger problem: regpiece() decides if the regexp fragment about to be parsed is a quantifier by calling regcurly(), which returns TRUE if it matches /\{\d+,?\d*\}/; if so, it then uses grok_atou to parse the numbers out a) assuming it will always succeed, and b) ignoring any range issues.

In particular, that means if a number matches /^0\d/ it'll return MAX_UV, which is cast to I32 for initial checks (which is how the above test got past the normal min < max check), but then truncated to unsigned 16 bits (though REG_INFTY assumes signed 16 bits):

% ./miniperl -Dr -e 'qr/x{01,}/' 2>&1 | grep CURLY
   1: CURLY {65535,32767} (5)
% 

.. and if a number is valid but out of range it'll be similarly truncated:

% ./miniperl -Dr -e 'qr/x{7777777777}/' 2>&1 | grep CURLY
   1: CURLY {30833,30833} (5)
% 

Fixing this is made harder by the fact that, although there were long-term plans to change it, we currently treat anything nearly but not quite a valid quantifier as literal instead - changing regcurly() to return FALSE for these cases would mean we'd be silently changing the interpretation of existing regexps caught by this to something completely different.

Better would be to raise an error, or at least a warning; but then this becomes the one exception to the "treat as literal if not valid" approach, likely leading to more confusion. Also, regcurly() is called from several places in toke.c and regcomp.c, so it may be tricky to get them to act consistently.

So I'm not sure what to do here.

Hugo

p5pRT commented 9 years ago

From @demerphq

On 13 February 2015 at 19​:43\, Hugo van der Sanden \perlbug\-followup@&#8203;perl\.org wrote​:

# New Ticket Created by Hugo van der Sanden # Please include the string​: [perl #123814] # in the subject line of all future correspondence about this issue. # \<URL​: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=123814 >

AFL (\<http​://lcamtuf.coredump.cx/afl>) finds this​:

% ./miniperl -e '"z" =~ /x|y{01\,}/' miniperl​: regexec.c​:6443​: S_regmatch​: Assertion `st->u.curly.min \<= st->u.curly.max' failed. Aborted (core dumped) %

This is a symptom of a bigger problem​: regpiece() decides if the regexp fragment about to be parsed is a quantifier by calling regcurly()\, which returns TRUE if it matches /{\d+\,?\d*\/; if so\, it then uses grok_atou to parse the numbers out a) assuming it will always succeed\, and b) ignoring any range issues.

In particular\, that means if a number matches /^0\d/ it'll return MAX_UV\, which is cast to I32 for initial checks (which is how the above test got past the normal min \< max check)\, but then truncated to unsigned 16 bits (though REG_INFTY assumes signed 16 bits)​:

Am i understanding this to mean that grok_atou will turn any "number" starting with 0 that is not 0 itself into MAX_UV?

That sounds less than awesome. It also makes me wonder if people really are using this in the wild?

Why dont we just ignore the leading zeros?

Yves

p5pRT commented 9 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 9 years ago

From @hvds

On Sun Feb 15 21​:37​:46 2015\, demerphq wrote​:

Am i understanding this to mean that grok_atou will turn any "number" starting with 0 that is not 0 itself into MAX_UV?

Yes\, that's what it is documented to do. But it's almost as if not everybody that uses it fully reads its documentation before using it.

That sounds less than awesome. It also makes me wonder if people really are using this in the wild?

Why dont we just ignore the leading zeros?

Presumably because that gives equally surprising results\, whether you end up treating 010 as 10 or 8.

There are inputs that it considers errors\, and it has a mechanism to report those errors; however a high proportion of callers are ignoring that.

I think the whole interface of it is encouraging bugs. I think we need something more like the grok_bslash_x style - to return a boolean success and use pointer arguments for the resulting values - since it becomes a bit more obvious if the caller isn't checking that the parse was successful\, particularly in combination with -Wunused-result.

I think we'd also benefit from specific functions (or macros) for each of the different integer types/sizes we might target\, so that the range checks happen in just one place and we reduce the problem of blind casting.

Ignoring XS/APItest and code duplicataion in ext/re\, I found some 29 callers; a quick glance at those suggest around 25 of them are dubious.

In the meantime\, we might want to consider the doc patch below.

Hugo

--- a/numeric.c +++ b/numeric.c @​@​ -1035\,7 +1035\,7 @​@​ Perl_grok_number_flags(pTHX_ const char *pv\, STRLEN len\, U /* =for apidoc grok_atou

-grok_atou is a safer replacement for atoi and strtol. +grok_atou is a replacement for atoi and strtol.

grok_atou parses a C-style zero-byte terminated string\, looking for a decimal unsigned integer. (END)

p5pRT commented 9 years ago

From @jhi

NOW would be the time to fix the API of grok_atou.

It was invented some time last summer by yours truly as a less insane replacement for atoi() (which has a bunch of its own problems\, and which the core was using left and right).

Why NOW? Because it has been only in 5.21.x\, *and* only used by the core. So before it escapes the lab\, please fix it as you see fit.

Off-hand\, maybe it should have a special-special case of /^0+/ becoming zero. Maybe. As Hugo pointed out\, the ugly octal case rears its 010 head. Or maybe make it return boolean - in which case it cannot be an atoi drop-in-replacement any more.

The design directive of grok_atou was "accept only strictly valid decimal positive integers\, reject anything else". "000000" fails that. But maybe the rejection should be signaled by the primary return type.

In which case it might as well be removed\, no need to tinker with its docs. More importantly\, no need to keep multiplying APIs. We have already enough of them.

But PLEASE\, fix it. Or kill it. (I would but I lack the tuits right now\, in the fast oncoming 5.22 track)

p5pRT commented 9 years ago

From @jhi

... it should be possible to try reverting the commit(s) where grok_atou was introduced\, and we would be back to using native atoi().

p5pRT commented 9 years ago

From @khwilliamson

On Wed Feb 18 18​:57​:43 2015\, jhi wrote​:

... it should be possible to try reverting the commit(s) where grok_atou was introduced\, and we would be back to using native atoi().

But atoi has more bugs. Why can't we just take this out of the public API? We can rename it to begin with an underscore and mark it as experimental\, removing the docs\, so that tghere is only discouragement for people using it\, and the docs say you do so at your own risk

-- Karl Williamson

p5pRT commented 9 years ago

From @demerphq

On 19 February 2015 at 12​:59\, Karl Williamson via RT \perlbug\-followup@&#8203;perl\.org wrote​:

On Wed Feb 18 18​:57​:43 2015\, jhi wrote​:

... it should be possible to try reverting the commit(s) where grok_atou was introduced\, and we would be back to using native atoi().

But atoi has more bugs. Why can't we just take this out of the public API? We can rename it to begin with an underscore and mark it as experimental\, removing the docs\, so that tghere is only discouragement for people using it\, and the docs say you do so at your own risk

Lets not *remove* the docs. Lets update them appropriately. :-)

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 9 years ago

From @hvds

On Wed Feb 18 18​:45​:47 2015\, jhi wrote​:

NOW would be the time to fix the API of grok_atou. [...] Why NOW? Because it has been only in 5.21.x

Oh\, I hadn't noticed that it was so recent; in that case\, strongly agreed.

Off-hand\, maybe it should have a special-special case of /^0+/ becoming zero. Maybe. As Hugo pointed out\, the ugly octal case rears its 010 head. Or maybe make it return boolean - in which case it cannot be an atoi drop-in-replacement any more.

Given we've already killed atoi uses\, I don't think it's a big problem that it would no longer be a drop-in replacement.

I'll have a go at making a version that retains the strictness but returns boolean success\, with target-type-specific variants. In the meantime I've pushed the branch hv/grok with what I think is the appropriate short-term fix for the primary issue in this ticket - I'd also like comments on whether we're happy to add a new failure message to cover /x{00}/ and /x{999999999999999999999999}/.

Hugo

p5pRT commented 9 years ago

From @jhi

To the tune of "everybody should take out their own garbage"\, attached is a patch that would remove grok_atou()\, and leave the atoi() (and strtol\, etc.) in their naked glory. But this state of things does not pass e.g. the tests added in

  [perl #123782] regcomp​: check for overflow on /(?123)/

aka http​://perl5.git.perl.org/perl.git/commit/b3725d49f914ef2bed63d7eb92a72ef6e886b489

But one can take a look at the patch to see where the atoi() spots are.

p5pRT commented 9 years ago

From @jhi

0001-Remove-the-grok_atou-experiment.patch ```diff From aab1827fa2d4ea91caf7dd83d7a86a546e9e1a9e Mon Sep 17 00:00:00 2001 From: Jarkko Hietaniemi Date: Wed, 18 Feb 2015 22:56:03 -0500 Subject: [PATCH] Remove the grok_atou experiment Reverts e05c5d0845ee6e5d8fe7218314dde23c083265cc 945b524a8cdacbf82557d751252e6546f48d21ae 75feedba47600d94d18d49dbcbdf46393b6c6cc5 d62b8c6ab04885d004d4a52686f68518d49a3604 f43791029d4ac4a5dbfd6ad9b67cb5407ac32e2a fdadaf77ca45094e35ce724d7c91001f84b083c7 26065e6c9b1e28a59935bef1cddb98278a829253 68419f9c61ef7c22b1225655e7e3b38058c70a71 338aa8b061f430c2b3d9deaeed0aec523639aff7 999448781bc711c8732271b98a45a724f7357c46 c98823ffa61e4daf92a7a17ab937753b2c280c13 a7941017b561ee4cf4e5f4ac4ebb6c9e684303ed dd52de8081680731af4e00f224c756ed5c3a510f 96e440d2eb546f4493feffce002f2ec8886f13a3 6313e54401f5531a23184b7afaaf6bc7cd4a81ec Some manual tweakage was necessary so it's possible some intervening change was missed, but no trace of grok_atou should remain. --- doio.c | 2 +- embed.fnc | 1 - embed.h | 1 - ext/DynaLoader/dlutils.c | 2 +- ext/XS-APItest/numeric.xs | 27 --------- ext/XS-APItest/t/grok.t | 144 ---------------------------------------------- gv.c | 2 +- locale.c | 2 +- malloc.c | 4 +- mg.c | 12 ++-- numeric.c | 96 ++----------------------------- perl.c | 24 +++----- pod/perlclib.pod | 13 ++--- pod/perlhacktips.pod | 16 ------ pp_sys.c | 2 +- proto.h | 5 -- regcomp.c | 54 +++++++---------- t/porting/libperl.t | 13 +---- toke.c | 2 +- utf8.c | 17 +++--- util.c | 22 +++---- 21 files changed, 67 insertions(+), 394 deletions(-) diff --git a/doio.c b/doio.c index a63f2a2..bae481a 100644 --- a/doio.c +++ b/doio.c @@ -391,7 +391,7 @@ Perl_do_open6(pTHX_ GV *gv, const char *oname, STRLEN len, num_svs = 0; } else if (isDIGIT(*type)) { - wanted_fd = grok_atou(type, NULL); + wanted_fd = atoi(type); } else { const IO* thatio; diff --git a/embed.fnc b/embed.fnc index cfe634f..0eb8d83 100644 --- a/embed.fnc +++ b/embed.fnc @@ -818,7 +818,6 @@ Apd |int |grok_number |NN const char *pv|STRLEN len|NULLOK UV *valuep Apd |int |grok_number_flags|NN const char *pv|STRLEN len|NULLOK UV *valuep|U32 flags ApdR |bool |grok_numeric_radix|NN const char **sp|NN const char *send Apd |UV |grok_oct |NN const char* start|NN STRLEN* len_p|NN I32* flags|NULLOK NV *result -Apdn |UV |grok_atou |NN const char* pv|NULLOK const char** endptr : These are all indirectly referenced by globals.c. This is somewhat annoying. p |int |magic_clearenv |NN SV* sv|NN MAGIC* mg p |int |magic_clear_all_env|NN SV* sv|NN MAGIC* mg diff --git a/embed.h b/embed.h index 802b624..8ccf48a 100644 --- a/embed.h +++ b/embed.h @@ -179,7 +179,6 @@ #define getcwd_sv(a) Perl_getcwd_sv(aTHX_ a) #define gp_free(a) Perl_gp_free(aTHX_ a) #define gp_ref(a) Perl_gp_ref(aTHX_ a) -#define grok_atou Perl_grok_atou #define grok_bin(a,b,c,d) Perl_grok_bin(aTHX_ a,b,c,d) #define grok_hex(a,b,c,d) Perl_grok_hex(aTHX_ a,b,c,d) #define grok_infnan(a,b) Perl_grok_infnan(aTHX_ a,b) diff --git a/ext/DynaLoader/dlutils.c b/ext/DynaLoader/dlutils.c index 96ea8be..bb8b975 100644 --- a/ext/DynaLoader/dlutils.c +++ b/ext/DynaLoader/dlutils.c @@ -116,7 +116,7 @@ dl_generic_private_init(pTHX) /* called by dl_*.xs dl_private_init() */ #if defined(PERL_IN_DL_HPUX_XS) || defined(PERL_IN_DL_DLOPEN_XS) if ( (perl_dl_nonlazy = getenv("PERL_DL_NONLAZY")) != NULL ) - dl_nonlazy = grok_atou(perl_dl_nonlazy, NULL); + dl_nonlazy = atoi(perl_dl_nonlazy); else dl_nonlazy = 0; if (dl_nonlazy) diff --git a/ext/XS-APItest/numeric.xs b/ext/XS-APItest/numeric.xs index 6d1ef82..ab48dba 100644 --- a/ext/XS-APItest/numeric.xs +++ b/ext/XS-APItest/numeric.xs @@ -30,30 +30,3 @@ grok_number_flags(number, flags) PUSHs(sv_2mortal(newSViv(result))); if (result & IS_NUMBER_IN_UV) PUSHs(sv_2mortal(newSVuv(value))); - -void -grok_atou(number, endsv) - SV *number - SV *endsv - PREINIT: - STRLEN len; - const char *pv = SvPV(number, len); - UV result; - const char* endptr; - PPCODE: - EXTEND(SP,2); - if (endsv == &PL_sv_undef) { - result = grok_atou(pv, NULL); - } else { - result = grok_atou(pv, &endptr); - } - PUSHs(sv_2mortal(newSVuv(result))); - if (endsv == &PL_sv_undef) { - PUSHs(sv_2mortal(newSVpvn(NULL, 0))); - } else { - if (endptr) { - PUSHs(sv_2mortal(newSViv(endptr - pv))); - } else { - PUSHs(sv_2mortal(newSViv(0))); - } - } diff --git a/ext/XS-APItest/t/grok.t b/ext/XS-APItest/t/grok.t index f66717b..b184d59 100644 --- a/ext/XS-APItest/t/grok.t +++ b/ext/XS-APItest/t/grok.t @@ -109,148 +109,4 @@ for my $grok (@groks) { is($out_flags, $grok->[3], "'$grok->[0]' flags $grok->[1] - check flags"); } -my $ATOU_MAX = ~0; - -# atou tests -my @atous = - ( - # [ input, endsv, out uv, out len ] - - # Basic cases. - [ "0", "", 0, 1 ], - [ "1", "", 1, 1 ], - [ "2", "", 2, 1 ], - [ "9", "", 9, 1 ], - [ "12", "", 12, 2 ], - [ "123", "", 123, 3 ], - - # Trailing whitespace is accepted or rejected, depending on endptr. - [ "0 ", " ", 0, 1 ], - [ "1 ", " ", 1, 1 ], - [ "2 ", " ", 2, 1 ], - [ "12 ", " ", 12, 2 ], - - # Trailing garbage is accepted or rejected, depending on endptr. - [ "0x", "x", 0, 1 ], - [ "1x", "x", 1, 1 ], - [ "2x", "x", 2, 1 ], - [ "12x", "x", 12, 2 ], - - # Leading whitespace is failure. - [ " 0", " 0", 0, 0 ], - [ " 1", " 1", 0, 0 ], - [ " 12", " 12", 0, 0 ], - - # Leading garbage is outright failure. - [ "x0", "x0", 0, 0 ], - [ "x1", "x1", 0, 0 ], - [ "x12", "x12", 0, 0 ], - - # We do not parse decimal point. - [ "12.3", ".3", 12, 2 ], - - # Leading pluses or minuses are no good. - [ "+12", "+12", 0, 0 ], - [ "-12", "-12", 0, 0 ], - - # Extra leading zeros cause overflow. - [ "00", "00", $ATOU_MAX, 0 ], - [ "01", "01", $ATOU_MAX, 0 ], - [ "012", "012", $ATOU_MAX, 0 ], - ); - -# Values near overflow point. -if ($Config{uvsize} == 8) { - push @atous, - ( - # 32-bit values no problem for 64-bit. - [ "4294967293", "", 4294967293, 10, ], - [ "4294967294", "", 4294967294, 10, ], - [ "4294967295", "", 4294967295, 10, ], - [ "4294967296", "", 4294967296, 10, ], - [ "4294967297", "", 4294967297, 10, ], - - # This is well within 64-bit. - [ "9999999999", "", 9999999999, 10, ], - - # Values valid up to 64-bit and beyond. - [ "18446744073709551613", "", 18446744073709551613, 20, ], - [ "18446744073709551614", "", 18446744073709551614, 20, ], - [ "18446744073709551615", "", $ATOU_MAX, 20, ], - [ "18446744073709551616", "", $ATOU_MAX, 0, ], - [ "18446744073709551617", "", $ATOU_MAX, 0, ], - ); -} elsif ($Config{uvsize} == 4) { - push @atous, - ( - # Values valid up to 32-bit and beyond. - [ "4294967293", "", 4294967293, 10, ], - [ "4294967294", "", 4294967294, 10, ], - [ "4294967295", "", $ATOU_MAX, 10, ], - [ "4294967296", "", $ATOU_MAX, 0, ], - [ "4294967297", "", $ATOU_MAX, 0, ], - - # Still beyond 32-bit. - [ "4999999999", "", $ATOU_MAX, 0, ], - [ "5678901234", "", $ATOU_MAX, 0, ], - [ "6789012345", "", $ATOU_MAX, 0, ], - [ "7890123456", "", $ATOU_MAX, 0, ], - [ "8901234567", "", $ATOU_MAX, 0, ], - [ "9012345678", "", $ATOU_MAX, 0, ], - [ "9999999999", "", $ATOU_MAX, 0, ], - [ "10000000000", "", $ATOU_MAX, 0, ], - [ "12345678901", "", $ATOU_MAX, 0, ], - - # 64-bit values are way beyond. - [ "18446744073709551613", "", $ATOU_MAX, 0, ], - [ "18446744073709551614", "", $ATOU_MAX, 0, ], - [ "18446744073709551615", "", $ATOU_MAX, 0, ], - [ "18446744073709551616", "", $ATOU_MAX, 0, ], - [ "18446744073709551617", "", $ATOU_MAX, 0, ], - ); -} - -# These will fail to fail once 128/256-bit systems arrive. -push @atous, - ( - [ "23456789012345678901", "", $ATOU_MAX, 0 ], - [ "34567890123456789012", "", $ATOU_MAX, 0 ], - [ "98765432109876543210", "", $ATOU_MAX, 0 ], - [ "98765432109876543211", "", $ATOU_MAX, 0 ], - [ "99999999999999999999", "", $ATOU_MAX, 0 ], - ); - -for my $grok (@atous) { - my $input = $grok->[0]; - my $endsv = $grok->[1]; - - my ($out_uv, $out_len); - - # First with endsv. - ($out_uv, $out_len) = grok_atou($input, $endsv); - is($out_uv, $grok->[2], - "'$input' $endsv - number success (got $out_uv cf $grok->[2])"); - ok($grok->[3] <= length $input, "'$input' $endsv - length sanity 1"); - unless (length $grok->[1]) { - is($out_len, $grok->[3], "'$input' $endsv - length sanity 2"); - } # else { ... } ? - if ($out_len) { - is($endsv, substr($input, $out_len), - "'$input' $endsv - length sanity 3"); - } - - # Then without endsv (undef == NULL). - ($out_uv, $out_len) = grok_atou($input, undef); - if (length $grok->[1]) { - if ($grok->[2] == $ATOU_MAX) { - is($out_uv, $ATOU_MAX, "'$input' undef - number overflow"); - } else { - is($out_uv, 0, "'$input' undef - number zero"); - } - } else { - is($out_uv, $grok->[2], - "'$input' undef - number success (got $out_uv cf $grok->[2])"); - } -} - done_testing(); diff --git a/gv.c b/gv.c index 41cebeb..fed0c10 100644 --- a/gv.c +++ b/gv.c @@ -1990,7 +1990,7 @@ S_gv_magicalize(pTHX_ GV *gv, HV *stash, const char *name, STRLEN len, if (!isDIGIT(*end)) return addmg; } - paren = grok_atou(name, NULL); + paren = strtoul(name, NULL, 10); goto storeparen; } } diff --git a/locale.c b/locale.c index 6c62c1f..00481dd 100644 --- a/locale.c +++ b/locale.c @@ -675,7 +675,7 @@ Perl_init_i18nl10n(pTHX_ int printwarn) const bool locwarn = (printwarn > 1 || (printwarn && (! bad_lang_use_once - || grok_atou(bad_lang_use_once, NULL)))); + || atoi(bad_lang_use_once)))); bool done = FALSE; #ifdef WIN32 /* In some systems you can find out the system default locale diff --git a/malloc.c b/malloc.c index 58bec64..dacb134 100644 --- a/malloc.c +++ b/malloc.c @@ -1824,7 +1824,7 @@ Perl_mfree(Malloc_t where) if (bad_free_warn == -1) { dTHX; char *pbf = PerlEnv_getenv("PERL_BADFREE"); - bad_free_warn = (pbf) ? grok_atou(pbf, NULL) : 1; + bad_free_warn = (pbf) ? atoi(pbf) : 1; } if (!bad_free_warn) return; @@ -1922,7 +1922,7 @@ Perl_realloc(void *mp, size_t nbytes) if (bad_free_warn == -1) { dTHX; char *pbf = PerlEnv_getenv("PERL_BADFREE"); - bad_free_warn = (pbf) ? grok_atou(pbf, NULL) : 1; + bad_free_warn = (pbf) ? atoi(pbf) : 1; } if (!bad_free_warn) return NULL; diff --git a/mg.c b/mg.c index d2a8db0..36ad1d7 100644 --- a/mg.c +++ b/mg.c @@ -3019,7 +3019,6 @@ Perl_magic_set(pTHX_ SV *sv, MAGIC *mg) { const char *p = SvPV_const(sv, len); Groups_t *gary = NULL; - const char* endptr; #ifdef _SC_NGROUPS_MAX int maxgrp = sysconf(_SC_NGROUPS_MAX); @@ -3031,20 +3030,19 @@ Perl_magic_set(pTHX_ SV *sv, MAGIC *mg) while (isSPACE(*p)) ++p; - new_egid = (Gid_t)grok_atou(p, &endptr); + new_egid = (Gid_t)Atol(p); for (i = 0; i < maxgrp; ++i) { - if (endptr == NULL) - break; - p = endptr; + while (*p && !isSPACE(*p)) + ++p; while (isSPACE(*p)) ++p; if (!*p) break; - if (!gary) + if(!gary) Newx(gary, i + 1, Groups_t); else Renew(gary, i + 1, Groups_t); - gary[i] = (Groups_t)grok_atou(p, &endptr); + gary[i] = (Groups_t)Atol(p); } if (i) PERL_UNUSED_RESULT(setgroups(i, gary)); diff --git a/numeric.c b/numeric.c index a6f6018..065b238 100644 --- a/numeric.c +++ b/numeric.c @@ -840,15 +840,14 @@ Perl_grok_number(pTHX_ const char *pv, STRLEN len, UV *valuep) return grok_number_flags(pv, len, valuep, 0); } -static const UV uv_max_div_10 = UV_MAX / 10; -static const U8 uv_max_mod_10 = UV_MAX % 10; - int Perl_grok_number_flags(pTHX_ const char *pv, STRLEN len, UV *valuep, U32 flags) { const char *s = pv; const char * const send = pv + len; const char *d; + const UV max_div_10 = UV_MAX / 10; + const char max_mod_10 = UV_MAX % 10; int numtype = 0; PERL_ARGS_ASSERT_GROK_NUMBER_FLAGS; @@ -918,9 +917,9 @@ Perl_grok_number_flags(pTHX_ const char *pv, STRLEN len, UV *valuep, U32 flags) each time for overflow. */ digit = *s - '0'; while (digit >= 0 && digit <= 9 - && (value < uv_max_div_10 - || (value == uv_max_div_10 - && digit <= uv_max_mod_10))) { + && (value < max_div_10 + || (value == max_div_10 + && digit <= max_mod_10))) { value = value * 10 + digit; if (++s < send) digit = *s - '0'; @@ -1032,91 +1031,6 @@ Perl_grok_number_flags(pTHX_ const char *pv, STRLEN len, UV *valuep, U32 flags) return 0; } -/* -=for apidoc grok_atou - -grok_atou is a safer replacement for atoi and strtol. - -grok_atou parses a C-style zero-byte terminated string, looking for -a decimal unsigned integer. - -Returns the unsigned integer, if a valid value can be parsed -from the beginning of the string. - -Accepts only the decimal digits '0'..'9'. - -As opposed to atoi or strtol, grok_atou does NOT allow optional -leading whitespace, or negative inputs. If such features are -required, the calling code needs to explicitly implement those. - -If a valid value cannot be parsed, returns either zero (if non-digits -are met before any digits) or UV_MAX (if the value overflows). - -Note that extraneous leading zeros also count as an overflow -(meaning that only "0" is the zero). - -On failure, the *endptr is also set to NULL, unless endptr is NULL. - -Trailing non-digit bytes are allowed if the endptr is non-NULL. -On return the *endptr will contain the pointer to the first non-digit byte. - -If the endptr is NULL, the first non-digit byte MUST be -the zero byte terminating the pv, or zero will be returned. - -Background: atoi has severe problems with illegal inputs, it cannot be -used for incremental parsing, and therefore should be avoided -atoi and strtol are also affected by locale settings, which can also be -seen as a bug (global state controlled by user environment). - -=cut -*/ - -UV -Perl_grok_atou(const char *pv, const char** endptr) -{ - const char* s = pv; - const char** eptr; - const char* end2; /* Used in case endptr is NULL. */ - UV val = 0; /* The return value. */ - - PERL_ARGS_ASSERT_GROK_ATOU; - - eptr = endptr ? endptr : &end2; - if (isDIGIT(*s)) { - /* Single-digit inputs are quite common. */ - val = *s++ - '0'; - if (isDIGIT(*s)) { - /* Extra leading zeros cause overflow. */ - if (val == 0) { - *eptr = NULL; - return UV_MAX; - } - while (isDIGIT(*s)) { - /* This could be unrolled like in grok_number(), but - * the expected uses of this are not speed-needy, and - * unlikely to need full 64-bitness. */ - U8 digit = *s++ - '0'; - if (val < uv_max_div_10 || - (val == uv_max_div_10 && digit <= uv_max_mod_10)) { - val = val * 10 + digit; - } else { - *eptr = NULL; - return UV_MAX; - } - } - } - } - if (s == pv) { - *eptr = NULL; /* If no progress, failed to parse anything. */ - return 0; - } - if (endptr == NULL && *s) { - return 0; /* If endptr is NULL, no trailing non-digits allowed. */ - } - *eptr = s; - return val; -} - #ifndef USE_QUADMATH STATIC NV S_mulexp10(NV value, I32 exponent) diff --git a/perl.c b/perl.c index cda99ff..940c4a9 100644 --- a/perl.c +++ b/perl.c @@ -545,12 +545,7 @@ perl_destruct(pTHXx) { const char * const s = PerlEnv_getenv("PERL_DESTRUCT_LEVEL"); if (s) { - int i; - if (strEQ(s, "-1")) { /* Special case: modperl folklore. */ - i = -1; - } else { - i = grok_atou(s, NULL); - } + const int i = atoi(s); #ifdef DEBUGGING if (destruct_level < i) destruct_level = i; #endif @@ -1465,7 +1460,7 @@ perl_parse(pTHXx_ XSINIT_t xsinit, int argc, char **argv, char **env) { const char * const s = PerlEnv_getenv("PERL_HASH_SEED_DEBUG"); - if (s && (grok_atou(s, NULL) == 1)) { + if (s && (atoi(s) == 1)) { unsigned char *seed= PERL_HASH_SEED; unsigned char *seed_end= PERL_HASH_SEED + PERL_HASH_SEED_BYTES; PerlIO_printf(Perl_debug_log, "HASH_FUNCTION = %s HASH_SEED = 0x", PERL_HASH_FUNC); @@ -2300,8 +2295,8 @@ S_parse_body(pTHX_ char **env, XSINIT_t xsinit) #ifdef MYMALLOC { const char *s; - if ((s=PerlEnv_getenv("PERL_DEBUG_MSTATS")) && grok_atou(s, NULL) >= 2) - dump_mstats("after compilation:"); + if ((s=PerlEnv_getenv("PERL_DEBUG_MSTATS")) && atoi(s) >= 2) + dump_mstats("after compilation:"); } #endif @@ -3052,10 +3047,7 @@ Perl_get_debug_opts(pTHX_ const char **s, bool givehelp) } } else if (isDIGIT(**s)) { - const char* e; - i = grok_atou(*s, &e); - if (e) - *s = e; + i = atoi(*s); for (; isWORDCHAR(**s); (*s)++) ; } else if (givehelp) { @@ -3658,9 +3650,9 @@ S_open_script(pTHX_ const char *scriptname, bool dosearch, bool *suidscript) if (strnEQ(scriptname, "/dev/fd/", 8) && isDIGIT(scriptname[8]) ) { const char *s = scriptname + 8; - const char* e; - fdscript = grok_atou(s, &e); - s = e; + fdscript = atoi(s); + while (isDIGIT(*s)) + s++; if (*s) { /* PSz 18 Feb 04 * Tell apart "normal" usage of fdscript, e.g. diff --git a/pod/perlclib.pod b/pod/perlclib.pod index c5fb455..cd958b1 100644 --- a/pod/perlclib.pod +++ b/pod/perlclib.pod @@ -203,19 +203,14 @@ C, as described in L.) Instead Of: Use: atof(s) Atof(s) - atoi(s) grok_atou(s, &e) - atol(s) grok_atou(s, &e) + atol(s) Atol(s) strtod(s, &p) Nothing. Just don't use it. - strtol(s, &p, n) grok_atou(s, &e) - strtoul(s, &p, n) grok_atou(s, &e) + strtol(s, &p, n) Strtol(s, &p, n) + strtoul(s, &p, n) Strtoul(s, &p, n) Notice also the C, C, and C functions in F for converting strings representing numbers in the respective -bases into Cs. Note that grok_atou() doesn't handle negative inputs, -or leading whitespace (being purposefully strict). - -Note that strtol() and strtoul() may be disguised as Strtol(), Strtoul(), -Atol(), Atoul(). Avoid those, too. +bases into Cs. In theory C and C may not be defined if the machine perl is built on doesn't actually have strtol and strtoul. But as those 2 diff --git a/pod/perlhacktips.pod b/pod/perlhacktips.pod index 943bdfb..94d282f 100644 --- a/pod/perlhacktips.pod +++ b/pod/perlhacktips.pod @@ -637,22 +637,6 @@ of the program is UTF-8. What happens is that the C<%s> and its operand are simply skipped without any notice. L. -=item * - -Do not use atoi() - -Use grok_atou() instead. atoi() has ill-defined behavior on overflows, -and cannot be used for incremental parsing. It is also affected by locale, -which is bad. - -=item * - -Do not use strtol() or strtoul() - -Use grok_atou() instead. strtol() or strtoul() (or their IV/UV-friendly -macro disguises, Strtol() and Strtoul(), or Atol() and Atoul() are -affected by locale, which is bad. - =back =head1 DEBUGGING diff --git a/pp_sys.c b/pp_sys.c index e2f8edf..5519ff8 100644 --- a/pp_sys.c +++ b/pp_sys.c @@ -3356,7 +3356,7 @@ PP(pp_fttty) if (GvIO(gv) && IoIFP(GvIOp(gv))) fd = PerlIO_fileno(IoIFP(GvIOp(gv))); else if (name && isDIGIT(*name)) - fd = grok_atou(name, NULL); + fd = atoi(name); else FT_RETURNUNDEF; if (fd < 0) { diff --git a/proto.h b/proto.h index 966c6d8..2ce9a40 100644 --- a/proto.h +++ b/proto.h @@ -1340,11 +1340,6 @@ PERL_CALLCONV int Perl_getcwd_sv(pTHX_ SV* sv) PERL_CALLCONV void Perl_gp_free(pTHX_ GV* gv); PERL_CALLCONV GP* Perl_gp_ref(pTHX_ GP* gp); -PERL_CALLCONV UV Perl_grok_atou(const char* pv, const char** endptr) - __attribute__nonnull__(1); -#define PERL_ARGS_ASSERT_GROK_ATOU \ - assert(pv) - PERL_CALLCONV UV Perl_grok_bin(pTHX_ const char* start, STRLEN* len_p, I32* flags, NV *result) __attribute__nonnull__(pTHX_1) __attribute__nonnull__(pTHX_2) diff --git a/regcomp.c b/regcomp.c index 68f6bfd..1833b74 100644 --- a/regcomp.c +++ b/regcomp.c @@ -9902,7 +9902,6 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth) else if (*RExC_parse == '?') { /* (?...) */ bool is_logical = 0; const char * const seqstart = RExC_parse; - const char * endptr; if (has_intervening_patws) { RExC_parse++; vFAIL("In '(?...)', the '(' and '?' must be adjacent"); @@ -10116,23 +10115,12 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth) case '5': case '6': case '7': case '8': case '9': RExC_parse--; parse_recursion: - { - bool is_neg = FALSE; - UV unum; - parse_start = RExC_parse - 1; /* MJD */ - if (*RExC_parse == '-') { - RExC_parse++; - is_neg = TRUE; - } - unum = grok_atou(RExC_parse, &endptr); - num = (unum > I32_MAX) ? I32_MAX : (I32)unum; - if (endptr) - RExC_parse = (char*)endptr; - if (is_neg) { - /* Some limit for num? */ - num = -num; - } - } + num = atoi(RExC_parse); + parse_start = RExC_parse - 1; /* MJD */ + if (*RExC_parse == '-') + RExC_parse++; + while (isDIGIT(*RExC_parse)) + RExC_parse++; if (*RExC_parse!=')') vFAIL("Expecting close bracket"); @@ -10311,9 +10299,9 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth) RExC_parse++; parno = 0; if (RExC_parse[0] >= '1' && RExC_parse[0] <= '9' ) { - parno = grok_atou(RExC_parse, &endptr); - if (endptr) - RExC_parse = (char*)endptr; + parno = atoi(RExC_parse++); + while (isDIGIT(*RExC_parse)) + RExC_parse++; } else if (RExC_parse[0] == '&') { SV *sv_dat; RExC_parse++; @@ -10330,9 +10318,9 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp,U32 depth) /* (?(1)...) */ char c; char *tmp; - parno = grok_atou(RExC_parse, &endptr); - if (endptr) - RExC_parse = (char*)endptr; + parno = atoi(RExC_parse++); + while (isDIGIT(*RExC_parse)) + RExC_parse++; ret = reganode(pRExC_state, GROUPP, parno); insert_if_check_paren: @@ -10812,16 +10800,15 @@ S_regpiece(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth) next++; } if (*next == '}') { /* got one */ - const char* endptr; if (!maxpos) maxpos = next; RExC_parse++; - min = grok_atou(RExC_parse, &endptr); + min = atoi(RExC_parse); if (*maxpos == ',') maxpos++; else maxpos = RExC_parse; - max = grok_atou(maxpos, &endptr); + max = atoi(maxpos); if (!max && *maxpos != '0') max = REG_INFTY; /* meaning "infinity" */ else if (max >= REG_INFTY) @@ -11512,17 +11499,18 @@ S_alloc_maybe_populate_EXACT(pTHX_ RExC_state_t *pRExC_state, } -/* Parse backref decimal value, unless it's too big to sensibly be a backref, +/* return atoi(p), unless it's too big to sensibly be a backref, * in which case return I32_MAX (rather than possibly 32-bit wrapping) */ static I32 S_backref_value(char *p) { - const char* endptr; - UV val = grok_atou(p, &endptr); - if (endptr == p || endptr == NULL || val > I32_MAX) - return I32_MAX; - return (I32)val; + char *q = p; + + for (;isDIGIT(*q); q++) {} /* calculate length of num */ + if (q - p == 0 || q - p > 9) + return I32_MAX; + return atoi(p); } diff --git a/t/porting/libperl.t b/t/porting/libperl.t index d97b332..d719663 100644 --- a/t/porting/libperl.t +++ b/t/porting/libperl.t @@ -482,11 +482,7 @@ for my $symbol (sort keys %expected) { # (One exception: for certain floating point outputs # the native sprintf is still used in some platforms, see below.) # -# atoi has unsafe and undefined failure modes, and is affected by locale. -# Its cousins include atol and atoll. -# -# strtol and strtoul are affected by locale. -# Cousins include strtoq. +# XXX: add atoi() to %unexpected - unsafe and undefined failure modes. # # system should not be used, use pp_system or my_popen. # @@ -504,13 +500,6 @@ for my $str (qw(strcat strcpy strncat strncpy)) { $unexpected{$str} = undef; # No Configure symbol for these. } -$unexpected{atoi} = undef; # No Configure symbol for atoi. -$unexpected{atol} = undef; # No Configure symbol for atol. - -for my $str (qw(atoll strtol strtoul strtoq)) { - $unexpected{$str} = "d_$str"; -} - for my $symbol (sort keys %unexpected) { if (defined $unexpected{$symbol} && !$Config{$unexpected{$symbol}}) { SKIP: { diff --git a/toke.c b/toke.c index 9e0575c..63dd482 100644 --- a/toke.c +++ b/toke.c @@ -1698,7 +1698,7 @@ S_incline(pTHX_ const char *s) if (*e != '\n' && *e != '\0') return; /* false alarm */ - line_num = grok_atou(n, &e) - 1; + line_num = atoi(n)-1; if (t - s > 0) { const STRLEN len = t - s; diff --git a/utf8.c b/utf8.c index 179a969..e279e13 100644 --- a/utf8.c +++ b/utf8.c @@ -3544,24 +3544,22 @@ Perl__swash_to_invlist(pTHX_ SV* const swash) lend = l + lcur; if (*l == 'V') { /* Inversion list format */ - const char *after_atou = (char *) lend; + char *after_strtol = (char *) lend; UV element0; UV* other_elements_ptr; /* The first number is a count of the rest */ l++; - elements = grok_atou((const char *)l, &after_atou); + elements = Strtoul((char *)l, &after_strtol, 10); if (elements == 0) { invlist = _new_invlist(0); } else { - while (isSPACE(*l)) l++; - l = (U8 *) after_atou; + l = (U8 *) after_strtol; /* Get the 0th element, which is needed to setup the inversion list */ - while (isSPACE(*l)) l++; - element0 = (UV) grok_atou((const char *)l, &after_atou); - l = (U8 *) after_atou; + element0 = (UV) Strtoul((char *)l, &after_strtol, 10); + l = (U8 *) after_strtol; invlist = _setup_canned_invlist(elements, element0, &other_elements_ptr); elements--; @@ -3570,9 +3568,8 @@ Perl__swash_to_invlist(pTHX_ SV* const swash) if (l > lend) { Perl_croak(aTHX_ "panic: Expecting %"UVuf" more elements than available", elements); } - while (isSPACE(*l)) l++; - *other_elements_ptr++ = (UV) grok_atou((const char *)l, &after_atou); - l = (U8 *) after_atou; + *other_elements_ptr++ = (UV) Strtoul((char *)l, &after_strtol, 10); + l = (U8 *) after_strtol; } } } diff --git a/util.c b/util.c index 9ffdbde..23a683c 100644 --- a/util.c +++ b/util.c @@ -1378,7 +1378,7 @@ Perl_mess_sv(pTHX_ SV *basemsg, bool consume) int wi; /* The PERL_C_BACKTRACE_ON_WARN must be an integer of one or more. */ if ((ws = PerlEnv_getenv("PERL_C_BACKTRACE_ON_ERROR")) && - (wi = grok_atou(ws, NULL)) > 0) { + (wi = atoi(ws)) > 0) { Perl_dump_c_backtrace(aTHX_ Perl_debug_log, wi, 1); } } @@ -4419,9 +4419,9 @@ Perl_parse_unicode_opts(pTHX_ const char **popt) if (*p) { if (isDIGIT(*p)) { - const char* endptr; - opt = (U32) grok_atou(p, &endptr); - p = endptr; + opt = (U32) atoi(p); + while (isDIGIT(*p)) + p++; if (*p && *p != '\n' && *p != '\r') { if(isSPACE(*p)) goto the_end_of_the_opts_parser; else @@ -4736,7 +4736,7 @@ Perl_free_global_struct(pTHX_ struct perl_vars *plvarsp) * The default implementation reads a single env var, PERL_MEM_LOG, * expecting one or more of the following: * - * \d+ - fd fd to write to : must be 1st (grok_atou) + * \d+ - fd fd to write to : must be 1st (atoi) * 'm' - memlog was PERL_MEM_LOG=1 * 's' - svlog was PERL_SV_LOG=1 * 't' - timestamp was PERL_MEM_LOG_TIMESTAMP=1 @@ -4804,8 +4804,7 @@ S_mem_log_common(enum mem_log_type mlt, const UV n, * timeval. */ { STRLEN len; - const char* endptr; - int fd = grok_atou(pmlenv, &endptr); /* Ignore endptr. */ + int fd = atoi(pmlenv); if (!fd) fd = PERL_MEM_LOG_FD; @@ -5991,7 +5990,7 @@ static void atos_update(atos_context* ctx, /* Given an output buffer end |p| and its |start|, matches * for the atos output, extracting the source code location - * and returning non-NULL if possible, returning NULL otherwise. */ + * if possible, returning NULL otherwise. */ static const char* atos_parse(const char* p, const char* start, STRLEN* source_name_size, @@ -6006,14 +6005,11 @@ static const char* atos_parse(const char* p, * The matched regular expression is roughly "\(.*:\d+\)\s*$" */ const char* source_number_start; const char* source_name_end; - const char* source_line_end; - const char* close_paren; /* Skip trailing whitespace. */ while (p > start && isspace(*p)) p--; /* Now we should be at the close paren. */ if (p == start || *p != ')') return NULL; - close_paren = p; p--; /* Now we should be in the line number. */ if (p == start || !isdigit(*p)) @@ -6034,9 +6030,7 @@ static const char* atos_parse(const char* p, return NULL; p++; *source_name_size = source_name_end - p; - *source_line = grok_atou(source_number_start, &source_line_end); - if (source_line_end != close_paren) - return NULL; + *source_line = atoi(source_number_start); return p; } -- 2.3.0 ```
p5pRT commented 9 years ago

From @demerphq

On 19 February 2015 at 22​:33\, Jarkko Hietaniemi via RT \perlbug\-followup@&#8203;perl\.org wrote​:

To the tune of "everybody should take out their own garbage"\, attached is a patch that would remove grok_atou()\, and leave the atoi() (and strtol\, etc.) in their naked glory. But this state of things does not pass e.g. the tests added in

\[perl \#123782\] regcomp&#8203;: check for overflow on /\(?123\)/

aka http​://perl5.git.perl.org/perl.git/commit/b3725d49f914ef2bed63d7eb92a72ef6e886b489

But one can take a look at the patch to see where the atoi() spots are.

My inclination is to go with Hugo's option. I dont think reverting this stuff is ideal. But i think at this point it is up to Hugo.

cheers\, Yves

p5pRT commented 9 years ago

From @jhi

My inclination is to go with Hugo's option. I dont think reverting

I agree. The main point of my patch was to clearly show where and how grok_atou changed things\, since I saw to my horror that instead of being one or few clean commits\, it was quite a few more\, not cleanly separable.

this stuff is ideal. But i think at this point it is up to Hugo.

-- There is this special biologist word we use for 'stable'. It is 'dead'. -- Jack Cohen

p5pRT commented 9 years ago

From @hvds

I've repushed branch hv/grok with a second commit as a proof of concept. There's a few loose ends\, but should be enough to see how much you like or hate it - passes a basic test run except for XS-APItest\, for which I'll need to acquire some XS fu.

Hugo

p5pRT commented 9 years ago

From @khwilliamson

On 02/19/2015 04​:59 AM\, demerphq wrote​:

On 19 February 2015 at 12​:59\, Karl Williamson via RT \perlbug\-followup@&#8203;perl\.org wrote​:

On Wed Feb 18 18​:57​:43 2015\, jhi wrote​:

... it should be possible to try reverting the commit(s) where grok_atou was introduced\, and we would be back to using native atoi().

But atoi has more bugs. Why can't we just take this out of the public API? We can rename it to begin with an underscore and mark it as experimental\, removing the docs\, so that tghere is only discouragement for people using it\, and the docs say you do so at your own risk

Lets not *remove* the docs. Lets update them appropriately. :-)

Yves

We should strive for a really good API before releasing it to the public. At this late date\, it seems advisable to restrict the use of this one to the core given there are complaints about it.

p5pRT commented 9 years ago

From @demerphq

On 20 February 2015 at 00​:43\, Karl Williamson \public@&#8203;khwilliamson\.com wrote​:

On 02/19/2015 04​:59 AM\, demerphq wrote​:

On 19 February 2015 at 12​:59\, Karl Williamson via RT \perlbug\-followup@&#8203;perl\.org wrote​:

On Wed Feb 18 18​:57​:43 2015\, jhi wrote​:

... it should be possible to try reverting the commit(s) where grok_atou was introduced\, and we would be back to using native atoi().

But atoi has more bugs. Why can't we just take this out of the public API? We can rename it to begin with an underscore and mark it as experimental\, removing the docs\, so that tghere is only discouragement for people using it\, and the docs say you do so at your own risk

Lets not *remove* the docs. Lets update them appropriately. :-)

Yves

We should strive for a really good API before releasing it to the public. At this late date\, it seems advisable to restrict the use of this one to the core given there are complaints about it.

Ah I see\, you mean docs available from perldoc\, and I mean docs available from reading the comment above the function.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 9 years ago

From @jhi

On Thursday-201502-19 11​:30\, Hugo van der Sanden via RT wrote​:

I've repushed branch hv/grok with a second commit as a proof of concept. There's a few loose ends\, but should be enough to see how much you like or hate it - passes a basic test run except for XS-APItest\, for which I'll need to acquire some XS fu.

Looks good\, thanks!

In the APItest.xs I am not certain what was the problem\, but I think you just need to rename the numeric.xs​:grok_atou() as grok_atoUV()\, and change it to call grok_atoUV() properly?

There's also Devel​::PPPort code which deals with grok_atou(). We don't need backward compat code for it since it has been in only 5.21.*

p5pRT commented 9 years ago

From @hvds

On Fri Feb 20 05​:39​:45 2015\, jhi wrote​:

On Thursday-201502-19 11​:30\, Hugo van der Sanden via RT wrote​:

I've repushed branch hv/grok with a second commit as a proof of concept.

I should mention also that I'd planned to add type-specific variants for eg U32\, I32 etc afterwards\, but I'm less convinced now that they would add significant value. I'm hoping that if all the examples in core look like 'if (grok_atoUV(...) && uv \<= SOME_LIMIT)' people will copy and adapt that appropriately.

There's a few loose ends\, but should be enough to see how much you like or hate it - passes a basic test run except for XS- APItest\, for which I'll need to acquire some XS fu.

Looks good\, thanks!

In the APItest.xs I am not certain what was the problem\, but I think you just need to rename the numeric.xs​:grok_atou() as grok_atoUV()\, and change it to call grok_atoUV() properly?

Merely lack of experience with XS; once I found an example elsewhere of returning a boolean it was straightforward.

There's also Devel​::PPPort code which deals with grok_atou(). We don't need backward compat code for it since it has been in only 5.21.*

As far as I can see\, the only reference is the captured embed.fnc entry\, it's on the unsupported list; I assume after this lands\, the next update to PPPort will magically remove grok_atou and add grok_atoUV - Matthew\, is that correct?

There are also some references in perlclib.pod\, which will need some new text since we no longer have a drop-in substitute\, and the XXX comments for (at least) magic_set() and reg() need some thinking.

I think what's there now is worth some smoke-testing\, I've renamed the branch to smoke-me/hv-grok.

Hugo

p5pRT commented 9 years ago

From @jhi

On Friday-201502-20 10​:52\, Hugo van der Sanden via RT wrote​:

I think what's there now is worth some smoke-testing\, I've renamed the branch to smoke-me/hv-grok.

I'll grab a copy and smoke it too.

p5pRT commented 9 years ago

From @jhi

On Friday-201502-20 11​:45\, Jarkko Hietaniemi wrote​:

I'll grab a copy and smoke it too.

Good with Configure -des -Dusedevel in​:

CentOS 6.6 x86_64 OS X 10.10.2 x86_64 FreeBSD 9.3-BETA2 amd64 IRIX64 irix 6.5 MIPS Solaris 5.10 sparc

The last two are 32-bit\, that might be of some proof value.

p5pRT commented 9 years ago

From @hvds

On Fri Feb 20 07​:52​:06 2015\, hv wrote​:

I think what's there now is worth some smoke-testing\, I've renamed the branch to smoke-me/hv-grok.

Hmm\, I'm getting a consistent failure for +threads -debugging under win32 that I don't understand. The log for that config (at http​://m-l.org/~perl/smoke/perl/win32/smoke-me/Hugo%20van%20der%20Sanden/log564cba32cbcca439aa4cff6c9da609586124b2f3.log.gz) starts like so​:

Configuration​: -Dusedevel -Duseithreads


make distclean ... Copy Policy.sh ... Configure ...[./Configure -Dusedevel -Duseithreads -DCCTYPE=MSVC]

make ...[nmake -f smoke.mk -nologo /D CCTYPE=MSVC80FREE MAKE="nmake -nologo /D" ] Could Not Find D​:\smoke\perl\smoke-me\build\win32\config.h NMAKE : fatal error U1077​: '"C​:\Program Files\Microsoft Visual Studio 8\VC\BIN\cl.EXE"' : return code '0x2' Stop.

The other configurations all look like​:

Configuration​: -Dusedevel -DDEBUGGING


make distclean ... Copy Policy.sh ... Configure ...[./Configure -Dusedevel -DDEBUGGING -DCCTYPE=MSVC]

make ...[nmake -f smoke.mk -nologo /D CCTYPE=MSVC80FREE MAKE="nmake -nologo /D" ] Could Not Find D​:\smoke\perl\smoke-me\build\win32\config.h Running config_h.PL Writing config.h config.h has changed Options​: (DEBUGGING HAS_TIMES HAVE_INTERP_INTERN [...]

Anyone have a clue for that?

Hugo

p5pRT commented 9 years ago

From @jhi

On Saturday-201502-21 6​:14\, Hugo van der Sanden via RT wrote​:

Configuration​: -Dusedevel -Duseithreads ------------------------------------------------------------------------------ make distclean ... Copy Policy.sh ... Configure ...[./Configure -Dusedevel -Duseithreads -DCCTYPE=MSVC]

Sorry\, I have no idea for the failure as such.

Though the question is\, was this failure there before your patch?

p5pRT commented 9 years ago

From @hvds

On Sat Feb 21 04​:28​:52 2015\, jhi wrote​:

Though the question is\, was this failure there before your patch?

Not that I can see - blead smokes are not showing a config fail\, though the waters are muddied by intermittent test fails.

Hugo

p5pRT commented 9 years ago

From @jhi

On Saturday-201502-21 7​:35\, Hugo van der Sanden via RT wrote​:

On Sat Feb 21 04​:28​:52 2015\, jhi wrote​:

Though the question is\, was this failure there before your patch?

Not that I can see - blead smokes are not showing a config fail\, though the waters are muddied by intermittent test fails.

I will kick off a smoke on another branch\, to gain more data. Hoping I hit the same set of smoke boxes.

Hugo

p5pRT commented 9 years ago

From @craigberry

On Sat\, Feb 21\, 2015 at 5​:14 AM\, Hugo van der Sanden via RT \perlbug\-followup@&#8203;perl\.org wrote​:

On Fri Feb 20 07​:52​:06 2015\, hv wrote​:

I think what's there now is worth some smoke-testing\, I've renamed the branch to smoke-me/hv-grok.

Hmm\, I'm getting a consistent failure for +threads -debugging under win32 that I don't understand. The log for that config (at http​://m-l.org/~perl/smoke/perl/win32/smoke-me/Hugo%20van%20der%20Sanden/log564cba32cbcca439aa4cff6c9da609586124b2f3.log.gz) starts like so​:

Configuration​: -Dusedevel -Duseithreads ------------------------------------------------------------------------------ make distclean ... Copy Policy.sh ... Configure ...[./Configure -Dusedevel -Duseithreads -DCCTYPE=MSVC]

make ...[nmake -f smoke.mk -nologo /D CCTYPE=MSVC80FREE MAKE="nmake -nologo /D" ] Could Not Find D​:\smoke\perl\smoke-me\build\win32\config.h NMAKE : fatal error U1077​: '"C​:\Program Files\Microsoft Visual Studio 8\VC\BIN\cl.EXE"' : return code '0x2' Stop.

I can't reproduce that just doing a straight build of the branch on Windows 7 with CCTYPE=MSVC110 so it must have something to do with what gets written to smoke.mk by Test​::Smoke. I think normally config.h is just copied from a template to build miniperl and then another config.h is generated later to build perl\, but it looks like Test​::Smoke generates config.h from the outset. Seeing the rules in smoke.mk for generating config.h might shed light on something.

p5pRT commented 9 years ago

From @hvds

On Sat Feb 21 08​:31​:49 2015\, craig.a.berry@​gmail.com wrote​:

I can't reproduce that just doing a straight build of the branch on Windows 7 with CCTYPE=MSVC110 so it must have something to do with what gets written to smoke.mk by Test​::Smoke.

Thanks Craig\, I see also that Friday's rebase got a clean Win32 run\, which makes me feel a lot better.

Hugo

p5pRT commented 9 years ago

From @hvds

On Thu Feb 19 08​:44​:25 2015\, public@​khwilliamson.com wrote​:

We should strive for a really good API before releasing it to the public. At this late date\, it seems advisable to restrict the use of this one to the core given there are complaints about it.

I tried restricting the new function​:

-Apdn |bool |grok_atoUV |NN const char* pv|NN UV* valptr|NULLOK const char** endptr +EXpdn |bool |grok_atoUV |NN const char* pv|NN UV* valptr|NULLOK const char** endptr

.. but ext/Dynaloader complains it can't see it. Should I simply be adding a #define PERL_EXT to that module? I had assumed all the core extensions would get that automatically\, so I'm not sure what the ramifications are.

Hugo

p5pRT commented 9 years ago

From @khwilliamson

On 02/27/2015 11​:08 AM\, Hugo van der Sanden via RT wrote​:

On Thu Feb 19 08​:44​:25 2015\, public@​khwilliamson.com wrote​:

We should strive for a really good API before releasing it to the public. At this late date\, it seems advisable to restrict the use of this one to the core given there are complaints about it.

I tried restricting the new function​:

-Apdn |bool |grok_atoUV |NN const char* pv|NN UV* valptr|NULLOK const char** endptr +EXpdn |bool |grok_atoUV |NN const char* pv|NN UV* valptr|NULLOK const char** endptr

.. but ext/Dynaloader complains it can't see it. Should I simply be adding a #define PERL_EXT to that module? I had assumed all the core extensions would get that automatically\, so I'm not sure what the ramifications are.

Hugo

--- via perlbug​: queue​: perl5 status​: open https://rt-archive.perl.org/perl5/Ticket/Display.html?id=123814

Someone else will have to tell us about the ramifications of adding PERL_EXT are.

But\, I think the embed.fnc entry should have an M flag\, and the   =for apidoc line removed from the documentation for the function. That will mean that no documentation is generated\, and so someone would have to be looking through the code to even know that this function exists.

One option that wouldn't have further ramifications would be to add a line to dlutils.c   #define PERL_IN_DLUTILS_C and then in embed.fnc\, surround its entry with

#if defined(PERL_CORE) || defined(PERL_EXT) || defined(PERL_IN_DLUTILS_C)

and an #endif.

That way the prototype for the function is available only to the specified areas and no others\, unless someone is really crazy and adds there own #defines to get around our restriction.

p5pRT commented 9 years ago

From @hvds

I've now merged this work​:

35cd12d [perl #123814] stricter handling of numbers in regexp quantifiers 22ff313 [perl #123814] replace grok_atou with grok_atoUV 73e4395 grok_atoUV​: don't make part of API

There are a couple of loose ends that I hope to get to\, such as [perl #123991]\, but I believe that this is already an improvement.

Hugo

p5pRT commented 9 years ago

@hvds - Status changed from 'open' to 'pending release'

p5pRT commented 9 years ago

From @jhi

On Monday-201503-09 18​:32\, Hugo van der Sanden via RT wrote​:

I've now merged this work​:

35cd12d [perl #123814] stricter handling of numbers in regexp quantifiers 22ff313 [perl #123814] replace grok_atou with grok_atoUV 73e4395 grok_atoUV​: don't make part of API

There are a couple of loose ends that I hope to get to\, such as [perl #123991]\, but I believe that this is already an improvement.

Thanks!

One comment​:

+ new_egid = 0; /* XXX is this safe? */ ... + gary[i] = 0; /* XXX is this safe? */

Zero sounds like a bit unsafe for group ids. I would suggest (Gid_t)(-1). Though how portable that is again across OSes is anyone's guess.

Hugo

p5pRT commented 9 years ago

From @hvds

Jarkko Hietaniemi jhi@iki.fi wrote:

One comment:

+                new_egid = 0;   /* XXX is this safe? */
...
+                    gary[i] = 0;    /* XXX is this safe? */

Zero sounds like a bit unsafe for group ids. I would suggest (Gid_t)(-1). Though how portable that is again across OSes is anyone's guess.

Like the below? I assume the define should move to a header somewhere, but I'm not sure which is appropriate.

Is there a todo somewhere to fix the various 'XXX silently ignores failure' comments from b469f1e0?

Hugo

--- a/mg.c
+++ b/mg.c
@@ -3015,6 +3015,9 @@ Perl_magic_set(pTHX_ SV *sv, MAGIC *mg)
        {
         /* XXX $) currently silently ignores failures */
        Gid_t new_egid;
+#ifndef INVALID_GID
+#define INVALID_GID ((Gid_t)-1)
+#endif
 #ifdef HAS_SETGROUPS
        {
            const char *p = SvPV_const(sv, len);
@@ -3035,7 +3038,7 @@ Perl_magic_set(pTHX_ SV *sv, MAGIC *mg)
             if (grok_atoUV(p, &uv, &endptr))
                 new_egid = (Gid_t)uv;
             else {
-                new_egid = 0;   /* XXX is this safe? */
+                new_egid = INVALID_GID;
                 endptr = NULL;
             }
             for (i = 0; i < maxgrp; ++i) {
@@ -3053,7 +3056,7 @@ Perl_magic_set(pTHX_ SV *sv, MAGIC *mg)
                 if (grok_atoUV(p, &uv, &endptr))
                     gary[i] = (Groups_t)uv;
                 else {
-                    gary[i] = 0;    /* XXX is this safe? */
+                    gary[i] = INVALID_GID;
                     endptr = NULL;
                 }
             }
p5pRT commented 9 years ago

From @jhi

On Tuesday-201503-10 4​:15\, Hugo van der Sanden via RT wrote​:

Like the below? I assume the define should move to a header somewhere\, but

Yeah\, that looks right.

On the portability​: -1 is often "bad group id"\, and "-2" is "nobody". Notice I said hand-wavingly "often".

I'm not sure which is appropriate.

perl.h\, maybe\, that seems to be a common dumping ground (it really needs some splitting up)\, but now in almost-5.22\, maybe just keep the def in mg.c.

Is there a todo somewhere to fix the various 'XXX silently ignores failure' comments from b469f1e0?

Beyond those XXX markers\, no. The conundrum there is how should a variable assignment (Perl's chosen API here) fail?

p5pRT commented 9 years ago

From @hvds

I've now pushed 0635704a98​:

  mg.c​:Perl_magic_set​: don't use 0 as "failed" gid_t  
  For [perl #123814] we added checking for grok_* parse failures in   magic_set for $)\, but used 0 as the placeholder result in those   cases (since we don't have an effective way to report an error for   this). (Gid_t)(-1) is a safer placeholder\, since on many systems   that'll map to an explicit bad group id.

Hugo

p5pRT commented 9 years ago

From @jhi

On Tuesday-201503-10 19:35, Hugo van der Sanden via RT wrote:

I've now pushed 0635704a98:

 mg.c:Perl_magic_set: don't use 0 as "failed" gid_t

 For [perl #123814] we added checking for grok_* parse failures in
 magic_set for $), but used 0 as the placeholder result in those
 cases (since we don't have an effective way to report an error for
 this). (Gid_t)(-1) is a safer placeholder, since on many systems
 that'll map to an explicit bad group id.

Thanks.

I'll give the blead some smoking love now.

Hugo

p5pRT commented 9 years ago

From @wolfsage

On Fri\, Feb 20\, 2015 at 10​:52 AM\, Hugo van der Sanden via RT \perlbug\-followup@&#8203;perl\.org wrote​:

As far as I can see\, the only reference is the captured embed.fnc entry\, it's on the unsupported list; I assume after this lands\, the next update to PPPort will magically remove grok_atou and add grok_atoUV - Matthew\, is that correct?

If it's been removed from embed.fnc and replaced\, the next time we regen the todo files\, etc\, then yeah\, that should go away and grok_atoUV should show up.

Is this something we should do before 5.22 release so it can be included?

-- Matthew Horsfall (alh)

p5pRT commented 9 years ago

From @hvds

"Matthew Horsfall (alh)" wolfsage@gmail.com wrote:

On Fri, Feb 20, 2015 at 10:52 AM, Hugo van der Sanden via RT perlbug-followup@perl.org wrote:

As far as I can see, the only reference is the captured embed.fnc entry, it's on the unsupported list; I assume after this lands, the next update to PPPort will magically remove grok_atou and add grok_atoUV - Matthew, is that correct?

If it's been removed from embed.fnc and replaced, the next time we regen the todo files, etc, then yeah, that should go away and grok_atoUV should show up.

Is this something we should do before 5.22 release so it can be included?

I'm not sure if by "this" you mean the API change or regen of PPPort.

The code changes have already been pushed; net change over a couple of commits (22ff313068, 73e4395489) is:

-Apdn   |UV     |grok_atou      |NN const char* pv|NULLOK const char** endptr
+EXpn   |bool   |grok_atoUV     |NN const char* pv|NN UV* valptr|NULLOK const char** endptr

.. since we also decided we didn't want to commit immediately to freezing the new interface for the public API.

I certainly think we'd want to regen PPPort before 5.22 release.

Hugo

p5pRT commented 9 years ago

From @wolfsage

On Wed\, Mar 11\, 2015 at 4​:08 AM\, \hv@&#8203;crypt\.org wrote​:

I certainly think we'd want to regen PPPort before 5.22 release.

I've just uploaded Devel​::PPPort 3.31 which includes the grok_atou removal; if someone doesn't get to it before me in the next few days I'll pull the changes into blead.

Thanks\,

-- Matthew Horsfall (alh)

p5pRT commented 9 years ago

From @wolfsage

On Thu\, Mar 12\, 2015 at 10​:27 AM\, Matthew Horsfall (alh) \wolfsage@&#8203;gmail\.com wrote​:

I've just uploaded Devel​::PPPort 3.31 which includes the grok_atou removal; if someone doesn't get to it before me in the next few days I'll pull the changes into blead.

http​://perl5.git.perl.org/perl.git/commit/75ddf117c3d998647ee46be0306a5e763ea236aa contains the updated Devel​::PPPort\, currently in a branch. If someone could +1 that I'll pull it into blead.

Thanks\,

-- Matthew Horsfall (alh)

p5pRT commented 9 years ago

From @hvds

On Tue Mar 17 06​:47​:48 2015\, alh wrote​:

http​://perl5.git.perl.org/perl.git/commit/75ddf117c3d998647ee46be0306a5e763ea236aa contains the updated Devel​::PPPort\, currently in a branch. If someone could +1 that I'll pull it into blead.

I confirm that it passes tests for me\, and as expected no longer mentions grok_atou. I'm not sure if there's other things I should be looking out for.

Hugo

p5pRT commented 9 years ago

From @wolfsage

On Wed\, Mar 18\, 2015 at 3​:27 PM\, Hugo van der Sanden via RT \perlbug\-followup@&#8203;perl\.org wrote​:

I confirm that it passes tests for me\, and as expected no longer mentions grok_atou. I'm not sure if there's other things I should be looking out for.

Thanks\, applied as 77f7a5f46952c6d0ba0a3c95af769b3b764bf65a.

-- Matthew Horsfall (alh)

p5pRT commented 9 years ago

From @khwilliamson

Thank you for submitting this ticket.

The issue should now be resolved with the release today of Perl v5.22\, which is available at http​://www.perl.org/get.html -- Karl Williamson for the Perl 5 team

p5pRT commented 9 years ago

@khwilliamson - Status changed from 'pending release' to 'resolved'