Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.95k stars 554 forks source link

$1 handled differently in integer arithmetics #11921

Closed p5pRT closed 12 years ago

p5pRT commented 12 years ago

Migrated from rt.perl.org#109542 (status was 'resolved')

Searchable as RT109542$

p5pRT commented 12 years ago

From perlbug.dttn@manchmal.in-ulm.de

This is a bug report for perl from perlbug.dttn@​manchmal.in-ulm.de\, generated with the help of perlbug 1.39 running under perl 5.14.2.


Hi .\,

Still not sure whether this is a bug\, so I'd call it surprising.

Using a regular expression to capture a sequence of digits puts something into $1 that behaves differently when used for further integer arithmetics. Basically\, the reproducer is as follows​:

perl -e '$s = "976562500000"; $s =~ /(\d+)/; printf "%s = %s\n"\, ($1 * 1024)\, int ($1 * 1024)'

Expected (at least on amd64)​:

1000000000000000 = 1000000000000000

Got​:

1e+15 = 1000000000000000

The surprising thing is this does _not_ happen with named capture buffers​:

perl -E '$s = "976562500000"; $s =~ /(?\\d+)/; printf "%s = %s\n"\, ($+{match} * 1024)\, int ($+{match} * 1024)'

Nor after copying $1 into another variable​:

perl -e '$s = "976562500000"; $s =~ /(\d+)/; $t = $1; printf "%s = %s\n"\, ($t * 1024)\, int ($t * 1024)'

It appears using $1 switches computation to floating point even the numbers used are still in the integer range. Care to take a look into that?

See also http​://munin-monitoring.org/ticket/1195 for a case where this issue causes unwanted effects in the wild.

Regards\,

  Christoph Biedl \perlbug\.dttn@​manchmal\.in\-ulm\.de



Flags​:   category=core   severity=low


Site configuration information for perl 5.14.2​:

Configured by Debian Project at Mon Nov 28 23​:30​:36 UTC 2011.

Summary of my perl5 (revision 5 version 14 subversion 2) configuration​:  
  Platform​:   osname=linux\, osvers=2.6.32-5-amd64\, archname=x86_64-linux-gnu-thread-multi   uname='linux brahms 2.6.32-5-amd64 #1 smp thu nov 3 03​:41​:26 utc 2011 x86_64 gnulinux '   config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.14 -Darchlib=/usr/lib/perl/5.14 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.14.2 -Dsitearch=/usr/local/lib/perl/5.14.2 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Duse64bitint -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -Ui_libutil -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.14.2 -des'   hint=recommended\, useposix=true\, d_sigaction=define   useithreads=define\, usemultiplicity=define   useperlio=define\, d_sfio=undef\, uselargefiles=define\, usesocks=undef   use64bitint=define\, use64bitall=define\, uselongdouble=undef   usemymalloc=n\, bincompat5005=undef   Compiler​:   cc='cc'\, ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'\,   optimize='-O2 -g'\,   cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'   ccversion=''\, gccversion='4.6.2'\, gccosandvers=''   intsize=4\, longsize=8\, ptrsize=8\, doublesize=8\, byteorder=12345678   d_longlong=define\, longlongsize=8\, d_longdbl=define\, longdblsize=16   ivtype='long'\, ivsize=8\, nvtype='double'\, nvsize=8\, Off_t='off_t'\, lseeksize=8   alignbytes=8\, prototype=define   Linker and Libraries​:   ld='cc'\, ldflags =' -fstack-protector -L/usr/local/lib'   libpth=/usr/local/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /usr/lib   libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt   perllibs=-ldl -lm -lpthread -lc -lcrypt   libc=\, so=so\, useshrplib=true\, libperl=libperl.so.5.14.2   gnulibc_version='2.13'   Dynamic Linking​:   dlsrc=dl_dlopen.xs\, dlext=so\, d_dlsymun=undef\, ccdlflags='-Wl\,-E'   cccdlflags='-fPIC'\, lddlflags='-shared -O2 -g -L/usr/local/lib -fstack-protector'

Locally applied patches​:  


@​INC for perl 5.14.2​:   /etc/perl   /usr/local/lib/perl/5.14.2   /usr/local/share/perl/5.14.2   /usr/lib/perl5   /usr/share/perl5   /usr/lib/perl/5.14   /usr/share/perl/5.14   /usr/local/lib/site_perl   .


Environment for perl 5.14.2​:   HOME=/home/cbiedl   LANG=de_DE.UTF-8   LANGUAGE (unset)   LC_MESSAGES=C   LD_LIBRARY_PATH (unset)   LOGDIR (unset)   PATH=/usr/local/bin​:/usr/bin​:/bin​:/usr/local/games​:/usr/games   PERL_BADLANG (unset)   SHELL=/bin/bash

p5pRT commented 12 years ago

From ams@toroid.org

At 2012-01-31 23​:20​:24 -0800\, perlbug-followup@​perl.org wrote​:

perl -e '$s = "976562500000"; $s =~ /(\d+)/; printf "%s = %s\n"\, ($1 * 1024)\, int ($1 * 1024)'

Expected (at least on amd64)​:

1000000000000000 = 1000000000000000

Got​:

1e+15 = 1000000000000000

I understand superficially why this happens\, but I'm not sure what we can do about it\, or if I'm missing something deeper. Look at this code in pp.c/pp_multiply​:

1262 svr = TOPs; 1263 svl = TOPm1s; 1264 #ifdef PERL_PRESERVE_IVUV 1265 SvIV_please_nomg(svr); 1266 if (SvIOK(svr)) { 1267 /* Unless the left argument is integer in range we are going to have to 1268 use NV maths. Hence only attempt to coerce the right argument if 1269 we know the left is integer. */ 1270 /* Left operand is defined\, so is it IV? */ 1271 SvIV_please_nomg(svl); 1272 if (SvIOK(svl)) {

The SvIV_please_nomg(svl) line doesn't do anything to $1 because it's not NOK||POK\, so SvIOK(svl) is not true afterwards\, and svl is treated as an NV. $+{match} is also magical\, but POK is set\, so it's passed to sv_2iv_flags\, which calculates the correct IV and sets IOK. So it uses integer math. Same with copying $1\, except it's only POK\, no magic.

-- ams

p5pRT commented 12 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 12 years ago

From @cpansprout

On Wed Feb 01 09​:56​:12 2012\, ams@​toroid.org wrote​:

At 2012-01-31 23​:20​:24 -0800\, perlbug-followup@​perl.org wrote​:

perl -e '$s = "976562500000"; $s =~ /(\d+)/; printf "%s = %s\n"\, ($1 * 1024)\, int ($1 * 1024)'

Expected (at least on amd64)​:

1000000000000000 = 1000000000000000

Got​:

1e+15 = 1000000000000000

I understand superficially why this happens\, but I'm not sure what we can do about it\, or if I'm missing something deeper. Look at this code in pp.c/pp_multiply​:

1262 svr = TOPs; 1263 svl = TOPm1s; 1264 #ifdef PERL_PRESERVE_IVUV 1265 SvIV_please_nomg(svr); 1266 if (SvIOK(svr)) { 1267 /* Unless the left argument is integer in range we are going to have to 1268 use NV maths. Hence only attempt to coerce the right argument if 1269 we know the left is integer. */ 1270 /* Left operand is defined\, so is it IV? */ 1271 SvIV_please_nomg(svl); 1272 if (SvIOK(svl)) {

The SvIV_please_nomg(svl) line doesn't do anything to $1 because it's not NOK||POK\, so SvIOK(svl) is not true afterwards\, and svl is treated as an NV. $+{match} is also magical\, but POK is set\, so it's passed to sv_2iv_flags\, which calculates the correct IV and sets IOK. So it uses integer math. Same with copying $1\, except it's only POK\, no magic.

SvIV_please presumably doesn’t trust the private flags\, because magic could invalidate them. At least that’s what I assume was the original reason. But in the case of SvIV_please_nomg\, we have already called magic (the whole raison d’être for the _nomg variant)\, so there is no reason not to treat SvPOKp as equivalent to SvPOK.

Does that sound right?

--

Father Chrysostomos

p5pRT commented 12 years ago

From ams@toroid.org

At 2012-02-01 12​:35​:00 -0800\, perlbug-followup@​perl.org wrote​:

But in the case of SvIV_please_nomg\, we have already called magic (the whole raison d’être for the _nomg variant)\, so there is no reason not to treat SvPOKp as equivalent to SvPOK.

Does that sound right?

It sounds right in principle\, but it's not clear to me how to translate that into a fix. In how many more places is that equivalence relevant\, and where must we add "|| SvPOKp(sv)"s to let them all through? Doing it only in SvIV_please_nomg is insufficient. Or is there some easier way to do it? Comments gratefully received.

-- ams

p5pRT commented 12 years ago

From @cpansprout

On Wed Feb 01 13​:01​:39 2012\, ams@​toroid.org wrote​:

At 2012-02-01 12​:35​:00 -0800\, perlbug-followup@​perl.org wrote​:

But in the case of SvIV_please_nomg\, we have already called magic (the whole raison d’être for the _nomg variant)\, so there is no reason not to treat SvPOKp as equivalent to SvPOK.

Does that sound right?

It sounds right in principle\, but it's not clear to me how to translate that into a fix. In how many more places is that equivalence relevant\, and where must we add "|| SvPOKp(sv)"s to let them all through? Doing it only in SvIV_please_nomg is insufficient. Or is there some easier way to do it? Comments gratefully received.

But how much other code is affected? It is just the numeric ops? It looks as though this will require what I call a bug hunt​: grepping for instances and examining them for correctness.

--

Father Chrysostomos

p5pRT commented 12 years ago

From @demerphq

On 1 February 2012 22​:39\, Father Chrysostomos via RT \perlbug\-followup@​perl\.org wrote​:

On Wed Feb 01 13​:01​:39 2012\, ams@​toroid.org wrote​:

At 2012-02-01 12​:35​:00 -0800\, perlbug-followup@​perl.org wrote​:

But in the case of SvIV_please_nomg\, we have already called magic (the whole raison d’être for the _nomg variant)\, so there is no reason not to treat SvPOKp as equivalent to SvPOK.

Does that sound right?

It sounds right in principle\, but it's not clear to me how to translate that into a fix. In how many more places is that equivalence relevant\, and where must we add "|| SvPOKp(sv)"s to let them all through? Doing it only in SvIV_please_nomg is insufficient. Or is there some easier way to do it? Comments gratefully received.

But how much other code is affected?  It is just the numeric ops?  It looks as though this will require what I call a bug hunt​: grepping for instances and examining them for correctness.

Maybe we should back up a bit and consider why its ok that $+{match} is fine in this context\, and $1 isnt\, and then figure out if we can make $1 work like $+{match}.

Also\, i am wondering\, we have historically had a lot of trouble due to the tied nature of $1 and $2 and the rest of the regex magic vars. I am wondering why they have to be magic? Why don't we just copy the values and be done with the bugs? If they were true globals\, and we populated them with localized LVALUES would we not get the same effect\, and indeed rid ourselves of the bugs and make them updatable too?

I mean\, we have all the information and infrastructure to do this\, is there any reason not to?

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 12 years ago

From zefram@fysh.org

demerphq wrote​:

                                   Why don't we just copy the

values and be done with the bugs?

Performance. I'd be happy to lose a bit of performance for saner behaviour\, but Reini will argue vehemently against it. Between the two of us you'll find a range of opinions about which optimisations are worth the hassle.

-zefram

p5pRT commented 12 years ago

From @rurban

On Wed\, Feb 1\, 2012 at 5​:48 PM\, Zefram \zefram@​fysh\.org wrote​:

demerphq wrote​: Why don't we just copy the

values and be done with the bugs?

Performance.  I'd be happy to lose a bit of performance for saner behaviour\, but Reini will argue vehemently against it.  Between the two of us you'll find a range of opinions about which optimisations are worth the hassle.

I will not oppose any fix here in this area\, as it does not affect general performance. Handling int's from regex results is not performance critical :) If copying helps please do copy.

I only oppose slowness in critical areas. -- Reini Urban http​://cpanel.net/   http​://www.perl-compiler.org/

p5pRT commented 12 years ago

From @Leont

On Thu\, Feb 2\, 2012 at 12​:48 AM\, Zefram \zefram@​fysh\.org wrote​:

Performance.  I'd be happy to lose a bit of performance for saner behaviour\, but Reini will argue vehemently against it.  Between the two of us you'll find a range of opinions about which optimisations are worth the hassle.

I have a feeling that if we do this we might as well add $& to that list. Hard data would be most welcome.

Leon

p5pRT commented 12 years ago

From @demerphq

On 2 February 2012 01​:13\, Leon Timmermans \fawaka@​gmail\.com wrote​:

On Thu\, Feb 2\, 2012 at 12​:48 AM\, Zefram \zefram@​fysh\.org wrote​:

Performance.  I'd be happy to lose a bit of performance for saner behaviour\, but Reini will argue vehemently against it.  Between the two of us you'll find a range of opinions about which optimisations are worth the hassle.

I have a feeling that if we do this we might as well add $& to that list.

Hard data would be most welcome.

Yes\, but probably requires a complete patch. Some thinking on what the issues are first is probably a good idea.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 12 years ago

From @demerphq

On 2 February 2012 00​:48\, Zefram \zefram@​fysh\.org wrote​:

demerphq wrote​:

                                       Why don't we just copy the values and be done with the bugs?

Performance.  I'd be happy to lose a bit of performance for saner behaviour\, but Reini will argue vehemently against it.  Between the two of us you'll find a range of opinions about which optimisations are worth the hassle.

I don't think this would actually be slower. I suspect it would be faster\, and make scripts that use the evil three $`\, $&\, $' much faster.

Right now the rules for copying the string are that we copy it in entirety if we see $1 or $2 in the pattern or have seen the evil three anywhere in the script.

Anyway\, I realized after i posted that LVALUES dont quite match\, as they dont preserve the LVALUE string when the target variable is undef()'ed\, or modified. If we could figure out a clean way to make that happen\, then the overhead would only be the creation of the LVALUE SV\, but we would eliminate a lot of string copying. AFAIK\, there have been a number of attempts to add copy-on-write behaviour to eliminate this copying and it has never quite worked out\, and im thinking that maybe LVALUE provides a neat way to make it work.

cheers\, Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 12 years ago

From @cpansprout

On Wed Feb 01 16​:48​:59 2012\, demerphq wrote​:

On 2 February 2012 00​:48\, Zefram \zefram@​fysh\.org wrote​:

demerphq wrote​:

� � � � � � � � � � � � � � � � � � � �Why don't we just copy the values and be done with the bugs?

Performance. �I'd be happy to lose a bit of performance for saner behaviour\, but Reini will argue vehemently against it. �Between the two of us you'll find a range of opinions about which optimisations are worth the hassle.

I don't think this would actually be slower. I suspect it would be faster\, and make scripts that use the evil three $`\, $&\, $' much faster.

By dividing PL_sawampersand into three variables to avoid copying the whole string? I think that would make things more buggy.

Right now the rules for copying the string are that we copy it in entirety if we see $1 or $2 in the pattern or have seen the evil three anywhere in the script.

Anyway\, I realized after i posted that LVALUES dont quite match\, as they dont preserve the LVALUE string when the target variable is undef()'ed\, or modified. If we could figure out a clean way to make that happen\, then the overhead would only be the creation of the LVALUE SV\, but we would eliminate a lot of string copying. AFAIK\, there have been a number of attempts to add copy-on-write behaviour to eliminate this copying and it has never quite worked out\, and im thinking that maybe LVALUE provides a neat way to make it work.

But LVALUEs are magical. That’s how they work.

And we would still have this same bug with tied variables\, I believe. So changing $1 won’t solve this problem.

--

Father Chrysostomos

p5pRT commented 12 years ago

From @hvds

demerphq \demerphq@​gmail\.com wrote​: :Maybe we should back up a bit and consider why its ok that $+{match} :is fine in this context\, and $1 isnt\, and then figure out if we can :make $1 work like $+{match}. : :Also\, i am wondering\, we have historically had a lot of trouble due to :the tied nature of $1 and $2 and the rest of the regex magic vars. I :am wondering why they have to be magic? Why don't we just copy the :values and be done with the bugs? If they were true globals\, and we :populated them with localized LVALUES would we not get the same :effect\, and indeed rid ourselves of the bugs and make them updatable :too? : :I mean\, we have all the information and infrastructure to do this\, is :there any reason not to?

When would you do it?

I'm pretty sure you must not do it during matching\, while still backtracking\, so I guess backreferences within the pattern would still need to work with the offsets as they do now.

I can't remember right now whether they are made visible to embedded code; if they are\, that becomes rather tricky since you are still matching at that point. If not\, it might be practical to copy at the point the match is complete\, and refer direct to the offsets only internally for /([ab])\1/ type matches.

I think the big cost will be the multiple copies - I'm sure it is common to have $_\, $& and $1 all refer to the same large string\, and probably not uncommon to have $2 to refer to (say) all but a few characters of the same.

But in any case I inferred from the rest of the discussion that pp_multiply is simply doing the wrong thing with magic; if that's the case $1 is less villain than coughing canary\, and the problem is not best solved by stuffing the canary.

Hugo

p5pRT commented 12 years ago

From @cpansprout

On Wed Feb 01 13​:39​:05 2012\, sprout wrote​:

On Wed Feb 01 13​:01​:39 2012\, ams@​toroid.org wrote​:

At 2012-02-01 12​:35​:00 -0800\, perlbug-followup@​perl.org wrote​:

But in the case of SvIV_please_nomg\, we have already called magic (the whole raison d’être for the _nomg variant)\, so there is no reason not to treat SvPOKp as equivalent to SvPOK.

Does that sound right?

It sounds right in principle\, but it's not clear to me how to translate that into a fix. In how many more places is that equivalence relevant\, and where must we add "|| SvPOKp(sv)"s to let them all through? Doing it only in SvIV_please_nomg is insufficient. Or is there some easier way to do it? Comments gratefully received.

But how much other code is affected? It is just the numeric ops? It looks as though this will require what I call a bug hunt​: grepping for instances and examining them for correctness.

I did ‘ack -n '^(?=.*\bSvPOK\b)(?!.*\bSvPOKp\b)'’ and then examined each result.

You can see the results of my bug hunt in the branch that was merged into blead with commit f2ab049418.

This ticket bug\, involving numeric ops and $1\, was more complicated than simply using SvPOKp in SvIV_please_nomg. All the code that used the latter macro followed it with if(SvIOK(sv))\, so I had to change SvIV_please_nomg to return a boolean. Also\, the gmagic handling wasn’t so simple after all.

It was commit 01f91bf275 that fixed this particular bug.

--

Father Chrysostomos

p5pRT commented 12 years ago

@cpansprout - Status changed from 'open' to 'resolved'