Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

bug with glob in scalar context #8653

Closed p5pRT closed 17 years ago

p5pRT commented 17 years ago

Migrated from rt.perl.org#40622 (status was 'rejected')

Searchable as RT40622$

p5pRT commented 17 years ago

From @dakkar

Created by @dakkar

When using <> as glob in scalar context, if the pattern contains variables to be interpolated, the iterator is not reset when the values of the variables change. See the attached test case for an example of this behaviour.

The behaviour is quite similar to m//og in scalar context, but in that case it is documented (and removing the /o modifier allows the programmer to change that behaviour).

If changing the behaviour of the "<> as glob" operator is hard, or confusing, or whatever, at least the documentation should be changed to explicitly define the "problem".

Perl Info ``` Flags: category=core severity=low Site configuration information for perl v5.8.8: Configured by Gentoo at Tue May 30 19:46:18 CEST 2006. Summary of my perl5 (revision 5 version 8 subversion 8) configuration: Platform: osname=linux, osvers=2.6.15-gentoo-r1, archname=i686-linux-thread-multi uname='linux dechirico 2.6.15-gentoo-r1 #1 smp preempt mon apr 3 14:22:18 cest 2006 i686 intel(r) pentium(r) 4 cpu 3.40ghz gnulinux ' config_args='-des -Darchname=i686-linux-thread -Dcccdlflags=-fPIC -Dccdlflags=-rdynamic -Dcc=i686-pc-linux-gnu-gcc -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr -Dlocincpth= -Doptimize=-O3 -march=pentium4 -pipe -Duselargefiles -Dd_semctl_semun -Dscriptdir=/usr/bin -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dinstallman1dir=/usr/share/man/man1 -Dinstallman3dir=/usr/share/man/man3 -Dman1ext=1 -Dman3ext=3pm -Dinc_version_list=5.8.0 5.8.0/i686-linux-thread-multi 5.8.2 5.8.2/i686-linux-thread-multi 5.8.4 5.8.4/i686-linux-thread-multi 5.8.5 5.8.5/i686-linux-thread-multi 5.8.6 5.8.6/i686-linux-thread-multi 5.8.7 5.8.7/i686-linux-thread-multi -Dcf_by=Gentoo -Ud_csh -Dusenm -Dusethreads -Di_ndbm -Di_gdbm -Di_db' hint=recommended, useposix=true, d_sigaction=define usethreads=define use5005threads=undef useithreads=define usemultiplicity=define useperlio=define d_sfio=undef uselargefiles=define usesocks=undef use64bitint=undef use64bitall=undef uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='i686-pc-linux-gnu-gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -fno-strict-aliasing -pipe -Wdeclaration-after-statement -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', optimize='-O3 -march=pentium4 -pipe', cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -fno-strict-aliasing -pipe -Wdeclaration-after-statement' ccversion='', gccversion='3.4.6 (Gentoo 3.4.6-r1, ssp-3.4.5-1.0, pie-8.7.9)', gccosandvers='' intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234 d_longlong=define, 
longlongsize=8, d_longdbl=define, longdblsize=12 ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=4, prototype=define Linker and Libraries: ld='i686-pc-linux-gnu-gcc', ldflags =' -L/usr/local/lib' libpth=/usr/local/lib /lib /usr/lib libs=-lpthread -lnsl -lndbm -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc libc=/lib/libc-2.3.6.so, so=so, useshrplib=false, libperl=libperl.a gnulibc_version='2.3.6' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic' cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib' Locally applied patches: @INC for perl v5.8.8: /home/dakkar/lib/perl/i686-linux-thread-multi /home/dakkar/lib/perl /home/dakkar/lib/perl/i686-linux-thread-multi /home/dakkar/lib/perl /etc/perl /usr/lib/perl5/vendor_perl/5.8.8/i686-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl /usr/lib/perl5/site_perl/5.8.8/i686-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib/perl5/5.8.8/i686-linux-thread-multi /usr/lib/perl5/5.8.8 /usr/local/lib/site_perl . Environment for perl v5.8.8: HOME=/home/dakkar LANG (unset) LANGUAGE (unset) LC_ALL=en_US.UTF-8 LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=/usr/lib/colorgcc/bin:/usr/local/bin:/usr/bin:/bin:/opt/bin:/usr/i686-pc-linux-gnu/gcc-bin/3.4.6:/opt/stuffit/bin:/opt/sun-jdk-1.4.2.10/bin:/opt/sun-jdk-1.4.2.10/jre/bin:/opt/sun-jdk-1.4.2.10/jre/javaws:/usr/qt/3/bin:/usr/NX/bin:/home/dakkar/bin PERL5LIB=/home/dakkar/lib/perl:/home/dakkar/lib/perl: PERL5_CPANPLUS_CONFIG=/home/dakkar/.cpanplus/config PERL_BADLANG (unset) PERL_MM_OPT=INSTALLDIRS=perl LIB=/home/dakkar/lib/perl/ INSTALLBIN=/home/dakkar/bin INSTALLSCRIPT=/home/dakkar/bin/ INSTALLMAN1DIR=/home/dakkar/man/man1 INSTALLMAN3DIR=/home/dakkar/man/man3 SHELL=/bin/bash -- Dakkar - GPG public key fingerprint = A071 E618 DD2C 5901 9574 6FE2 40EA 9883 7519 3F88 key id = 0x75193F88 ```
p5pRT commented 17 years ago

From @dakkar

glob_t.tbz2

p5pRT commented 17 years ago

From doug@tierra.net

Created by doug@tierra.net

Trying to use "glob" in scalar context, I came across two things which don't seem to do the right thing:

######################

    my $file = glob "/usr/src/*";
    print "$file\n";

    my $file2 = glob "/usr/src/*";
    print "$file2\n";

    # expected: first 2 files from /usr/src
    # got: first file from /usr/src, twice

######################

    my $filespec = "/usr/src/*";
    for (1..5) {
        my $file = glob $filespec;
        print "$file\n";
        $filespec = "/usr/local/*";
    }

    # expected: first file from /usr/src, then first 4 files from /usr/local
    # got: first 5 files from /usr/src

######################

Perl Info ``` Flags: category=core severity=low Site configuration information for perl v5.8.8: Configured by brian at Fri Mar 17 07:03:18 PST 2006. Summary of my perl5 (revision 5 version 8 subversion 8) configuration: Platform: osname=freebsd, osvers=5.4-release-p10, archname=i386-freebsd-64int uname='freebsd dev.tierra.net 5.4-release-p10 freebsd 5.4-release-p10 #3: tue jan 31 15:10:25 pst 2006 root@dev.tierra.net:usrobjusrsrcsystierranet i386 ' config_args='-sde -Dprefix=/usr/local -Darchlib=/usr/local/lib/perl5/5.8.8/mach -Dprivlib=/usr/local/lib/perl5/5.8.8 -Dman3dir=/usr/local/lib/perl5/5.8.8/perl/man/man3 -Dman1dir=/usr/local/man/man1 -Dsitearch=/usr/local/lib/perl5/site_perl/5.8.8/mach -Dsitelib=/usr/local/lib/perl5/site_perl/5.8.8 -Dscriptdir=/usr/local/bin -Dsiteman3dir=/usr/local/lib/perl5/5.8.8/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Ui_malloc -Ui_iconv -Uinstallusrbinperl -Dcc=cc -Duseshrplib -Dccflags=-DAPPLLIB_EXP="/usr/local/lib/perl5/5.8.8/BSDPAN" -Dotherlibdirs=/usr/home/unified/lib -Doptimize=-O -pipe -Ud_dosuid -Ui_gdbm -Dusethreads=n -Dusemymalloc=y -Duse64bitint' hint=recommended, useposix=true, d_sigaction=define usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef useperlio=define d_sfio=undef uselargefiles=define usesocks=undef use64bitint=define use64bitall=undef uselongdouble=undef usemymalloc=y, bincompat5005=undef Compiler: cc='cc', ccflags ='-DAPPLLIB_EXP="/usr/local/lib/perl5/5.8.8/BSDPAN" -DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -fno-strict-aliasing -pipe -Wdeclaration-after-statement -I/usr/local/include', optimize='-O -pipe ', cppflags='-DAPPLLIB_EXP="/usr/local/lib/perl5/5.8.8/BSDPAN" -DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -fno-strict-aliasing -pipe -Wdeclaration-after-statement -I/usr/local/include' ccversion='', gccversion='3.4.2 [FreeBSD] 20040728', gccosandvers='' intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12 
ivtype='long long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=4, prototype=define Linker and Libraries: ld='cc', ldflags =' -Wl,-E -L/usr/local/lib' libpth=/usr/lib /usr/local/lib libs=-lm -lcrypt -lutil perllibs=-lm -lcrypt -lutil libc=, so=so, useshrplib=true, libperl=libperl.so gnulibc_version='' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' -Wl,-R/usr/local/lib/perl5/5.8.8/mach/CORE' cccdlflags='-DPIC -fPIC', lddlflags='-shared -L/usr/local/lib' Locally applied patches: defined-or @INC for perl v5.8.8: /usr/local/lib/perl5/5.8.8/BSDPAN /usr/local/lib/perl5/site_perl/5.8.8/mach /usr/local/lib/perl5/site_perl/5.8.8 /usr/local/lib/perl5/site_perl /usr/local/lib/perl5/5.8.8/mach /usr/local/lib/perl5/5.8.8 /usr/home/unified/lib . Environment for perl v5.8.8: HOME=/home/doug LANG (unset) LANGUAGE (unset) LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/games:/usr/local/sbin:/usr/local/bin:/usr/X11R6/bin:/home/doug/bin:/usr/home/unified/bin PERLCRITIC=/etc/perlcriticrc PERL_BADLANG (unset) SHELL=/usr/local/bin/bash ```
p5pRT commented 17 years ago

From @tamias

On Mon, Oct 30, 2006 at 11:43:19AM -0800, doug@tierra.net wrote:

Thank you for your bug report.

Trying to use "glob" in scalar context, I came across two things which don't seem to do the right thing:

######################

    my $file = glob "/usr/src/*";
    print "$file\n";

    my $file2 = glob "/usr/src/*";
    print "$file2\n";

    # expected: first 2 files from /usr/src
    # got: first file from /usr/src, twice

That is the correct behavior. These are two separate calls to glob(), each generating its own file list.

######################

    my $filespec = "/usr/src/*";
    for (1..5) {
        my $file = glob $filespec;
        print "$file\n";
        $filespec = "/usr/local/*";
    }

    # expected: first file from /usr/src, then first 4 files from /usr/local
    # got: first 5 files from /usr/src

This is also the expected behavior. It's documented in perlop, in the section on I/O Operators:

  A (file)glob evaluates its (embedded) argument only when it is
  starting a new list. All values must be read before it will start
  over. In list context, this isn't important because you
  automatically get them all anyway. However, in scalar context the
  operator returns the next value each time it's called, or "undef"
  when the list has run out.

Ronald
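Both cases from the report can be reproduced without depending on the contents of /usr/src. The following is only a minimal sketch, assuming a Unix-like system whose temporary directory path contains no whitespace; the scratch directories `src` and `local` and the file names are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch tree (hypothetical names): $dir/src/{a,b,c} and $dir/local/{a,b,c}.
my $dir = tempdir(CLEANUP => 1);
for my $sub (qw(src local)) {
    mkdir "$dir/$sub" or die "mkdir $dir/$sub: $!";
    for my $name (qw(a b c)) {
        open my $fh, '>', "$dir/$sub/$name" or die "open: $!";
        close $fh;
    }
}

# Case 1: two separate call sites, each with its own iterator,
# so both return the first match.
my $first  = glob "$dir/src/*";
my $second = glob "$dir/src/*";

# Case 2: one call site in a loop. The new value of $filespec is not
# used while the list started on the first pass still has entries.
my $filespec = "$dir/src/*";
my @seen;
for (1 .. 3) {
    push @seen, scalar glob($filespec);
    $filespec = "$dir/local/*";
}

print "$_\n" for $first, $second, @seen;
```

Every printed path lies under `src`: the two standalone calls both yield `src/a`, and the loop walks `src/a`, `src/b`, `src/c` even though `$filespec` pointed at `local` from the second pass on.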

p5pRT commented 17 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 17 years ago

From chromatic@wgz.org

On Monday 30 October 2006 14:20, Ronald J Kimball wrote:

On Mon, Oct 30, 2006 at 11:43:19AM -0800, doug@tierra.net wrote:

######################

    my $filespec = "/usr/src/*";
    for (1..5) {
        my $file = glob $filespec;
        print "$file\n";
        $filespec = "/usr/local/*";
    }

    # expected: first file from /usr/src, then first 4 files from /usr/local
    # got: first 5 files from /usr/src

This is also the expected behavior. It's documented in perlop, in the section on I/O Operators:

  A (file)glob evaluates its (embedded) argument only when it is
  starting a new list. All values must be read before it will start
  over. In list context, this isn't important because you
  automatically get them all anyway. However, in scalar context the
  operator returns the next value each time it's called, or "undef"
  when the list has run out.

It's not clear to me why the second call to glob() when $filespec contains a different value does not start a new list.

-- c

p5pRT commented 17 years ago

@rgs - Status changed from 'open' to 'rejected'

p5pRT commented 17 years ago

From @davidnicol

On 10/30/06, chromatic <chromatic@wgz.org> wrote:

On Monday 30 October 2006 14:20, Ronald J Kimball wrote:

It's not clear to me why the second call to glob() when $filespec contains a different value does not start a new list.

-- c

What apparently happens is that the glob() function sets up an array which it iterates through, behind the scenes, the first time it is called. Which is what it is supposed to do. The tricky part, to me, would be how exactly to explain that succinctly in the documentation. I stared at it for a spell yesterday and gave up. The reference to File::Glob doesn't really help that much either, and I did not find documentation of the fact that the glob() you get by default and the glob() you have after using File::Glob are different functions, as the latter does not do the magic-iteration-in-scalar-context thing.

p5pRT commented 17 years ago

From @davidnicol

What apparently happens is that the glob() function sets up an array which it iterates through, behind the scenes, the first time it is called.

that's not exactly correct.

    # mkdir BLAH; cd BLAH; touch one; touch two; touch three
    # perl -le 'print ~~glob($count++ or "*") while 1' | head
    one
    three
    two

    4

    6

    8

    #

draft document patch attached.

p5pRT commented 17 years ago

From @davidnicol

glob_pod.patch
```diff
--- /usr/pkg/lib/perl5/5.8.0/pod/perlfunc.pod.old 2006-10-31 14:12:13.000000000 -0600
+++ /usr/pkg/lib/perl5/5.8.0/pod/perlfunc.pod 2006-10-31 14:15:44.000000000 -0600
@@ -2129,7 +2129,13 @@
 In list context, returns a (possibly empty) list of filename expansions on
 the value of EXPR such as the standard Unix shell F would do. In
 scalar context, glob iterates through such filename expansions, returning
-undef when the list is exhausted. This is the internal function
+undef when the list is exhausted.
+
+The iterator is loaded only when the list is exhausted, which will
+cause repeated calls to an instance of C<glob("1")> in scalar context
+to function as a kind of flip-flop.
+
+This is the internal function
 implementing the C<< <*.c> >> operator, but you can use it directly. If
 EXPR is omitted, C<$_> is used. The C<< <*.c> >> operator is discussed in
 more detail in L.
```
p5pRT commented 17 years ago

From @davidnicol

another possibility.

p5pRT commented 17 years ago

From @davidnicol

glob_pod_2.patch
```diff
--- /usr/pkg/lib/perl5/5.8.0/pod/perlfunc.pod.old 2006-10-31 14:12:13.000000000 -0600
+++ /usr/pkg/lib/perl5/5.8.0/pod/perlfunc.pod 2006-10-31 14:29:38.000000000 -0600
@@ -2129,7 +2129,14 @@
 In list context, returns a (possibly empty) list of filename expansions on
 the value of EXPR such as the standard Unix shell F would do. In
 scalar context, glob iterates through such filename expansions, returning
-undef when the list is exhausted. This is the internal function
+undef when the list is exhausted.
+
+EXPR is evaluated and the iterator loaded only when the list is exhausted,
+which may cause surprising results when you change EXPR while there
+are results queued from an earlier execution of a particular C
+instance.
+
+This is the internal function
 implementing the C<< <*.c> >> operator, but you can use it directly. If
 EXPR is omitted, C<$_> is used. The C<< <*.c> >> operator is discussed in
 more detail in L.
```
p5pRT commented 17 years ago

From @tamias

On Tue, Oct 31, 2006 at 02:21:45PM -0600, David Nicol wrote:

What apparently happens is that the glob() function sets up an array which it iterates through, behind the scenes, the first time it is called.

that's not exactly correct.

    # mkdir BLAH; cd BLAH; touch one; touch two; touch three
    # perl -le 'print ~~glob($count++ or "*") while 1' | head
    one
    three
    two

    4

    6

    8

    #

draft document patch attached.

    +The iterator is loaded only when the list is exhausted, which will
    +cause repeated calls to an instance of C<glob("1")> in scalar context
    +to function as a kind of flip-flop.

The first time the iterator is loaded, there's no list that has been exhausted. I'm also not sure this example makes the behavior clearer. It assumes that the reader understands what glob("1") does and knows what a flip-flop is.

How does this sound?

  Once the glob() has begun, the pattern argument is ignored until after
  the current list has been exhausted, even if the pattern changes in the
  meantime.

Possibly with an example:

    my $pattern = '*.txt';
    while (my $file = glob($pattern)) {
        print "$file\n";
        $pattern = '*.jpg';
    }

  Even though the pattern changes to '*.jpg' after the first file is
  returned, the glob() will continue returning files ending in .txt.

Ronald

p5pRT commented 17 years ago

From @davidnicol

There's also the possibility of making a very small addition to the entry, just one more dot to connect into the reader's picture, such as merely changing

  In scalar context, glob iterates through
  such filename expansions, returning undef when the list is
  exhausted.

to

  In scalar context, glob iterates through
  such filename expansions, returning undef when the list is
  exhausted, after which EXPR will be evaluated again on
  the next call.

Which concisely implies that EXPR is not evaluated while the iterator is loaded.

That doesn't address the issue of glob instances in code, which is the root of the OP's understandable confusion. Revising glob entirely to use a table of active iterators keyed by the string value of (defined(EXPR)?(EXPR):$_) might make it more intuitive but would be a big change to the current semantics compared with one iterator for each appearance of glob in Perl code.

I have been calling that appearance an "instance" but I'm not sure if that is the correct term...

There's also the issue of the iterative semantics going away after

  use File::Glob ':glob';

which was also unexpected and could be addressed in the docs.

p5pRT commented 17 years ago

From @tamias

On Tue, Oct 31, 2006 at 04:21:51PM -0600, David Nicol wrote:

There's also the possibility of making a very small addition to the entry, just one more dot to connect into the reader's picture, such as merely changing

  In scalar context, glob iterates through
  such filename expansions, returning undef when the list is
  exhausted.

to

  In scalar context, glob iterates through
  such filename expansions, returning undef when the list is
  exhausted, after which EXPR will be evaluated again on
  the next call.

Which concisely implies that EXPR is not evaluated while the iterator is loaded.

I don't think it's right to say that EXPR is not evaluated. As your example showed, EXPR is always evaluated. It's just that glob() will ignore the result.

Ronald
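The distinction between "not evaluated" and "evaluated but ignored" can be checked by counting evaluations. This is a small sketch under the same assumptions as the thread's examples (core glob, scalar context); the `pattern` helper and the scratch file names are hypothetical and exist only to make the evaluations observable.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory with three files (illustrative names).
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(one two three)) {
    open my $fh, '>', "$dir/$name" or die "open: $!";
    close $fh;
}

my $evaluations = 0;

# Hypothetical helper standing in for EXPR: counts each evaluation.
sub pattern {
    $evaluations++;
    return "$dir/*";
}

my @files;
while (defined(my $file = glob(pattern()))) {
    push @files, $file;
}

# Three files come back, but the argument ran four times: once per call,
# including the final call that returned undef.
printf "%d evaluations, %d files\n", $evaluations, scalar @files;
```

The argument expression runs on every call; only its result is discarded while a list is pending.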

p5pRT commented 17 years ago

From @schwern

Ronald J Kimball wrote:

Once the glob() has begun, the pattern argument is ignored until after the current list has been exhausted, even if the pattern changes in the meantime.

I know I'll get a spanking from the backwards compatibility police, but...

Instead of documenting a weird and dangerous behavior, why don't we fix it so it Does The Right Thing? Simply store the file pattern from which the list is generated. If a new pattern is passed in, generate a new list.
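What "store the pattern and regenerate on change" might look like can be sketched in pure Perl. This is not how the core op is implemented and not a proposed patch; `make_pattern_aware_glob` is a hypothetical helper, and it leans on `File::Glob::bsd_glob` to produce the list.

```perl
use strict;
use warnings;
use File::Glob ();
use File::Temp qw(tempdir);

# Hypothetical helper: builds an iterator that remembers which pattern
# produced its pending list and starts over when a different one arrives.
sub make_pattern_aware_glob {
    my ($active, @pending);
    return sub {
        my ($pattern) = @_;
        if (!defined $active || $pattern ne $active) {
            $active  = $pattern;
            @pending = File::Glob::bsd_glob($pattern);
        }
        my $next = shift @pending;
        undef $active unless defined $next;    # exhausted: reload next time
        return $next;
    };
}

# Scratch tree (illustrative names): $dir/src/{a,b} and $dir/local/{a,b}.
my $dir = tempdir(CLEANUP => 1);
for my $sub (qw(src local)) {
    mkdir "$dir/$sub" or die "mkdir: $!";
    for my $name (qw(a b)) {
        open my $fh, '>', "$dir/$sub/$name" or die "open: $!";
        close $fh;
    }
}

my $next_file = make_pattern_aware_glob();
my @got = (
    $next_file->("$dir/src/*"),      # first file under src
    $next_file->("$dir/local/*"),    # pattern changed: new list
    $next_file->("$dir/local/*"),    # same pattern: continues
);
print "$_\n" for @got;
```

One consequence of this design is that a partially consumed list is silently dropped as soon as the pattern changes, which is exactly the change in semantics being debated here.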

The current behavior of glob() seems accidentally inherited from <*.c> where it wasn't possible for the pattern of the op to change. When glob() came into being and variable globs became possible, this behavior was discovered and documented rather than fixed.

FWIW I just got bitten by overzealous glob caching (though of a different type) not half an hour ago after writing this in a subroutine...

    sub has_tests {
        my $has_tests = glob("t/*.t") ? 1 : 0;

        return $has_tests;
    }
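The subroutine above calls glob() in boolean (scalar) context, so its single call site keeps an iterator between calls. Whether that is the exact problem hit here is an assumption, but the effect is easy to show, along with a list-context variant that avoids it. The sketch below takes the directory as a parameter so it can run against a scratch tree; both function names are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory with two test files (names are illustrative).
my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/t" or die "mkdir: $!";
for my $name (qw(a.t b.t)) {
    open my $fh, '>', "$dir/t/$name" or die "open: $!";
    close $fh;
}

# Scalar context: this call site keeps its iterator between calls.
sub has_tests_scalar {
    my ($root) = @_;
    return glob("$root/t/*.t") ? 1 : 0;
}

# List context: the list is built and consumed within one call.
sub has_tests_list {
    my ($root) = @_;
    my @tests = glob("$root/t/*.t");
    return @tests ? 1 : 0;
}

my @scalar_results = map { has_tests_scalar($dir) } 1 .. 3;
my @list_results   = map { has_tests_list($dir) } 1 .. 3;
print "scalar: @scalar_results\n";    # scalar: 1 1 0
print "list:   @list_results\n";      # list:   1 1 1
```

With two matching files, the third scalar-context call reaches the end of the list and reports 0 even though the files are still there; the list-context version gives the same answer every time.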

p5pRT commented 17 years ago

From @davidnicol

On 10/31/06, Ronald J Kimball <rjk-perl-p5p@tamias.net> wrote:

I don't think it's right to say that EXPR is not evaluated. As your example showed, EXPR is always evaluated. It's just that glob() will ignore the result.

Right you are. Maybe I should have stopped after I stared at it for a while and couldn't make headway without increasing its size substantially.

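untested:

    package Improved::glob;
    use strict;
    use warnings;
    use File::Glob ();

    my %cache;    # pending results, keyed by pattern string

    sub glob(_) {
        my ($pattern) = @_;
        # Load a list only when none is pending for this pattern.
        $cache{$pattern} //= [ File::Glob::bsd_glob($pattern) ];
        my $next = shift @{ $cache{$pattern} };
        # On exhaustion return undef once, then reload on the next call.
        delete $cache{$pattern} unless defined $next;
        return $next;
    }

    sub import {
        no strict 'refs';
        *{ caller() . '::glob' } = \&glob;
    }
    1;
    __END__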