Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.98k stars 559 forks source link

File::Glob ':glob' causes infinite loop #11542

Closed p5pRT closed 13 years ago

p5pRT commented 13 years ago

Migrated from rt.perl.org#96116 (status was 'resolved')

Searchable as RT96116$

p5pRT commented 13 years ago

From jpl@research.att.com

Created by jpl@research.att.com

For background\, please visit http​://www.perlmonks.org/?node_id=917494

A user there could not use the standard glob\, because a directory contained blanks. When he inserted

use File​::Glob '​:glob';

his

while (glob "$folder/*.txt")

looped forever. I initially argued this was not a bug\, because bsd_glob only promised to return a list\, and a list in scalar context just chucks all but the last entry\, which is "true" in this case. But as I tried to understand how to make it behave like CORE clob in scalar context\, I looked at ext/File-Glob/t/global.t\, and the tests there look like File​::Glob is expected to behave exactly like core glob. ('​:glob' is not tested there).

At the very least\, the incompatible behavior of '​:glob' in scalar context should be documented. Ideally\, it should be made to behave like core glob. But that means that anyone using '​:glob' doing

$last_entry = glob($pattern);

to get just the last matching entry would start getting the first matching entry. So changing the behavior to be closer to the core *could* break existing code. Perhaps a new tag should be used to solicit core-like behavior\, as another monk suggested. Here's a simple script to reproduce the problem​:

  #!/usr/bin/perl -w

  use strict;   # use File​::Glob '​:glob';

  my $dir = shift || "/etc";   my $max = shift || 1000;   my $entries = 0;   while (glob("$dir/*")) {   last if (++$entries > $max);   }   print($entries\, " items under $dir\n");

Run it\, uncomment the use File​::Glob '​:glob'\, and rerun it (perhaps with a reduced count\, so it doesn't take too long... the loop is not only potentially infinite\, each iteration is sloooow).

Perl Info ``` Flags: category=library severity=medium module=File::Glob Site configuration information for perl 5.10.1: Configured by Debian Project at Tue Apr 26 15:53:22 UTC 2011. Summary of my perl5 (revision 5 version 10 subversion 1) configuration: Platform: osname=linux, osvers=2.6.24-29-server, archname=x86_64-linux-gnu-thread-multi uname='linux crested 2.6.24-29-server #1 smp wed mar 16 19:04:28 utc 2011 x86_64 x86_64 x86_64 gnulinux ' config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.10 -Darchlib=/usr/lib/perl/5.10 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.10.1 -Dsitearch=/usr/local/lib/perl/5.10.1 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -DDEBUGGING=-g -Doptimize=-O2 -Dplibpth=/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu -Duseshrplib -Dlibperl=libperl.so.5.10.1 -Dd_dosuid -des' hint=recommended, useposix=true, d_sigaction=define useithreads=define, usemultiplicity=define useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef use64bitint=define, use64bitall=define, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', optimize='-O2 -g', cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include' ccversion='', gccversion='4.5.2', gccosandvers='' intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16 ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=8, prototype=define Linker and Libraries: ld='cc', ldflags =' -fstack-protector -L/usr/local/lib' libpth=/usr/local/lib /lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu /lib /usr/lib /lib64 /usr/lib64 libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt perllibs=-ldl -lm -lpthread -lc -lcrypt libc=, so=so, useshrplib=true, libperl=libperl.so.5.10.1 gnulibc_version='2.13' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E' cccdlflags='-fPIC', lddlflags='-shared -O2 -g -L/usr/local/lib -fstack-protector' Locally applied patches: DEBPKG:debian/arm_thread_stress_timeout - http://bugs.debian.org/501970 Raise the timeout of ext/threads/shared/t/stress.t to accommodate slower build hosts DEBPKG:debian/cpan_config_path - Set location of CPAN::Config to /etc/perl as /usr may not be writable. DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN. DEBPKG:debian/db_file_ver - http://bugs.debian.org/340047 Remove overly restrictive DB_File version check. DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information. DEBPKG:debian/enc2xs_inc - http://bugs.debian.org/290336 Tweak enc2xs to follow symlinks and ignore missing @INC directories. DEBPKG:debian/errno_ver - http://bugs.debian.org/343351 Remove Errno version check due to upgrade problems with long-running processes. DEBPKG:debian/extutils_hacks - Various debian-specific ExtUtils changes DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets. DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor. DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy. DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable. DEBPKG:debian/m68k_thread_stress - http://bugs.debian.org/495826 Disable some threads tests on m68k for now due to missing TLS. DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian DEBPKG:debian/module_build_man_extensions - http://bugs.debian.org/479460 Adjust Module::Build manual page extensions for the Debian Perl policy DEBPKG:debian/perl_synopsis - http://bugs.debian.org/278323 Rearrange perl.pod DEBPKG:debian/prune_libs - http://bugs.debian.org/128355 Prune the list of libraries wanted to what we actually need. DEBPKG:debian/use_gdbm - Explicitly link against -lgdbm_compat in ODBM_File/NDBM_File. DEBPKG:fixes/assorted_docs - http://bugs.debian.org/443733 [384f06a] Math::BigInt::CalcEmu documentation grammar fix DEBPKG:fixes/net_smtp_docs - http://bugs.debian.org/100195 [rt.cpan.org #36038] Document the Net::SMTP 'Port' option DEBPKG:fixes/processPL - http://bugs.debian.org/357264 [rt.cpan.org #17224] Always use PERLRUNINST when building perl modules. DEBPKG:debian/perlivp - http://bugs.debian.org/510895 Make perlivp skip include directories in /usr/local DEBPKG:fixes/pod2man-index-backslash - http://bugs.debian.org/521256 Escape backslashes in .IX entries DEBPKG:debian/disable-zlib-bundling - Disable zlib bundling in Compress::Raw::Zlib DEBPKG:fixes/kfreebsd_cppsymbols - http://bugs.debian.org/533098 [3b910a0] Add gcc predefined macros to $Config{cppsymbols} on GNU/kFreeBSD. DEBPKG:debian/cpanplus_definstalldirs - http://bugs.debian.org/533707 Configure CPANPLUS to use the site directories by default. DEBPKG:debian/cpanplus_config_path - Save local versions of CPANPLUS::Config::System into /etc/perl. DEBPKG:fixes/kfreebsd-filecopy-pipes - http://bugs.debian.org/537555 [16f708c] Fix File::Copy::copy with pipes on GNU/kFreeBSD DEBPKG:fixes/anon-tmpfile-dir - http://bugs.debian.org/528544 [perl #66452] Honor TMPDIR when open()ing an anonymous temporary file DEBPKG:fixes/abstract-sockets - http://bugs.debian.org/329291 [89904c0] Add support for Abstract namespace sockets. DEBPKG:fixes/hurd_cppsymbols - http://bugs.debian.org/544307 [eeb92b7] Add gcc predefined macros to $Config{cppsymbols} on GNU/Hurd. DEBPKG:fixes/autodie-flock - http://bugs.debian.org/543731 Allow for flock returning EAGAIN instead of EWOULDBLOCK on linux/parisc DEBPKG:fixes/archive-tar-instance-error - http://bugs.debian.org/539355 [rt.cpan.org #48879] Separate Archive::Tar instance error strings from each other DEBPKG:fixes/positive-gpos - http://bugs.debian.org/545234 [perl #69056] [c584a96] Fix \\G crash on first match DEBPKG:debian/devel-ppport-ia64-optim - http://bugs.debian.org/548943 Work around an ICE on ia64 DEBPKG:fixes/trie-logic-match - http://bugs.debian.org/552291 [perl #69973] [0abd0d7] Fix a DoS in Unicode processing [CVE-2009-3626] DEBPKG:fixes/hppa-thread-eagain - http://bugs.debian.org/554218 make the threads-shared test suite more robust, fixing failures on hppa DEBPKG:fixes/crash-on-undefined-destroy - http://bugs.debian.org/564074 [perl #71952] [1f15e67] Fix a NULL pointer dereference when looking for a DESTROY method DEBPKG:fixes/tainted-errno - http://bugs.debian.org/574129 [perl #61976] [be1cf43] fix an errno stringification bug in taint mode DEBPKG:fixes/safe-upgrade - http://bugs.debian.org/582978 Upgrade Safe.pm to 2.25, fixing CVE-2010-1974 DEBPKG:fixes/tell-crash - http://bugs.debian.org/578577 [f4817f3] Fix a tell() crash on bad arguments. DEBPKG:fixes/format-write-crash - http://bugs.debian.org/579537 [perl #22977] [421f30e] Fix a crash in format/write DEBPKG:fixes/arm-alignment - http://bugs.debian.org/289884 [f1c7503] Prevent gcc from optimizing the alignment test away on armel DEBPKG:fixes/fcgi-test - Fix a failure in CGI/t/fast.t when FCGI is installed DEBPKG:fixes/hurd-ccflags - http://bugs.debian.org/587901 Make hints/gnu.sh append to $ccflags rather than overriding them DEBPKG:debian/squelch-locale-warnings - http://bugs.debian.org/508764 Squelch locale warnings in Debian package maintainer scripts DEBPKG:fixes/lc-numeric-docs - http://bugs.debian.org/379329 [perl #78452] [903eb63] LC_NUMERIC documentation fixes DEBPKG:fixes/lc-numeric-sprintf - http://bugs.debian.org/601549 [perl #78632] [b3fd614] Fix sprintf not to ignore LC_NUMERIC with constants DEBPKG:fixes/concat-stack-corruption - http://bugs.debian.org/596105 [perl #78674] [e3393f5] Fix stack pointer corruption in pp_concat() with 'use encoding' DEBPKG:fixes/cgi-multiline-header - http://bugs.debian.org/606995 [CVE-2010-2761 CVE-2010-4410 CVE-2010-4411] CGI.pm MIME boundary and multiline header vulnerabilities DEBPKG:fixes/h2ph-gcc-4.5 - http://bugs.debian.org/599933 [8d66b3f] Fix h2ph and test DEBPKG:fixes/threads-tmps-crash - [perl #70411] [24855df] Conditionally compile tmps stack cleanup code DEBPKG:patchlevel - http://bugs.debian.org/567489 List packaged patches for 5.10.1-17ubuntu1 in patchlevel.h @INC for perl 5.10.1: /etc/perl /usr/local/lib/perl/5.10.1 /usr/local/share/perl/5.10.1 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.10 /usr/share/perl/5.10 /usr/local/lib/site_perl . Environment for perl 5.10.1: HOME=/home/jpl LANG=C LANGUAGE=en_US.iso88591:en LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=/home/jpl/x86_64bin:/home/jpl/bin:/bin:/usr/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/games:.:/home/jpl/wordnet/bin:/home/jpl/U/Production/bin PERL_BADLANG (unset) SHELL=/bin/ksh ```
p5pRT commented 13 years ago

From @epa

Or instead might it make sense to fix the core glob() so that it doesn't break with filenames that contain spaces?

p5pRT commented 13 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 13 years ago

From jpl@research.att.com

Unfortunately\, that's a "feature"\, not a "bug". Spaces delimit separate patterns\, and there must be a ton of scripts that rely on that behavior ("*.c *.h") -- jpl

On 08/03/11 10​:09\, Ed Avis via RT wrote​:

Or instead might it make sense to fix the core glob() so that it doesn't break with filenames that contain spaces?

p5pRT commented 13 years ago

From @davidnicol

perhaps null-delim option could be introduced\, in keeping with the

  find ... -print0 | xargs --null ...

style of dealing with spaces?

On Wed\, Aug 3\, 2011 at 9​:21 AM\, John P. Linderman (jpl) \jpl@​research\.att\.com wrote​:

Unfortunately\, that's a "feature"\, not a "bug".  Spaces delimit separate patterns\, and there must be a ton of scripts that rely on that behavior ("*.c *.h") -- jpl

p5pRT commented 13 years ago

From @epa

John P. Linderman (jpl \<jpl \ research.att.com> writes​:

Unfortunately\, that's a "feature"\, not a "bug". Spaces delimit separate patterns\, and there must be a ton of scripts that rely on that behavior ("*.c *.h")

There must be lots of programs that rely on that. But are they not outnumbered by the programs which rely on glob working for general filenames\, e.g.

  $wildcard = shift @​ARGV;   @​files = glob($wildcard);

OK\, so that code is broken and has always been broken. But it's not uncommon\, as a quick search turns up​:

  \<http​://www.google.com/codesearch#search/&q=glob\%28\$%20lang​:^perl$&type=cs>

It's all rather unfortunate\, since the '*.c *.h' behaviour can easily be had by doing (glob('*.c')\, glob('*.h')) instead\, and was always more of an accidental side-effect of calling out to tcsh for the globbing.

-- Ed Avis \eda@&#8203;waniasset\.com

p5pRT commented 13 years ago

From jpl@research.att.com

Looks like a reasonable possibility.

I don't know if the anonymous monk in

  http​://www.perlmonks.org/?node_id=917494

was a jerk or a genius when he said

/making bsd_glob DWIM in scalar context is trivial (just copy/paste from csh_glob)/

I poked around in op.c (to no avail). But someone who knew what they were doing (that would *not* be me) might be able to make some headway with ext/File-Glob/Glob.pm. Perhaps what is being done there for csh_glob could be duplicated for a bsd_glob equivalent. I think I *might* be able to figure something out\, but someone familiar with Glob.pm could probably do it (or rule it out) in a heartbeat or two. -- jpl

On 08/03/11 11​:35\, David Nicol wrote​:

perhaps null-delim option could be introduced\, in keeping with the

find \.\.\. \-print0 | xargs \-\-null \.\.\.

style of dealing with spaces?

On Wed\, Aug 3\, 2011 at 9​:21 AM\, John P. Linderman (jpl) \jpl@&#8203;research\.att\.com wrote​:

Unfortunately\, that's a "feature"\, not a "bug". Spaces delimit separate patterns\, and there must be a ton of scripts that rely on that behavior ("*.c *.h") -- jp

p5pRT commented 13 years ago

From @tux

On Wed\, 3 Aug 2011 15​:39​:57 +0000 (UTC)\, Ed Avis \eda@&#8203;waniasset\.com wrote​:

John P. Linderman (jpl \<jpl \ research.att.com> writes​:

Unfortunately\, that's a "feature"\, not a "bug". Spaces delimit separate patterns\, and there must be a ton of scripts that rely on that behavior ("*.c *.h")

There must be lots of programs that rely on that. But are they not outnumbered by the programs which rely on glob working for general filenames\, e.g.

$wildcard = shift @&#8203;ARGV;
@&#8203;files = glob\($wildcard\);

Which is documented not to be reliable. I'm 100% with John here. The only thing I changed over the past few years is

  \<*.h *.c >

=>

  glob ("*.h *.c");

OK\, so that code is broken and has always been broken. But it's not uncommon\, as a quick search turns up​:

\<http​://www.google.com/codesearch#search/&q=glob\%28\$%20lang​:^perl$&type=cs>

It's all rather unfortunate\, since the '*.c *.h' behaviour can easily be had by doing (glob('*.c')\, glob('*.h')) instead\, and was always more of an accidental

Which is even safer\, but also requires more typing and causes noise

side-effect of calling out to tcsh for the globbing.

-- H.Merijn Brand http​://tux.nl Perl Monger http​://amsterdam.pm.org/ using 5.00307 through 5.14 and porting perl5.15.x on HP-UX 10.20\, 11.00\, 11.11\, 11.23 and 11.31\, OpenSuSE 10.1\, 11.0 .. 11.4 and AIX 5.2 and 5.3. http​://mirrors.develooper.com/hpux/ http​://www.test-smoke.org/ http​://qa.perl.org http​://www.goldmark.org/jeff/stupid-disclaimers/

p5pRT commented 13 years ago

From jpl@research.att.com

On 08/03/11 11​:35\, David Nicol wrote​:

perhaps null-delim option could be introduced\, in keeping with the

find \.\.\. \-print0 | xargs \-\-null \.\.\.

style of dealing with spaces?

On Wed\, Aug 3\, 2011 at 9​:21 AM\, John P. Linderman (jpl) \jpl@&#8203;research\.att\.com wrote​:

Unfortunately\, that's a "feature"\, not a "bug". Spaces delimit separate patterns\, and there must be a ton of scripts that rely on that behavior ("*.c *.h") -- jpl

Well. Here's a proof of concept run with a modified 5.14.1​:

  PERL5LIB=/home/jpl/src/perl-5.14.1/lib ./perl -de 0

  Loading DB routines from perl5db.pl version 1.33   Editor support available.

  Enter h or `h h' for help\, or `man perldebug' for more help.

  main​::(-e​:1)​: 0   DB\<1> use File​::Glob '​:obglay'

  DB\<2> print(scalar(glob('/tmp/dir with spaces/*'))\, "\n") for (1   .. 3)   /tmp/dir with spaces/a   /tmp/dir with spaces/b   /tmp/dir with spaces/c

  DB\<3> print($_\, "\n") while (glob('/tmp/dir with spaces/*'))   /tmp/dir with spaces/a   /tmp/dir with spaces/b   /tmp/dir with spaces/c

Obviously\, "obglay" (pig latin for glob\, and a near anagram of "globally") isn't a suitable tag. Something like '​:space_ok' would be better.) Same for the flag names I'm attaching. Maybe SPACE_OK rather than NOBLANKS. And I hate to use the top bit for NOBLANKS/SPACE_OK\, but that was the only bit left. After I made the changes\, a full

make test

ran successfully. But please have somebody knowledgable gaze at the changes (under ext/File-Glob)! I'm not really comfortable hacking the core. -- jpl

p5pRT commented 13 years ago

From jpl@research.att.com

*** Makefile.PL.orig Wed Aug 3 13​:15​:58 2011 --- Makefile.PL Wed Aug 3 13​:25​:46 2011 *************** WriteConstants( *** 26\,32 ****   NAME => 'File​::Glob'\,   NAMES => [qw(GLOB_ABEND GLOB_ALPHASORT GLOB_ALTDIRFUNC GLOB_BRACE GLOB_ERR   GLOB_LIMIT GLOB_MARK GLOB_NOCASE GLOB_NOCHECK GLOB_NOMAGIC ! GLOB_NOSORT GLOB_NOSPACE GLOB_QUOTE GLOB_TILDE)\,   {name => 'GLOB_CSH'\,   value => 'GLOB_BRACE|GLOB_NOMAGIC|GLOB_QUOTE|GLOB_TILDE|GLOB_ALPHASORT'\,   macro => 1}\, --- 26\,32 ----   NAME => 'File​::Glob'\,   NAMES => [qw(GLOB_ABEND GLOB_ALPHASORT GLOB_ALTDIRFUNC GLOB_BRACE GLOB_ERR   GLOB_LIMIT GLOB_MARK GLOB_NOCASE GLOB_NOCHECK GLOB_NOMAGIC ! GLOB_NOSORT GLOB_NOSPACE GLOB_QUOTE GLOB_TILDE GLOB_NOBLANKS)\,   {name => 'GLOB_CSH'\,   value => 'GLOB_BRACE|GLOB_NOMAGIC|GLOB_QUOTE|GLOB_TILDE|GLOB_ALPHASORT'\,   macro => 1}\, *** bsd_glob.h.orig Wed Aug 3 13​:01​:25 2011 --- bsd_glob.h Wed Aug 3 13​:03​:07 2011 *************** typedef struct { *** 76\,81 **** --- 76\,82 ----   #define GLOB_ALPHASORT 0x2000 /* Alphabetic\, not ASCII sort\, like csh. */   #define GLOB_LIMIT 0x4000 /* Limit pattern match output to ARG_MAX   (usually from limits.h). */ + #define GLOB_NOBLANKS 0x8000 /* space is NOT a pattern delimiter */  
  #define GLOB_NOSPACE (-1) /* Malloc call failed. */   #define GLOB_ABEND (-2) /* Unignored error. */ *** Glob.pm.orig Wed Aug 3 12​:23​:49 2011 --- Glob.pm Wed Aug 3 13​:09​:31 2011 *************** sub import { *** 46\,51 **** --- 46\,52 ----   given ($_) {   $DEFAULT_FLAGS &= ~GLOB_NOCASE() when '​:case';   $DEFAULT_FLAGS |= GLOB_NOCASE() when '​:nocase'; + $DEFAULT_FLAGS |= GLOB_NOBLANKS() when '​:obglay';   when ('​:globally') {   no warnings 'redefine';   *CORE​::GLOBAL​::glob = \&File​::Glob​::csh_glob; *************** sub csh_glob { *** 86\,92 ****   $pat =~ s/^\s+//; # Protect against empty elements in   $pat =~ s/\s+$//; # things like \< *.c> and \<*.c >.   # These alone shouldn't trigger ParseWords. ! if ($pat =~ /\s/) {   # XXX this is needed for compatibility with the csh   # implementation in Perl. Need to support a flag   # to disable this behavior. --- 87\,93 ----   $pat =~ s/^\s+//; # Protect against empty elements in   $pat =~ s/\s+$//; # things like \< *.c> and \<*.c >.   # These alone shouldn't trigger ParseWords. ! if (($pat =~ /\s/) && !($DEFAULT_FLAGS & GLOB_NOBLANKS())) {   # XXX this is needed for compatibility with the csh   # implementation in Perl. Need to support a flag   # to disable this behavior.

p5pRT commented 13 years ago

From sidhekin@allverden.no

On Wed\, Aug 3\, 2011 at 7​:47 PM\, John P. Linderman (jpl) \< jpl@​research.att.com> wrote​:

Obviously\, "obglay" (pig latin for glob\, and a near anagram of "globally") isn't a suitable tag. Something like '​:space_ok' would be better.

  By which you mean "space is not a delimiter". How about telling what the delimiter *is* ("​:null_delim")\, if *any* ("​:no_delim")? ... it's the latter you've implemented\, right?

Eirik

p5pRT commented 13 years ago

From jpl@research.att.com

On 08/03/11 13​:59\, The Sidhekin wrote​:

On Wed\, Aug 3\, 2011 at 7​:47 PM\, John P. Linderman (jpl) \<jpl@​research.att.com \mailto&#8203;:jpl@&#8203;research\.att\.com> wrote​:

Obviously\, "obglay" \(pig latin for glob\, and a near anagram of
"globally"\) isn't a suitable tag\.  Something like '&#8203;:space\_ok'
would be better\.

By which you mean "space is not a delimiter". How about telling what the delimiter /is/ ("​:null_delim")\, if /any/ ("​:no_delim")? ... it's the latter you've implemented\, right?

Eirik Or "​:single_pattern". We are now down to the color of the bike shed\, as far as I am concerned​:-)

  http​://en.wikipedia.org/wiki/Parkinson%27s_Law_of_Triviality

I'm quite satisfied to have come up with what appears to be a workable fix. I'd prefer to leave the details to someone better qualified\, but I'll cook up a complete package of changes (including tests and documentation changes) if you like. I leave on a week's vacation Friday\, so I may not have the time to get it done until mid month.

p5pRT commented 13 years ago

From jpl@research.att.com

On 08/03/11 14​:07\, John P. Linderman (jpl) wrote​:

On 08/03/11 13​:59\, The Sidhekin wrote​:

On Wed\, Aug 3\, 2011 at 7​:47 PM\, John P. Linderman (jpl) \<jpl@​research.att.com \mailto&#8203;:jpl@&#8203;research\.att\.com> wrote​:

Obviously\, "obglay" \(pig latin for glob\, and a near anagram of
"globally"\) isn't a suitable tag\.  Something like '&#8203;:space\_ok'
would be better\.

By which you mean "space is not a delimiter". How about telling what the delimiter /is/ ("​:null_delim")\, if /any/ ("​:no_delim")? ... it's the latter you've implemented\, right?

Eirik Or "​:single_pattern". We are now down to the color of the bike shed\, as far as I am concerned​:-)

http​://en.wikipedia.org/wiki/Parkinson%27s_Law_of_Triviality

I'm quite satisfied to have come up with what appears to be a workable fix. I'd prefer to leave the details to someone better qualified\, but I'll cook up a complete package of changes (including tests and documentation changes) if you like. I leave on a week's vacation Friday\, so I may not have the time to get it done until mid month. I guess one serious consideration about the choice of names is that it be "reversible" like '​:case' and '​:nocase' (which the diffs I sent don't support). So something where a leading "no" also makes sense would be desirable. -- jpl

p5pRT commented 13 years ago

From jpl@research.att.com

Ed Avis \eda@&#8203;waniasset\.com noted​:

Unfortunately\, that's a "feature"\, not a "bug"\.  Spaces delimit separate
>patterns\, and there must be a ton of scripts that rely on that behavior
>\("\*\.c \*\.h"\)

  There must be lots of programs that rely on that. But are they not outnumbered   by the programs which rely on glob working for general filenames\, e.g.

  $wildcard = shift @​ARGV;   @​files = glob($wildcard);

  OK\, so that code is broken and has always been broken. But it's not uncommon\,   as a quick search turns up​:

  \<http​://www.google.com/codesearch#search/&q=glob\%28\$%20lang​:^perl$&type=cs>

  It's all rather unfortunate\, since the '*.c *.h' behaviour can easily be had by   doing (glob('*.c')\, glob('*.h')) instead\, and was always more of an accidental   side-effect of calling out to tcsh for the globbing.

One reason this bug has gone (relatively) unnoticed for so long is that it doesn't happen in list context. So your example and many of the examples you cite will work just fine. It's scalar context that triggers the problem. There's some magic that I don't yet fully understand (and won't until I get back from vacation) that hands glob a second argument that can be used to identify which invocation is in play\, so state can be maintained. It's there (using -MO=Deparse) when '​:glob' is used

  use File​::Glob ('​:glob');   while (defined($_ = glob("$dir/*"\, 0))) {   last if ++$entries > $max;   }   while (defined($_ = glob("$dir/*"\, 1))) {   last if ++$entries > $max;   }

but not when it is missing

  use File​::Glob ('​:globally');   while (defined($_ = glob("$dir/*"))) {   last if ++$entries > $max;   }   while (defined($_ = glob("$dir/*"))) {   last if ++$entries > $max;   }

That seems totally backwards; it seems like the extra argument is essential for maintaining state to make scalar context act like an iterator\, but it's '​:glob' that is broken and everything else works. My diffs point how to make things work with embedded space\, but I'm not satisfied\, because I don't really understand what's happening. Anyone should feel free to fix things while I'm vacationing​:-) . -- jpl

p5pRT commented 13 years ago

From @epa

I guess that although glob("$dirname/*") breaks for directory names containing whitespace\, it also breaks for those that contain the * or { or } characters\, which are equally possible at least on Unix.

Perhaps there is a case for making glob taint-check its argument under -T? After all\, the only truly safe way to use it is something like

  my $dirname = shift @​ARGV;   die "directory name '$dirname' contains bad chars\, cannot glob"   if $dirname =~ tr/*{} //;   my @​g = glob "$dirname/*";

But who does that? What tutorial would teach it as a safe habit? Or should it be

  my $dirname = shift @​ARGV;   my $quoted = quotemeta $dirname;   my @​g = glob "$quoted/*";

That seems dirty\, and won't work under Windows\, where glob() takes backslash as a directory separator.

So we have to conclude that glob() for a user-supplied string is never going to work reliably for all possible inputs. It's fine for configuration files and the like\, where you can insist that non-word characters aren't allowed in filenames\, but it's never going to work for arbitrary filenames in all their messiness. Given that\, it doesn't seem such a big deal that it does something funny with the space character. Programs that have that bug would still break when given filenames containing *{}\, even if the behaviour with spaces were fixed.

It's analogous to the situation with two-argument open. The only clean and safe way would be to avoid string parsing and use multiple arguments instead​:

  @​g = safe_glob literal($dirname)\, dir_sep()\, wildcard('*');

That would be unlikely to catch on because "$dirname/*" is so much more convenient to write.

-- Ed Avis \eda@&#8203;waniasset\.com

p5pRT commented 13 years ago

From @davidnicol

On Wed\, Aug 3\, 2011 at 1​:07 PM\, John P. Linderman (jpl) \jpl@&#8203;research\.att\.com wrote​:

On 08/03/11 13​:59\, The Sidhekin wrote​:

On Wed\, Aug 3\, 2011 at 7​:47 PM\, John P. Linderman (jpl) \jpl@&#8203;research\.att\.com wrote​:

  By which you mean "space is not a delimiter".  How about telling what the delimiter is ("​:null_delim")\, if any ("​:no_delim")?  ... it's the latter you've implemented\, right? Or "​:single_pattern".  We are now down to the color of the bike shed\, as far as I am concerned :-)

Nah\, still engineering the roof and walls.

How about a new LNV (Line Noise Variable) (with a slot in the hints table saying what it defaults to) identifying what the delimiter is. A LNV is not needed; something more verbose...

  my @​SourceFilesWithSpaceAtNameEndList = do { local $Glob​::delim = "#'; glob("* .h#* .c")};

perldoc File​::Glob says

  Since v5.6.0\, Perl's CORE​::glob() is implemented in terms of   bsd_glob(). Note that they don't share the same   prototype--CORE​::glob() only accepts a single argument. Due to   historical reasons\, CORE​::glob() will also split its argument on   whitespace\, treating it as multiple patterns\, whereas bsd_glob()   considers them as one pattern.

How about\, when CORE​::glob gets more than one argument\, it doesn't split them?

  my @​SourceFilesWithSpaceAtNameEndList = glob("* .h"\,"* .c");

Is that a self-shedding bicycle or what?

p5pRT commented 13 years ago

From @leonerd

On Thu\, Aug 04\, 2011 at 02​:08​:50PM +0000\, Ed Avis wrote​:

It's analogous to the situation with two-argument open. The only clean and safe way would be to avoid string parsing and use multiple arguments instead​:

@​g = safe_glob literal($dirname)\, dir_sep()\, wildcard('*');

I am in mind of DBI's placeholders. We can't use ? obviously\, so arbitrarly\, %p. as "path"

  @​g = safe_glob "%p/*"\, $dirname;

Where %p gets expanded sprintf-style\, by quoting any metacharacters in its argument.

-- Paul "LeoNerd" Evans

leonerd@​leonerd.org.uk ICQ# 4135350 | Registered Linux# 179460 http​://www.leonerd.org.uk/

p5pRT commented 13 years ago

From sidhekin@allverden.no

On Thu\, Aug 4\, 2011 at 5​:13 PM\, Paul LeoNerd Evans \leonerd@&#8203;leonerd\.org\.ukwrote​:

On Thu\, Aug 04\, 2011 at 02​:08​:50PM +0000\, Ed Avis wrote​:

It's analogous to the situation with two-argument open. The only clean and safe way would be to avoid string parsing and use multiple arguments instead​:

@​g = safe_glob literal($dirname)\, dir_sep()\, wildcard('*');

I am in mind of DBI's placeholders. We can't use ? obviously\, so arbitrarly\, %p. as "path"

@​g = safe_glob "%p/*"\, $dirname;

Where %p gets expanded sprintf-style\, by quoting any metacharacters in its argument.

  If you want a literal path and matching on the basename\, why glob?

  To me\, readdir and good old-fashioned pattern matching seem an obvious WTDI​:

  @​g = do { opendir my $h\, $dirname; map{ /^\./ ? () : "$dirname/$_" } readdir $h };

Eirik

p5pRT commented 13 years ago

From @epa

Paul LeoNerd Evans \<leonerd \ leonerd.org.uk> writes​:

I am in mind of DBI's placeholders. We can't use ? obviously\, so arbitrarly\, %p. as "path"

@​g = safe_glob "%p/*"\, $dirname;

I was about to suggest the same thing. There could be three placeholders​:

  %e a single path element\, that is\, cannot contain '/'   %r a path relative to the current directory and without '..'   %p an arbitrary path\, including absolute paths or ..

The same scheme could be used in a File​::Spec type interface for constructing paths.

  my $directory = shift @​ARGV;   my $file = safe_path 'templates/%r/info.txt'\, $directory;

That would be a safer alternative to string interpolation\, which can open up directory traversal vulnerabilities. (In an unprivileged environment\, handling absolute paths or paths with .. in them is a feature not a bug\, whereas giving the wrong result for filenames containing space is always a bug. So interpolating strings to make filenames is not as unsafe as interpolating them to make glob patterns. But it would still be good to have something guaranteed safe for CGI scripts and the like.)

I think that two-arg open() and glob() are the only two places in perl where a string argument contains a mixture of filename and magic metacharacters. Are there others?

-- Ed Avis \eda@&#8203;waniasset\.com

p5pRT commented 13 years ago

From jpl@research.att.com

On 08/04/11 11​:01\, David Nicol wrote​:

On Wed\, Aug 3\, 2011 at 1​:07 PM\, John P. Linderman (jpl) \jpl@&#8203;research\.att\.com wrote​:

On 08/03/11 13​:59\, The Sidhekin wrote​:

On Wed\, Aug 3\, 2011 at 7​:47 PM\, John P. Linderman (jpl) \jpl@&#8203;research\.att\.com wrote​: By which you mean "space is not a delimiter". How about telling what the delimiter is ("​:null_delim")\, if any ("​:no_delim")? ... it's the latter you've implemented\, right? Or "​:single_pattern". We are now down to the color of the bike shed\, as far as I am concerned :-) Nah\, still engineering the roof and walls.

How about a new LNV (Line Noise Variable) (with a slot in the hints table saying what it defaults to) identifying what the delimiter is. A LNV is not needed; something more verbose...

my @&#8203;SourceFilesWithSpaceAtNameEndList =  do \{ local $Glob&#8203;::delim =

"#'; glob("* .h#* .c")};

perldoc File​::Glob says

          Since v5\.6\.0\, Perl's CORE&#8203;::glob\(\) is implemented in terms of
    bsd\_glob\(\)\.  Note that they don't share the same
    prototype\-\-CORE&#8203;::glob\(\) only accepts a single argument\.  Due to
    historical reasons\, CORE&#8203;::glob\(\) will also split its argument on
    whitespace\, treating it as multiple patterns\, whereas bsd\_glob\(\)
    considers them as one pattern\.

How about\, when CORE​::glob gets more than one argument\, it doesn't split them?

my @&#8203;SourceFilesWithSpaceAtNameEndList =  glob\("\* \.h"\,"\* \.c"\);

Is that a self-shedding bicycle or what? Rather than providing people with new ways to pack and unpack multiple patterns in a single string (which was at the root of the problem of special-casing spaces)\, it would be easier to allow the first argument to be an array ref\, in which case it would be expanded into a bunch of individual patterns\, much as is currently done in Glob.pm when splitting on spaces. This would be a minor change to Glob.pm\, and would provide yet another way to "escape blanks in a pattern" (just put it in an array and pass a reference).

It's becoming clear to me that the Glob.pm documentation needs attention. There ought to be a major warning about the side-effects of '​:glob' rather than making it a poster-boy in the synopsis. If you dig around\, you find that

  /# NOTE​: The glob() export is only here for compatibility with 5.6.0.   # csh_glob() should not be used directly\, unless you know what   you're doing.   /

I still haven't figured out what it causes\, but if I step through a call under its sway\, I see the second argument to glob (it already takes an optional second argument) discarded\, and bsd_glob invoked. '​:glob' needs to disappear from the synopsis.

See you in a week and a half -- jpl

p5pRT commented 13 years ago

From @davidnicol

On Thu\, Aug 4\, 2011 at 1​:28 PM\, John P. Linderman (jpl) \jpl@&#8203;research\.att\.com wrote​:

On 08/04/11 11​:01\, David Nicol wrote​:

How about\, when CORE​::glob gets more than one argument\, it doesn't split them?

my @​SourceFilesWithSpaceAtNameEndList = glob("* .h"\,"* .c");

Rather than providing people with new ways to pack and unpack multiple patterns in a single string (which was at the root of the problem of special-casing spaces)\, it would be easier to allow the first argument to be an array ref\, in which case it would be expanded into a bunch of individual patterns\, much as is currently done in Glob.pm when splitting on spaces. This would be a minor change to Glob.pm\, and would provide yet another way to "escape blanks in a pattern" (just put it in an array and pass a reference).

I retract my earlier proposal in favor of this one. The 11​:01 proposal is flawed\, in that it would not be able to handle a single term with spaces in it (without introduction of an impossible dummy filename.) The arrayrefs-allowed proposal would allow a single term with spaces to be given to glob like so​:

  my @​EvenMore = glob(['even more.*']);

-- "The tools expect that they have full\, unlimited control of the hardware."  -- Intel Corporation

p5pRT commented 13 years ago

From jpl@research.att.com

On 08/04/11 19​:12\, David Nicol wrote​:

On Thu\, Aug 4\, 2011 at 1​:28 PM\, John P. Linderman (jpl) \jpl@&#8203;research\.att\.com wrote​:

On 08/04/11 11​:01\, David Nicol wrote​:

How about\, when CORE​::glob gets more than one argument\, it doesn't split them? my @​SourceFilesWithSpaceAtNameEndList = glob("* .h"\,"* .c"); Rather than providing people with new ways to pack and unpack multiple patterns in a single string (which was at the root of the problem of special-casing spaces)\, it would be easier to allow the first argument to be an array ref\, in which case it would be expanded into a bunch of individual patterns\, much as is currently done in Glob.pm when splitting on spaces. This would be a minor change to Glob.pm\, and would provide yet another way to "escape blanks in a pattern" (just put it in an array and pass a reference). I retract my earlier proposal in favor of this one. The 11​:01 proposal is flawed\, in that it would not be able to handle a single term with spaces in it (without introduction of an impossible dummy filename.) The arrayrefs-allowed proposal would allow a single term with spaces to be given to glob like so​:

my @​EvenMore = glob(['even more.*']);

One of the things I really enjoy about the porters is that they are more dedicated to improving Perl than they are to pushing an agenda. My earlier proposal was clearly flawed\, and David's proposal prompted me to think about about a way to make it better\, and I hope there will be further suggestions that make improvements over both. We sometimes adopt a suggestion because it is a clear improvement over what came before\, without wondering if we could do even better. But then we may be handcuffed about breaking existing code. There's no rush here\, it's been around for many years. Let's kick it around a while before we decide it's as good as we can make it. Hasta la vista. -- jpl

p5pRT commented 13 years ago

From @cpansprout

On Thu Aug 04 03​:16​:05 2011\, jpl@​research.att.com wrote​:

Ed Avis \eda@&#8203;waniasset\.com noted​:

Unfortunately\, that's a "feature"\, not a "bug"\.  Spaces delimit

separate

patterns\, and there must be a ton of scripts that rely on that behavior ("*.c *.h") There must be lots of programs that rely on that. But are they not outnumbered by the programs which rely on glob working for general filenames\, e.g.

     $wildcard = shift @&#8203;ARGV;
     @&#8203;files = glob\($wildcard\);

OK\, so that code is broken and has always been broken\.  But it's

not uncommon\, as a quick search turns up​:

\<http​://www.google.com/codesearch#search/&q=glob\%28\$%20lang​:^perl$&type=cs>

It's all rather unfortunate\, since the '\*\.c \*\.h' behaviour can

easily be had by doing (glob('*.c')\, glob('*.h')) instead\, and was always more of an accidental side-effect of calling out to tcsh for the globbing.

One reason this bug has gone (relatively) unnoticed for so long is that it doesn't happen in list context. So your example and many of the examples you cite will work just fine. It's scalar context that triggers the problem. There's some magic that I don't yet fully understand (and won't until I get back from vacation) that hands glob a second argument that can be used to identify which invocation is in play\, so state can be maintained. It's there (using -MO=Deparse) when '​:glob' is used

use File&#8203;::Glob \('&#8203;:glob'\);
while \(defined\($\_ = glob\("$dir/\*"\, 0\)\)\) \{
     last if \+\+$entries > $max;
\}
while \(defined\($\_ = glob\("$dir/\*"\, 1\)\)\) \{
     last if \+\+$entries > $max;
\}

but not when it is missing

use File&#8203;::Glob \('&#8203;:globally'\);
while \(defined\($\_ = glob\("$dir/\*"\)\)\) \{
     last if \+\+$entries > $max;
\}
while \(defined\($\_ = glob\("$dir/\*"\)\)\) \{
     last if \+\+$entries > $max;
\}

That seems totally backwards; it seems like the extra argument is essential for maintaining state to make scalar context act like an iterator\, but it's '​:glob' that is broken and everything else works.

File​::Glob​::glob explicitly ignores its second argument​:

sub glob {   splice @​_\, 1; # don't pass PL_glob_index as flags!   goto &bsd_glob; }

What you are seeing above is a B​::Deparse bug\, fixed in 5.14.0.

My diffs point how to make things work with embedded space\, but I'm not satisfied\, because I don't really understand what's happening. Anyone should feel free to fix things while I'm vacationing​:-) . -- jpl

p5pRT commented 13 years ago

From @cpansprout

On Thu Aug 04 07​:09​:35 2011\, eda@​waniasset.com wrote​:

I guess that although glob("$dirname/*") breaks for directory names containing whitespace\, it also breaks for those that contain the * or { or } characters\, which are equally possible at least on Unix.

The worst breakage is that a glob pattern with spaces gets backslash escapes processed three times on Unix\, so use this if you want to find all files beginning with \ or . :

@​files = \<\\\\\\\\* .*>;

Fortunately\, I’ve fixed that\, so you only need four backslashes now.

How I wish \<> had been implemented originally like a regexp literal!

Perhaps there is a case for making glob taint-check its argument under -T? After all\, the only truly safe way to use it is something like

my $dirname = shift @&#8203;ARGV;
die "directory name '$dirname' contains bad chars\, cannot glob"
   if $dirname =~ tr/\*\{\} //;
my @&#8203;g = glob "$dirname/\*";

But who does that? What tutorial would teach it as a safe habit? Or should it be

my $dirname = shift @&#8203;ARGV;
my $quoted = quotemeta $dirname;
my @&#8203;g = glob "$quoted/\*";

That seems dirty\, and won't work under Windows\, where glob() takes backslash as a directory separator.

Slash also works. From ext/File-Glob/bsd_glob.c​:

#define BG_SEP '/' #ifdef DOSISH #define BG_SEP2 '\\' #endif

p5pRT commented 13 years ago

From @cpansprout

On Thu Aug 04 08​:02​:11 2011\, davidnicol@​gmail.com wrote​:

On Wed\, Aug 3\, 2011 at 1​:07 PM\, John P. Linderman (jpl) \jpl@&#8203;research\.att\.com wrote​:

On 08/03/11 13​:59\, The Sidhekin wrote​:

On Wed\, Aug 3\, 2011 at 7​:47 PM\, John P. Linderman (jpl) \jpl@&#8203;research\.att\.com wrote​:

� By which you mean "space is not a delimiter".� How about telling what the delimiter is ("​:null_delim")\, if any ("​:no_delim")?� ... it's the latter you've implemented\, right? Or "​:single_pattern".� We are now down to the color of the bike shed\, as far as I am concerned :-)

Nah\, still engineering the roof and walls.

How about a new LNV (Line Noise Variable) (with a slot in the hints table saying what it defaults to) identifying what the delimiter is. A LNV is not needed; something more verbose...

my @​SourceFilesWithSpaceAtNameEndList = do { local $Glob​::delim = "#'; glob("* .h#* .c")};

perldoc File​::Glob says

         Since v5\.6\.0\, Perl's CORE&#8203;::glob\(\) is implemented in terms

of bsd_glob(). Note that they don't share the same prototype--CORE​::glob() only accepts a single argument. Due to historical reasons\, CORE​::glob() will also split its argument on whitespace\, treating it as multiple patterns\, whereas bsd_glob() considers them as one pattern.

How about\, when CORE​::glob gets more than one argument\, it doesn't split them?

my @​SourceFilesWithSpaceAtNameEndList = glob("* .h"\,"* .c");

Is that a self-shedding bicycle or what?

Then glob(@​list_of_patterns) does the unexpected when there is only one.

p5pRT commented 13 years ago

From @epa

Father Chrysostomos via RT \<perlbug-followup \ perl.org> writes​:

The worst breakage is that a glob pattern with spaces gets backslash escapes processed three times on Unix\, so use this if you want to find all files beginning with \ or . :

@​files = \<\\\\\\\\* .*>;

Fortunately\, I’ve fixed that\, so you only need four backslashes now.

That's great news! \

I do wonder why this particular bug is considered a bug to be fixed\, while the breakage with space characters is considered a feature to be preserved. I know there is some existing code with \<*.c *.h> but that is not difficult to fix - so under 'use 5.16' glob could have sensible semantics.

my $dirname = shift @​ARGV; my $quoted = quotemeta $dirname; my @​g = glob "$quoted/*";

That seems dirty\, and won't work under Windows\, where glob() takes backslash as a directory separator.

Slash also works.

Yup\, but since backslash works too\, the business of backslash-escaping glob patterns is even messier on Windows than on Unix. (at least for common patterns such as */* which might equally be given as *\* on Windows)

-- Ed Avis \eda@&#8203;waniasset\.com

p5pRT commented 13 years ago

From jpl@research.att.com

On 10/26/11 01​:57\, Father Chrysostomos via RT wrote​:

On Thu Aug 04 08​:02​:11 2011\, davidnicol@​gmail.com wrote​:

On Wed\, Aug 3\, 2011 at 1​:07 PM\, John P. Linderman (jpl) \jpl@&#8203;research\.att\.com wrote​:

On 08/03/11 13​:59\, The Sidhekin wrote​:

On Wed\, Aug 3\, 2011 at 7​:47 PM\, John P. Linderman (jpl) \jpl@&#8203;research\.att\.com wrote​: � By which you mean "space is not a delimiter".� How about telling what the delimiter is ("​:null_delim")\, if any ("​:no_delim")?� ... it's the latter you've implemented\, right? Or "​:single_pattern".� We are now down to the color of the bike shed\, as far as I am concerned :-) Nah\, still engineering the roof and walls.

How about a new LNV (Line Noise Variable) (with a slot in the hints table saying what it defaults to) identifying what the delimiter is. A LNV is not needed; something more verbose...

my @&#8203;SourceFilesWithSpaceAtNameEndList =  do \{ local $Glob&#8203;::delim =

"#'; glob("* .h#* .c")};

perldoc File​::Glob says

          Since v5\.6\.0\, Perl's CORE&#8203;::glob\(\) is implemented in terms

of bsd_glob(). Note that they don't share the same prototype--CORE​::glob() only accepts a single argument. Due to historical reasons\, CORE​::glob() will also split its argument on whitespace\, treating it as multiple patterns\, whereas bsd_glob() considers them as one pattern.

How about\, when CORE​::glob gets more than one argument\, it doesn't split them?

my @&#8203;SourceFilesWithSpaceAtNameEndList =  glob\("\* \.h"\,"\* \.c"\);

Is that a self-shedding bicycle or what?

Then glob(@​list_of_patterns) does the unexpected when there is only one.

There are (at least​:-) ) two problems here. One is that the synopsis says

  SYNOPSIS   use File​::Glob '​:glob';

But tag "​:glob" includes tag 'glob'\, which can lead to infinite looping behavior when used in scalar context\, like

while (\<*.c>) { ... }

That makes tag 'glob' very dangerous\, so it should not be invoked by default. It is possible\, though unlikely\, that someone is depending on

  $lastfile = \<*.c>;

to return the last matching file name (which it does)\, so we cannot simply drop the 'glob' tag from '​:glob'. We need a new\, safe tag ('​:safeglob'??) to advertise in the synopsis.

The other problem is getting glob to recognize blanks in file names\, which is what the OP was trying to do with

  use File​::Glob '​:glob';

in the first place\, since\, by default\, blanks separate distinct patterns. It's a one or two line change to Glob.pm to have it recognize an array reference\, and turn it into an array of patterns. This works just fine and cannot break any plausible existing code. Unfortunately\, it only works when glob is invoked directly\, not via angle-brackets.

  while (my $file = glob(['*.c'\, '*.h'\, 'em bedded*.txt'])); # works   while (my $file = \<['*.c'\, '*.h'\, 'em bedded*.txt']>); # doesn't

I lack the smarts to fix this\, and having glob() work but \<> not work is not acceptable. I was hoping some wiser porter might be able to get \<> to work. Then the new synopsis and array of patterns stuff could be turned loose. -- jpl

p5pRT commented 13 years ago

From @cpansprout

On Wed Oct 26 03​:00​:32 2011\, eda@​waniasset.com wrote​:

Father Chrysostomos via RT \<perlbug-followup \ perl.org> writes​:

The worst breakage is that a glob pattern with spaces gets backslash escapes processed three times on Unix\, so use this if you want to find all files beginning with \ or . :

@​files = \<\\\\\\\\* .*>;

Fortunately\, I’ve fixed that\, so you only need four backslashes now.

That's great news! \

I do wonder why this particular bug is considered a bug to be fixed\,

Because it wasn’t self-consistent. These two patterns produced the same results​:

  \<\\\\\\\\* .*> \<{\\\\\,.}*>

while the breakage with space characters is considered a feature to be preserved. I know there is some existing code with \<*.c *.h>

A *lot* of existing code.

but that is not difficult to fix - so under 'use 5.16' glob could have sensible semantics.

my $dirname = shift @​ARGV; my $quoted = quotemeta $dirname; my @​g = glob "$quoted/*";

That seems dirty\, and won't work under Windows\, where glob() takes backslash as a directory separator.

Slash also works.

Yup\, but since backslash works too\, the business of backslash-escaping glob patterns is even messier on Windows than on Unix. (at least for common patterns such as */* which might equally be given as *\* on Windows)

p5pRT commented 13 years ago

From @cpansprout

On Wed Oct 26 05​:20​:00 2011\, jpl@​research.att.com wrote​:

There are (at least​:-) ) two problems here. One is that the synopsis says

SYNOPSIS
          use File&#8203;::Glob '&#8203;:glob';

But tag "​:glob" includes tag 'glob'\, which can lead to infinite looping behavior when used in scalar context\, like

while (\<*.c>) { ... }

That makes tag 'glob' very dangerous\, so it should not be invoked by default. It is possible\, though unlikely\, that someone is depending on

$lastfile = \<\*\.c>;

to return the last matching file name (which it does)\, so we cannot simply drop the 'glob' tag from '​:glob'. We need a new\, safe tag ('​:safeglob'??) to advertise in the synopsis.

The other problem is getting glob to recognize blanks in file names\, which is what the OP was trying to do with

use File&#8203;::Glob '&#8203;:glob';

in the first place\, since\, by default\, blanks separate distinct patterns. It's a one or two line change to Glob.pm to have it recognize an array reference\, and turn it into an array of patterns. This works just fine and cannot break any plausible existing code. Unfortunately\, it only works when glob is invoked directly\, not via angle-brackets.

while \(my $file = glob\(\['\*\.c'\, '\*\.h'\, 'em bedded\*\.txt'\]\)\);   \# works
while \(my $file = \<\['\*\.c'\, '\*\.h'\, 'em bedded\*\.txt'\]>\);   \# doesn't

I lack the smarts to fix this\, and having glob() work but \<> not work is not acceptable.

I think that is acceptable\, as \<> is a quote-like operator. See my response at \<http​://www.nntp.perl.org/group/perl.perl5.porters/2011/10/msg178117.html>.

I’ve already started working through the pile of bugs I described there.

But as for glob([...])\, I think a pragma (or import option) and a plain list would be better\, precisely to avoid the problem of how to deal with overloaded array references\, etc. (Almost every piece of code that does different things depending on the type of the operand has unfixable bugs in it.)

Since people already know that bsd_glob() does not split on spaces\, we could have ‘use File​::Glob '​:bsd_glob'’ export a special glob() function that maintains state but calls bsd_glob underneath.

I can think of several other interfaces\, but this is the one I prefer. The problem with a pragma is that :case and :nocase are global\, not lexical. But a :split/​:nosplit option would have to be lexical to avoid breakage. Then confusion ensues.

p5pRT commented 13 years ago

From tchrist@perl.com

Given​:

  % mkdir /tmp/sandbox   % cd /tmp/sandbox   % touch "space file" space file

  % perl -E 'say for \<*e f*>'   file   space   space file   file

  % perl -E 'say for \<"*e f*">'   space file

  % perl -E '$v = "e"; say for \<"*{$e} f*">'   space file

No code needs changing\, only doc. I thus propose​:

  The C\ function grandfathers the use of whitespace to separate   multiple patterns such as C\<\< \<*.c *.h> >>. If you want to glob   filenames that might contain whitespace\, you'll have to use extra   quotes around the spacey filename to protect it. For example\, to   glob filenames that have an “C\” followed by a space followed by   an “C\”\, use either of​:

  @​spacies = \<"*e f*">;   @​spacies = glob '"*e f*"';   @​spacies = glob q("*e f*");

  If you had to get a variable through\, you could do this​:

  @​spacies = glob "'*${var}e f*'";   @​spacies = glob qq("*${var}e f*");

  Alternately\, you can use the C\<File​::Glob> module directly\, so   for details\, see its manpage. Calling C\ or the C\<\< \<*> >>   operator automatically C\s that module\, so if the module   mysteriously vaporizes from your library\, an exception is raised.

--tom

p5pRT commented 13 years ago

From @cpansprout

On Wed Oct 26 08​:55​:56 2011\, tom christiansen wrote​:

Given​:

% mkdir /tmp/sandbox
% cd /tmp/sandbox
% touch "space file" space file

% perl \-E 'say for \<\*e f\*>'
file
space
space file
file

% perl \-E 'say for \<"\*e f\*">'
space file

% perl \-E '$v = "e"; say for \<"\*\{$e\} f\*">'
space file

No code needs changing\, only doc.

I think both need changing. See \<http​://www.nntp.perl.org/group/perl.perl5.porters/2011/10/msg178117.html>.

Also\, File​::Glob​::glob is completely broken and should be deprecated and replaced with something that works. If we make glob() truly overridable\, why not allow multiple arguments?

Also\, I would like \<...> to be more like a regexp literal under ‘use v5.16’. On Unix\, I should be able to write​:

  \<*\**>

to find files containing ‘*’. That currently is equivalent to \<*>.

I thus propose​:

The C\<glob> function grandfathers the use of whitespace to separate
multiple patterns such as C\<\< \<\*\.c \*\.h> >>\.  If you want to glob
filenames that might contain whitespace\, you'll have to use extra
quotes around the spacey filename to protect it\.  For example\, to
glob filenames that have an “C\<e>” followed by a space followed by
an “C\<f>”\, use either of&#8203;:

    @&#8203;spacies = \<"\*e f\*">;
    @&#8203;spacies = glob  '"\*e f\*"';
    @&#8203;spacies = glob q\("\*e f\*"\);

If you had to get a variable through\, you could do this&#8203;:

    @&#8203;spacies = glob   "'\*$\{var\}e f\*'";
    @&#8203;spacies = glob qq\("\*$\{var\}e f\*"\);

Alternately\, you can use the C\<File&#8203;::Glob> module directly\, so 
for details\, see its manpage\.  Calling C\<glob> or the C\<\< \<\*> >>
operator automatically C\<use>s that module\, so if the module
mysteriously vaporizes from your library\, an exception is raised\.

--tom

p5pRT commented 13 years ago

From jpl@research.att.com

On 10/26/11 11​:46\, Father Chrysostomos via RT wrote​:

On Wed Oct 26 05​:20​:00 2011\, jpl@​research.att.com wrote​:

There are (at least​:-) ) two problems here. One is that the synopsis says

 SYNOPSIS
           use File&#8203;::Glob '&#8203;:glob';

But tag "​:glob" includes tag 'glob'\, which can lead to infinite looping behavior when used in scalar context\, like

while (\<*.c>) { ... }

That makes tag 'glob' very dangerous\, so it should not be invoked by default. It is possible\, though unlikely\, that someone is depending on

 $lastfile =\<\*\.c>;

to return the last matching file name (which it does)\, so we cannot simply drop the 'glob' tag from '​:glob'. We need a new\, safe tag ('​:safeglob'??) to advertise in the synopsis.

The other problem is getting glob to recognize blanks in file names\, which is what the OP was trying to do with

 use File&#8203;::Glob '&#8203;:glob';

in the first place\, since\, by default\, blanks separate distinct patterns. It's a one or two line change to Glob.pm to have it recognize an array reference\, and turn it into an array of patterns. This works just fine and cannot break any plausible existing code. Unfortunately\, it only works when glob is invoked directly\, not via angle-brackets.

 while \(my $file = glob\(\['\*\.c'\, '\*\.h'\, 'em bedded\*\.txt'\]\)\);   \# works
 while \(my $file =\<\['\*\.c'\, '\*\.h'\, 'em bedded\*\.txt'\]>\);   \# doesn't

I lack the smarts to fix this\, and having glob() work but\<> not work is not acceptable. I think that is acceptable\, as\<> is a quote-like operator. See my response at \<http​://www.nntp.perl.org/group/perl.perl5.porters/2011/10/msg178117.html>.

Somehow\, I never saw that. I'm in (pretty much) total agreement. I’ve already started working through the pile of bugs I described there.

But as for glob([...])\, I think a pragma (or import option) and a plain list would be better\, precisely to avoid the problem of how to deal with overloaded array references\, etc. (Almost every piece of code that does different things depending on the type of the operand has unfixable bugs in it.)

Since people already know that bsd_glob() does not split on spaces\, we could have ‘use File​::Glob '​:bsd_glob'’ export a special glob() function that maintains state but calls bsd_glob underneath.

This sounds great! My primary concern has been that the synopsis for File​::Glob encourages something that can all-too-easily lead to infinite looping (and there's not even a warning in the documentation). If '​:bsd_glob' handles both "while magic" and multiple\, unsplit\, patterns\, then putting it in the synopsis addresses all my concerns (and gets me out of the loop\, which is also useful). It might be nice to have (documentation) warning about the use of '​:glob'\, as well. -- jpl I can think of several other interfaces\, but this is the one I prefer. The problem with a pragma is that :case and :nocase are global\, not lexical. But a :split/​:nosplit option would have to be lexical to avoid breakage. Then confusion ensues.

p5pRT commented 13 years ago

From @cpansprout

Although this is a discussion about glob in particular\, I’d like to make all keywords overridable under ‘use v5.16’. See the last paragraph below.

On Wed Oct 26 09​:40​:56 2011\, jpl@​research.att.com wrote​:

On 10/26/11 11​:46\, Father Chrysostomos via RT wrote​:

Since people already know that bsd_glob() does not split on spaces\, we could have ‘use File​::Glob '​:bsd_glob'’ export a special glob() function that maintains state but calls bsd_glob underneath.

This sounds great! My primary concern has been that the synopsis for File​::Glob encourages something that can all-too-easily lead to infinite looping (and there's not even a warning in the documentation). If '​:bsd_glob' handles both "while magic" and multiple\, unsplit\, patterns\, then putting it in the synopsis addresses all my concerns (and gets me out of the loop\, which is also useful).

This is complicated\, not in terms of actually writing the code\, but in thinking about the edge cases and deciding how they should work.

If we make glob truly overridable under ‘use v5.16’\, then we have the problem of modules like List​::Gen exporting a glob() function that expects to get the magic second argument to indicate the call site.

We could continue to have that work for \<...>\, but then how does List​::Gen​::glob tell the difference between \<*> and glob('*'\,42)? One solution to that is to continue to treat glob overrides specially under ‘use v5.16’ as long as the subroutine itself was *not* declared under ‘use v5.16’.

Also\, the while(glob...) magic (implicit defined()) disappears if all we have is a subroutine call (the way most overrides work). But that I consider a reasonable trade-off\, as it’s possible to implement the same thing in XS. In fact\, it’s possible to write an XS module to apply that while-defined magic to *any* subroutine\, not just those named ‘glob’.

If we make all keywords overridable\, then we also have the problem of ‘require bareword’ magic disappearing. But that I also consider a reasonable trade-off. We need to get away from special-casing particular keywords. It’s possible to write XS modules to apply these special cases more generally (special-case the subroutines themselves\, not their names).

p5pRT commented 13 years ago

From @cpansprout

On Thu Oct 27 12​:58​:36 2011\, sprout wrote​:

We could continue to have that work for \<...>\, but then how does List​::Gen​::glob tell the difference between \<*> and glob('*'\,42)? One solution to that is to continue to treat glob overrides specially under ‘use v5.16’ as long as the subroutine itself was *not* declared under ‘use v5.16’.

XS modules are not really declared in any Perl scope\, but XS glob overrides are rare enough I don’t think we have to worry about that.

p5pRT commented 13 years ago

From @cpansprout

On Thu Oct 27 13​:00​:50 2011\, sprout wrote​:

On Thu Oct 27 12​:58​:36 2011\, sprout wrote​:

We could continue to have that work for \<...>\, but then how does List​::Gen​::glob tell the difference between \<*> and glob('*'\,42)? One solution to that is to continue to treat glob overrides specially under ‘use v5.16’ as long as the subroutine itself was *not* declared under ‘use v5.16’.

XS modules are not really declared in any Perl scope\, but XS glob overrides are rare enough I don’t think we have to worry about that.

This also affects subroutines declared under ‘use v5.16’. How are *they* to distinguish between \<*.c> and glob('*.c'\, '*.a')? (looks_like_number($_[1]) is too fragile.) Since glob() would have to use some other means than the second argument (e.g.\, Devel​::CallSite) to determine the caller\, maybe that could apply to \<...> as well\, when it calls a glob override. I.e.\, it shouldn’t pass the magic second argument.

This is much more of a mess than I’d hoped....

(I hope somebody is reading this and has some ideas.)

p5pRT commented 13 years ago

From jpl@research.att.com

On 10/27/11 17​:43\, Father Chrysostomos via RT wrote​:

On Thu Oct 27 13​:00​:50 2011\, sprout wrote​:

On Thu Oct 27 12​:58​:36 2011\, sprout wrote​:

We could continue to have that work for\<...>\, but then how does List​::Gen​::glob tell the difference between\<*> and glob('*'\,42)? One solution to that is to continue to treat glob overrides specially under ‘use v5.16’ as long as the subroutine itself was *not* declared under ‘use v5.16’. XS modules are not really declared in any Perl scope\, but XS glob overrides are rare enough I don’t think we have to worry about that. This also affects subroutines declared under ‘use v5.16’. How are *they* to distinguish between\<*.c> and glob('*.c'\, '*.a')? (looks_like_number($_[1]) is too fragile.) Since glob() would have to use some other means than the second argument (e.g.\, Devel​::CallSite) to determine the caller\, maybe that could apply to\<...> as well\, when it calls a glob override. I.e.\, it shouldn’t pass the magic second argument.

This is much more of a mess than I’d hoped....

(I hope somebody is reading this and has some ideas.)

The best should not be the enemy of the good. At minimum\, there should be some documentation changes to File​::Glob\, removing the '​:glob' from the synopsis\, and warning about its use. I'm not sufficiently knowledgeable about keyword overrides to offer any other ideas. (But at least I'm reading this​:-) ) -- jpl

p5pRT commented 13 years ago

From @cpansprout

On Fri Oct 28 07​:40​:43 2011\, jpl@​research.att.com wrote​:

On 10/27/11 17​:43\, Father Chrysostomos via RT wrote​:

On Thu Oct 27 13​:00​:50 2011\, sprout wrote​:

On Thu Oct 27 12​:58​:36 2011\, sprout wrote​:

We could continue to have that work for\<...>\, but then how does List​::Gen​::glob tell the difference between\<*> and glob('*'\,42)? One solution to that is to continue to treat glob overrides specially under ‘use v5.16’ as long as the subroutine itself was *not* declared under ‘use v5.16’. XS modules are not really declared in any Perl scope\, but XS glob overrides are rare enough I don’t think we have to worry about that. This also affects subroutines declared under ‘use v5.16’. How are *they* to distinguish between\<*.c> and glob('*.c'\, '*.a')? (looks_like_number($_[1]) is too fragile.) Since glob() would have to use some other means than the second argument (e.g.\, Devel​::CallSite) to determine the caller\, maybe that could apply to\<...> as well\, when it calls a glob override. I.e.\, it shouldn’t pass the magic second argument.

This is much more of a mess than I’d hoped....

(I hope somebody is reading this and has some ideas.)

The best should not be the enemy of the good. At minimum\, there should be some documentation changes to File​::Glob\, removing the '​:glob' from the synopsis\, and warning about its use.

We can also add the :bsd_glob export that I suggested\, but without support for lists at first--that can come later.

I'm not sufficiently knowledgeable about keyword overrides to offer any other ideas. (But at least I'm reading this​:-) ) -- jpl

I’m still thinking about it. I think that the glob() magic should depend solely on the scope in which the subroutine is defined\, and not on the caller\, as a sub defined under ‘use v5.16’ will not be expecting the magic second argument and should not have to check the caller’s feature hints to determine whether the second argument was passed in by the user or was a ‘perquisite’.

Now\, should the same thing apply to require? I.e.\, should subs compiled under ‘use v5.16’ get the magical bareword treatment (automatic s|​::|/|gr . ".pm")? I think they should.

--

Father Chrysostomos

p5pRT commented 13 years ago

From @cpansprout

On Fri Oct 28 18​:02​:14 2011\, sprout wrote​:

On Fri Oct 28 07​:40​:43 2011\, jpl@​research.att.com wrote​:

The best should not be the enemy of the good. At minimum\, there should be some documentation changes to File​::Glob\, removing the '​:glob' from the synopsis\, and warning about its use.

Done with commit 5144542d35.

We can also add the :bsd_glob export that I suggested\, but without support for lists at first--that can come later.

Done with commit f4cbf9907d.

I'm not sufficiently knowledgeable about keyword overrides to offer any other ideas. (But at least I'm reading this​:-) ) -- jpl

I’m still thinking about it. I think that the glob() magic should depend solely on the scope in which the subroutine is defined\, and not on the caller\, as a sub defined under ‘use v5.16’ will not be expecting the magic second argument and should not have to check the caller’s feature hints to determine whether the second argument was passed in by the user or was a ‘perquisite’.

Now\, should the same thing apply to require? I.e.\, should subs compiled under ‘use v5.16’ get the magical bareword treatment (automatic s|​::|/|gr . ".pm")? I think they should.

I searched CPAN and did not find any non-core modules that use the magic second argument to glob()\, so it sounds as though we don’t need any fancy magic. We could just make glob() fully overridable (under ‘use v5.16’)\, plain and simple.

However\, this is still just making my brain go round in circles\, so I will set it aside for now.

--

Father Chrysostomos

p5pRT commented 13 years ago

@cpansprout - Status changed from 'open' to 'resolved'

p5pRT commented 13 years ago

From @cpansprout

On Wed Oct 26 08​:55​:56 2011\, tom christiansen wrote​:

Given​:

% mkdir /tmp/sandbox
% cd /tmp/sandbox
% touch "space file" space file

% perl \-E 'say for \<\*e f\*>'
file
space
space file
file

% perl \-E 'say for \<"\*e f\*">'
space file

% perl \-E '$v = "e"; say for \<"\*\{$e\} f\*">'
space file

No code needs changing\, only doc. I thus propose​:

Some of that was already covered in perlfunc. But I’ve taken part of your text and added it with commit a91bb7b.

The C\<glob> function grandfathers the use of whitespace to separate
multiple patterns such as C\<\< \<\*\.c \*\.h> >>\.  If you want to glob
filenames that might contain whitespace\, you'll have to use extra
quotes around the spacey filename to protect it\.  For example\, to
glob filenames that have an “C\<e>” followed by a space followed by
an “C\<f>”\, use either of&#8203;:

    @&#8203;spacies = \<"\*e f\*">;
    @&#8203;spacies = glob  '"\*e f\*"';
    @&#8203;spacies = glob q\("\*e f\*"\);

If you had to get a variable through\, you could do this&#8203;:

    @&#8203;spacies = glob   "'\*$\{var\}e f\*'";
    @&#8203;spacies = glob qq\("\*$\{var\}e f\*"\);

Alternately\, you can use the C\<File&#8203;::Glob> module directly\, so 
for details\, see its manpage\.  Calling C\<glob> or the C\<\< \<\*> >>
operator automatically C\<use>s that module\, so if the module
mysteriously vaporizes from your library\, an exception is raised\.

--tom

--

Father Chrysostomos

p5pRT commented 13 years ago

From [Unknown Contact. See original ticket]

On Wed Oct 26 08​:55​:56 2011\, tom christiansen wrote​:

Given​:

% mkdir /tmp/sandbox
% cd /tmp/sandbox
% touch "space file" space file

% perl \-E 'say for \<\*e f\*>'
file
space
space file
file

% perl \-E 'say for \<"\*e f\*">'
space file

% perl \-E '$v = "e"; say for \<"\*\{$e\} f\*">'
space file

No code needs changing\, only doc. I thus propose​:

Some of that was already covered in perlfunc. But I’ve taken part of your text and added it with commit a91bb7b.

The C\<glob> function grandfathers the use of whitespace to separate
multiple patterns such as C\<\< \<\*\.c \*\.h> >>\.  If you want to glob
filenames that might contain whitespace\, you'll have to use extra
quotes around the spacey filename to protect it\.  For example\, to
glob filenames that have an “C\<e>” followed by a space followed by
an “C\<f>”\, use either of&#8203;:

    @&#8203;spacies = \<"\*e f\*">;
    @&#8203;spacies = glob  '"\*e f\*"';
    @&#8203;spacies = glob q\("\*e f\*"\);

If you had to get a variable through\, you could do this&#8203;:

    @&#8203;spacies = glob   "'\*$\{var\}e f\*'";
    @&#8203;spacies = glob qq\("\*$\{var\}e f\*"\);

Alternately\, you can use the C\<File&#8203;::Glob> module directly\, so 
for details\, see its manpage\.  Calling C\<glob> or the C\<\< \<\*> >>
operator automatically C\<use>s that module\, so if the module
mysteriously vaporizes from your library\, an exception is raised\.

--tom

--

Father Chrysostomos

p5pRT commented 13 years ago

From tchrist@perl.com

Some of that was already covered in perlfunc. But I’ve taken part of your text and added it with commit a91bb7b.

Good\, thanks.

--tom