There appears to be a bug in filesys I/O (under Linux) which breaks up filenames with spaces in them.
My perl config (both Linux and HP-UX) is attached, along with the malfunctioning prog (which seems to WORK JUST FINE under HP-UX). They are also as follows, in case you don't like attachments:
Summary of my perl5 (5.0 patchlevel 5 subversion 3) configuration:
  Platform:
    osname=linux, osvers=2.2.14-1mdksmp, archname=i386-linux
    uname='linux jedi.mandrakesoft.com 2.2.14-1mdksmp #1 smp thu dec 2 01:02:03 cet 1999 i686 unknown '
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef useperlio=undef d_sfio=undef
  Compiler:
    cc='cc', optimize='-O3 -fomit-frame-pointer -fno-exceptions -fno-rtti -pipe -s -mpentium -mcpu=pentium -march=pentium -ffast-math -fexpensive-optimizations', gccversion=2.95.2 19991024 (release)
    cppflags='-Dbool=char -DHAS_BOOL -I/usr/local/include'
    ccflags ='-Dbool=char -DHAS_BOOL -I/usr/local/include'
    stdchar='char', d_stdstdio=undef, usevfork=false
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -ldl -lm -lc -lposix -lcrypt
    libc=, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'
Characteristics of this binary (from libperl):
  Built under linux
  Compiled at Dec 17 1999 17:16:45
  @INC:
    /usr/lib/perl5/5.00503/i386-linux
    /usr/lib/perl5/5.00503
    /usr/lib/perl5/site_perl/5.005/i386-linux
    /usr/lib/perl5/site_perl/5.005
    .
Malfunc-ing program:
The behaviour I'm talking about can be obtained by running:
prompt$ echo fred > "foo bar baz"
prompt$ perl frename.pl ".*baz" "baz" "barney"
which should result in the message "Error renaming baz: No such file or directory"
#! /usr/bin/perl
#Takes a min of 3 perl regexps: 1) pattern to match 2) pattern to
#change in all files matching pattern1 and 3) pattern to replace
#pattern2 with. After the 3rd, all patterns are repetitions of
#pattern2 and pattern3. Also takes optional -s switch for stripping
#whitespace.

sub chmp_whitespace {
    $cpyfn=shift @_;
    $file="";
    @concat=split /\s+/, $cpyfn;
    foreach $baz (<@concat>) {
        chomp $baz;
        print "$baz\n";
        $file="$file$baz";
        print "file $file\n";
    }
    return $file;
}

($#ARGV>-1) or die "You must provide cmd line args!";

$regexp = shift @ARGV;

#if -s switch is present, strip $fns of whitespace
$strpspc=0;
if($regexp eq "-s") {
    $strpspc=1;
    $regexp=shift @ARGV;
}

foreach $fn (<*>) {
    print "foreach $fn\n";
    #Strip whitespace
    if($strpspc) {
        $file=chmp_whitespace $fn;
    } else {
        $file=$fn
    }

    #proceed to replace regexps
    @tmp=@ARGV;
    while($foo=shift @tmp) {
        $bar=shift @tmp;
        if($file=~/$regexp/ && $file=~/$foo/) {
            $file=~s/$foo/$bar/g;
            rename("$fn", "$file") || print "Error renaming $fn: $!\n";
        }
    }
}
Perl config under HP-UX:
Summary of my perl5 (5.0 patchlevel 5 subversion 3) configuration:
  Platform:
    osname=hpux, osvers=11.00, archname=PA-RISC2.0
    uname='hp-ux ss b.11.00 u 9000800 680309313 unlimited-user license '
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef useperlio=undef d_sfio=undef
  Compiler:
    cc='cc', optimize='-O', gccversion=
    cppflags='-D_HPUX_SOURCE -Aa -I/usr/local/include'
    ccflags ='-D_HPUX_SOURCE -Aa -I/usr/local/include'
    stdchar='unsigned char', d_stdstdio=define, usevfork=false
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=undef, longlongsize=, d_longdbl=define, longdblsize=16
    alignbytes=8, usemymalloc=y, prototype=define
  Linker and Libraries:
    ld='ld', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib/pa1.1 /lib /usr/lib /usr/ccs/lib
    libs=-lnsl -lnm -lndbm -ldld -lm -lc -lndir -lcrypt
    libc=/lib/libc.sl, so=sl, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_hpux.xs, dlext=sl, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-B,deferred '
    cccdlflags='+z', lddlflags='-b -L/usr/local/lib'
Characteristics of this binary (from libperl):
  Built under hpux
  Compiled at Aug 16 1999 18:05:55
  @INC:
    /opt/perl5/lib/5.00503/PA-RISC2.0
    /opt/perl5/lib/5.00503
    /opt/perl5/lib/site_perl/5.005/PA-RISC2.0
    /opt/perl5/lib/site_perl/5.005
    .
My perl config (both Linux and HP-UX) is attached, along with the malfunctioning prog (which seems to WORK JUST FINE under HP-UX). They are also as follows, in case you don't like attachments:
That's right. Here's the story.
If you just use the standard built-in glob\, then for compatibility with Perl of Old\, it will split on white space because it doesn't know that you aren't asking for more than one fileglob.
#!/bin/sh -x
rm -rf /tmp/fred "/tmp/fred stuff"
mkdir "/tmp/fred stuff"
touch "/tmp/fred stuff/a"
touch "/tmp/fred stuff/b"
perl -We '
    @a = </tmp/fred stu*>;
    print "Normal built-in globbed @a\n"
';
That prints just "/tmp/fred", which isn't there. But that's what you have to do for backwards compat.
The workaround is simple. Just use File::Glob ":glob" in that package, or even use File::Glob ":globally" to redefine the world:
#!/bin/sh -x
rm -rf /tmp/fred "/tmp/fred stuff"
mkdir "/tmp/fred stuff"
touch "/tmp/fred stuff/a"
touch "/tmp/fred stuff/b"
perl -We '
    use File::Glob ":glob";
    @a = </tmp/fred stu*>;
    print "File Glob globbed @a\n"
';
Now it reports having globbed "/tmp/fred stuff"\, as you would like.
Note that when you pull in :glob (or :globally, for the whole program), you get sh semantics from then on for <*> and glob("*"). To get the old csh-style behaviour, you need to add flags or call csh_glob. But the :glob version is still whitespace-clean.
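A minimal sketch of the same whitespace behaviour on a modern Perl 5, assuming its File::Glob exports `bsd_glob` and that the core `glob` is File::Glob's csh-style routine, as perlfunc describes for current releases (in the 5.6-era module discussed here, the `:glob` tag played a similar role). The scratch directory and the `fred stuff` name are made up for the demo, and `compare_globs` is a hypothetical helper. The point is that the core glob treats the space as a pattern separator while `bsd_glob` does not.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);
use File::Glob qw(bsd_glob);

# Glob "fred stu*" two ways inside a throwaway directory that holds a
# single subdirectory called "fred stuff" (a made-up name for the demo).
sub compare_globs {
    my $dir = tempdir(CLEANUP => 1);
    my $old = getcwd();
    chdir $dir or die "chdir $dir: $!";
    mkdir 'fred stuff' or die "mkdir: $!";

    # Core glob: the space splits the argument into "fred" and "stu*".
    # "fred" has no wildcards, so it is passed through unchanged even
    # though nothing by that name exists; "stu*" matches nothing here.
    my @core = glob('fred stu*');

    # bsd_glob: the whole string, space included, is one pattern.
    my @bsd = bsd_glob('fred stu*');

    chdir $old or die "chdir $old: $!";
    return (\@core, \@bsd);
}

my ($core, $bsd) = compare_globs();
print "core glob: @$core\n";
print "bsd_glob:  @$bsd\n";
```

This mirrors Tom's /tmp/fred example: the core glob hands back the non-existent "fred", while `bsd_glob` finds the directory with the space in its name.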
Here are other examples:
use File::Glob ':glob';
@list = glob('*.[ch]');  # won't do tildes here anymore
$homedir = glob('~daemon', GLOB_TILDE | GLOB_ERR);
if (GLOB_ERROR) {
    # an error occurred reading $homedir
}
use File::Glob 'csh_glob';
$homedir = csh_glob("~root");  # now you can
print "home is $homedir\n";

## override the core glob
use File::Glob ':globally';
my @sources = <*.{c,h,y}>;

## override the core glob, forcing case sensitivity
use File::Glob qw(:globally :case);
my @sources = <*.{c,h,y}>;

## override the core glob forcing case *in*sensitivity
use File::Glob qw(:globally :nocase);
my @sources = <*.{c,h,y}>;  # gets *.C also!!
And here's the full thing, the default globbage in 5.6.
--tom
NAME
File::Glob - Perl extension for BSD glob routine
SYNOPSIS
    use File::Glob ':glob';
    @list = glob('*.[ch]');
    $homedir = glob('~gnat', GLOB_TILDE | GLOB_ERR);
    if (GLOB_ERROR) {
        # an error occurred reading $homedir
    }

    ## override the core glob (even with -T)
    use File::Glob ':globally';
    my @sources = <*.{c,h,y}>

    ## override the core glob, forcing case sensitivity
    use File::Glob qw(:globally :case);
    my @sources = <*.{c,h,y}>

    ## override the core glob forcing case insensitivity
    use File::Glob qw(:globally :nocase);
    my @sources = <*.{c,h,y}>
DESCRIPTION
File::Glob implements the FreeBSD glob(3) routine, which is a superset of the POSIX glob() (described in IEEE Std 1003.2 "POSIX.2"). The glob() routine takes a mandatory `pattern' argument, and an optional `flags' argument, and returns a list of filenames matching the pattern, with interpretation of the pattern modified by the `flags' variable. The POSIX defined flags are:
`GLOB_ERR' Force glob() to return an error when it encounters a directory it cannot open or read. Ordinarily glob() continues to find matches.
`GLOB_MARK' Each pathname that is a directory that matches the pattern has a slash appended.
`GLOB_NOCASE' By default, file names are assumed to be case sensitive; this flag makes glob() treat case differences as not significant.
`GLOB_NOCHECK' If the pattern does not match any pathname, then glob() returns a list consisting of only the pattern. If `GLOB_QUOTE' is set, its effect is present in the pattern returned.
`GLOB_NOSORT' By default, the pathnames are sorted in ascending ASCII order; this flag prevents that sorting (speeding up glob()).
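To see some of the POSIX flags in action, here is a small sketch, assuming a modern Perl 5 whose File::Glob exports `bsd_glob` and the flag constants. It builds a scratch directory with a hypothetical `Readme.txt` file and `docs` subdirectory, then calls `bsd_glob` with one flag at a time through a hypothetical helper, `glob_in_scratch`.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);
use File::Glob qw(bsd_glob GLOB_MARK GLOB_NOCHECK GLOB_NOCASE);

# Run one bsd_glob call inside a throwaway directory containing a file
# "Readme.txt" and a subdirectory "docs" (illustrative names only).
sub glob_in_scratch {
    my ($pattern, $flags) = @_;
    my $dir = tempdir(CLEANUP => 1);
    my $old = getcwd();
    chdir $dir or die "chdir $dir: $!";
    open(my $fh, '>', 'Readme.txt') or die "create: $!";
    close $fh;
    mkdir 'docs' or die "mkdir: $!";
    my @got = bsd_glob($pattern, $flags);
    chdir $old or die "chdir $old: $!";
    return @got;
}

# GLOB_MARK: directories come back with a trailing slash.
print join(' ', glob_in_scratch('*', GLOB_MARK)), "\n";

# GLOB_NOCHECK: a pattern that matches nothing is returned unchanged.
print join(' ', glob_in_scratch('*.pdf', GLOB_NOCHECK)), "\n";

# GLOB_NOCASE: 'readme*' also matches 'Readme.txt'.
print join(' ', glob_in_scratch('readme*', GLOB_NOCASE)), "\n";
```

Passing an explicit flags argument replaces the module's default flags entirely, which is why each call above shows only the one behaviour being demonstrated.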
The FreeBSD extensions to the POSIX standard are the following flags:
`GLOB_BRACE' Pre-process the string to expand `{pat,pat,...}' strings like csh(1). The pattern '{}' is left unexpanded for historical reasons (and csh(1) does the same thing to ease typing of find(1) patterns).
`GLOB_NOMAGIC' Same as `GLOB_NOCHECK' but it only returns the pattern if it does not contain any of the special characters "*", "?" or "[". `NOMAGIC' is provided to simplify implementing the historic csh(1) globbing behaviour and should probably not be used anywhere else.
`GLOB_QUOTE' Use the backslash ('\') character for quoting: every occurrence of a backslash followed by a character in the pattern is replaced by that character, avoiding any special interpretation of the character. (But see below for exceptions on DOSISH systems).
`GLOB_TILDE' Expand patterns that start with '~' to user name home directories.
`GLOB_CSH' For convenience, `GLOB_CSH' is a synonym for `GLOB_BRACE | GLOB_NOMAGIC | GLOB_QUOTE | GLOB_TILDE'.
The POSIX provided `GLOB_APPEND', `GLOB_DOOFFS', and the FreeBSD extensions `GLOB_ALTDIRFUNC', and `GLOB_MAGCHAR' flags have not been implemented in the Perl version because they involve more complex interaction with the underlying C structures.
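A similar sketch for the FreeBSD extensions, again assuming a modern Perl 5 `File::Glob` that exports `bsd_glob` and these constants, and a Unix filesystem: one of the made-up file names, `a*b`, contains a literal asterisk, which Windows does not allow. The helper `glob_in_scratch` is hypothetical.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);
use File::Glob qw(bsd_glob GLOB_BRACE GLOB_NOMAGIC GLOB_QUOTE);

# Run one bsd_glob call inside a throwaway directory that holds the
# illustrative files "a*b" (literal asterisk), "axb", "x.c" and "x.h".
sub glob_in_scratch {
    my ($pattern, $flags) = @_;
    my $dir = tempdir(CLEANUP => 1);
    my $old = getcwd();
    chdir $dir or die "chdir $dir: $!";
    for my $name ('a*b', 'axb', 'x.c', 'x.h') {
        open(my $fh, '>', $name) or die "create $name: $!";
        close $fh;
    }
    my @got = bsd_glob($pattern, $flags);
    chdir $old or die "chdir $old: $!";
    return @got;
}

# GLOB_BRACE: x.{c,h} is expanded to x.c and x.h, as csh would.
print join(' ', glob_in_scratch('x.{c,h}', GLOB_BRACE)), "\n";

# GLOB_NOMAGIC: a pattern with no *, ? or [ comes back even when nothing
# matches, but a pattern with a wildcard that matches nothing does not.
print join(' ', glob_in_scratch('missing.txt', GLOB_NOMAGIC)), "\n";
my @none = glob_in_scratch('missing*', GLOB_NOMAGIC);
print scalar(@none), " match(es) for missing*\n";

# GLOB_QUOTE: the backslash makes * literal, so only "a*b" matches;
# with no flags at all the * is a wildcard and "axb" matches too.
print join(' ', glob_in_scratch('a\*b', GLOB_QUOTE)), "\n";
print join(' ', glob_in_scratch('a*b', 0)), "\n";
```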
DIAGNOSTICS
glob() returns a list of matching paths, possibly zero length. If an error occurred, &File::Glob::GLOB_ERROR will be non-zero and `$!' will be set. &File::Glob::GLOB_ERROR is guaranteed to be zero if no error occurred, or one of the following values otherwise:
`GLOB_NOSPACE' An attempt to allocate memory failed.
`GLOB_ABEND' The glob was stopped because an error was encountered.
In the case where glob() has found some matching paths, but is interrupted by an error, glob() will return a list of filenames and set &File::Glob::GLOB_ERROR.
Note that glob() deviates from POSIX and FreeBSD glob(3) behaviour by not considering `ENOENT' and `ENOTDIR' as errors; glob() will continue processing despite those errors, unless the `GLOB_ERR' flag is set.
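A sketch of the error-checking pattern described above, assuming a modern Perl 5 whose File::Glob exports `bsd_glob`, `GLOB_ERROR` and the `GLOB_*` constants. `checked_glob` is a hypothetical helper name, and `/no/such/directory` is assumed not to exist; no particular output is claimed for that second call.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Glob qw(bsd_glob GLOB_ERR GLOB_ERROR GLOB_NOSPACE GLOB_ABEND);

# Hypothetical helper: run bsd_glob with GLOB_ERR and return both the
# matches and a description of any error (empty string if none).
sub checked_glob {
    my ($pattern) = @_;
    my @paths = bsd_glob($pattern, GLOB_ERR);
    my $err   = GLOB_ERROR();   # 0 when no error occurred
    my $why   = $err == 0              ? ''
              : $err == GLOB_NOSPACE() ? 'out of memory'
              : $err == GLOB_ABEND()   ? "stopped by an error: $!"
              :                          "unexpected error code $err";
    return (\@paths, $why);
}

# A clean run over a scratch directory holding one file.
my $dir = tempdir(CLEANUP => 1);
open(my $fh, '>', "$dir/a.txt") or die "create: $!";
close $fh;
my ($ok, $ok_why) = checked_glob("$dir/*.txt");
print "matched: @$ok; error: ", ($ok_why eq '' ? 'none' : $ok_why), "\n";

# With GLOB_ERR set, a directory that cannot be opened is meant to stop
# the glob and set GLOB_ERROR rather than being silently skipped.
my ($bad, $bad_why) = checked_glob('/no/such/directory/*');
print "matched ", scalar(@$bad), " path(s); error: ",
      ($bad_why eq '' ? 'none' : $bad_why), "\n";
```

Reading GLOB_ERROR (and `$!`) immediately after the call matters, since later system calls can overwrite `$!`.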
Be aware that all filenames returned from File::Glob are tainted.
NOTES
* If you want to use multiple patterns, e.g. `glob "a* b*"', you should probably throw them in a set as in `glob "{a*,b*}"'. This is because the argument to glob isn't subjected to parsing by the C shell. Remember that you can use a backslash to escape things.
* On DOSISH systems, backslash is a valid directory separator character. In this case, use of backslash as a quoting character (via GLOB_QUOTE) interferes with the use of backslash as a directory separator. The best (simplest, most portable) solution is to use forward slashes for directory separators, and backslashes for quoting. However, this does not match "normal practice" on these systems. As a concession to user expectation, therefore, backslashes (under GLOB_QUOTE) only quote the glob metacharacters '[', ']', '{', '}', '-', '~', and backslash itself. All other backslashes are passed through unchanged.
* Win32 users should use the real slash. If you really want to use backslashes\, consider using Sarathy's File::DosGlob\, which comes with the standard Perl distribution.
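The first note above can be illustrated with a small sketch, assuming a modern Perl 5 in which the core `glob` splits its argument on whitespace (as perlfunc describes) and `File::Glob` exports `bsd_glob` and `GLOB_BRACE`. The three file names and the `in_scratch` helper are made up for the demo.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);
use File::Glob qw(bsd_glob GLOB_BRACE);

# Call $code inside a throwaway directory holding the illustrative files
# "apple", "banana" and "cherry", and return whatever it returns.
sub in_scratch {
    my ($code) = @_;
    my $dir = tempdir(CLEANUP => 1);
    my $old = getcwd();
    chdir $dir or die "chdir $dir: $!";
    for my $name (qw(apple banana cherry)) {
        open(my $fh, '>', $name) or die "create $name: $!";
        close $fh;
    }
    my @got = $code->();
    chdir $old or die "chdir $old: $!";
    return @got;
}

# Core glob: "a* b*" is split on the space into two patterns.
my @split  = in_scratch(sub { glob('a* b*') });
# bsd_glob: the same string is a single pattern requiring a literal space...
my @one    = in_scratch(sub { bsd_glob('a* b*') });
# ...so put the alternatives in a brace set instead.
my @braced = in_scratch(sub { bsd_glob('{a*,b*}', GLOB_BRACE) });

print "core glob 'a* b*':  @split\n";
print "bsd_glob 'a* b*':   @one\n";
print "bsd_glob '{a*,b*}': @braced\n";
```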
AUTHOR
The Perl interface was written by Nathan Torkington <gnat@frii.com>, and is released under the artistic license. Further modifications were made by Greg Bacon <gbacon@cs.uah.edu> and Gurusamy Sarathy <gsar@activestate.com>. The C glob code has the following copyright:
Copyright (c) 1989\, 1993 The Regents of the University of California.
All rights reserved.
This code is derived from software contributed to Berkeley by
Guido van Rossum.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the University nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Tom Christiansen <tchrist@chthon.perl.com> wrote
If you just use the standard built-in glob\, then for compatibility with Perl of Old\, it will split on white space because it doesn't know that you aren't asking for more than one fileglob.
That's true about splitting on spaces *within the glob argument*. But David's example
foreach $fn (<*>) {
doesn't have spaces in the glob argument, just in the filename.
Also I don't see the effect David reports. Trying
touch "foo bar baz"
then in the Perl debugger I see
DB<1> x <foo*>
0 'foo bar baz'
DB<2>
A list of length one\, not three.
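Mike's check is easy to repeat as a standalone script. This sketch assumes a modern Perl 5, where `<foo*>` goes through File::Glob rather than an external csh, and uses a throwaway directory; `count_foo_matches` is a hypothetical helper name.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Create one file whose *name* contains spaces, then glob for it with a
# pattern that itself contains no spaces, as in the debugger session.
sub count_foo_matches {
    my $dir = tempdir(CLEANUP => 1);
    my $old = getcwd();
    chdir $dir or die "chdir $dir: $!";
    open(my $fh, '>', 'foo bar baz') or die "create: $!";
    close $fh;
    my @got = <foo*>;
    chdir $old or die "chdir $old: $!";
    return @got;
}

my @got = count_foo_matches();
printf "%d element(s): %s\n", scalar(@got), join(', ', map { "'$_'" } @got);
```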
This is with any Perl version up to 5.005_03.
So it seems that HP-UX\, Solaris 2.6 and SunOS4 do the right thing here\, while Linux gets it wrong.
Something to do with the Linux shell?
Mike Guy
Tom Christiansen <tchrist@chthon.perl.com> wrote
If you just use the standard built-in glob\, then for compatibility with Perl of Old\, it will split on white space because it doesn't know that you aren't asking for more than one fileglob.
That's true about splitting on spaces *within the glob argument*.
But David's example
foreach $fn (<*>) {
doesn't have spaces in the glob argument, just in the filename.
Oh\, ok.
Also I don't see the effect David reports.
Neither do I.
So it seems that HP-UX\, Solaris 2.6 and SunOS4 do the right thing here\, while Linux gets it wrong.
No, I can't reproduce it under "Linux" (whatever that means, since it's not the kernel, surely) either.
Something to do with the Linux shell?
There is no Linux shell. :-( But here is perl5.005_03 on RedHate 6.0:
DB<1> open(FH, ">foo bar blarch") or die;
DB<2> system "ls", "-l", "foo bar blarch"
-rw-r--r-- 1 tchrist root 0 Mar 4 06:58 foo bar blarch
DB<3> @got = <foo*>
DB<4> x @got
0 'foo bar blarch'
or directly:
redhat% perl -e '@got = <foo*>; print "got ", scalar @got'
1
Or more tellingly...
redhat# strace -s 60 -f perl -e '@got = <foo*>; print "got ", scalar @got'
fork() = 29368
[pid 29367] close(4) = 0
[pid 29367] fcntl(3, F_GETFL) = 0 (flags O_RDONLY)
[pid 29367] fstat(3, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
[pid 29367] mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aabf000
[pid 29367] _llseek(3, 0, 0x7ffff3dc, SEEK_CUR) = -1 ESPIPE (Illegal seek)
[pid 29367] fcntl(3, F_SETFD, FD_CLOEXEC) = 0
[pid 29367] read(3, <unfinished ...>
[pid 29368] close(3) = 0
[pid 29368] dup2(4, 1) = 1
[pid 29368] close(4) = 0
[pid 29368] execve("/bin/sh", ["sh", "-c", "/bin/csh -cf \'set nonomatch; glob foo*\' 2>/dev/null"], [/* 35 vars */]) = 0
That gives me an idea!
redhat% touch "foo ' stuff ' can be a problem"
redhat% perl -le '@got = <foo*>; print "got ", scalar @got'
got 2
redhat% touch "foo \\\' stuff \\\\\' can be a problem"
redhat% ls -l foo*
-rw-r--r-- 1 tchrist tchrist 0 Mar 4 07:15 foo ' stuff ' can be a problem
-rw-r--r-- 1 tchrist tchrist 0 Mar 4 07:15 foo \' stuff \\' can be a problem
-rw-r--r-- 1 tchrist tchrist 0 Mar 4 07:06 foo bar blarch
redhat% perl5.003 -le '@got = <foo*>; print "got ", scalar @got'
got 2
Wicked! You *can* screw it up. Well, with that release. But not with the current one, using the spiffy new built-in BSD globbing:
redhat% perl5.5.670 -le '@got = <foo*>; print "got ", scalar @got'
got 3
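The same three-file experiment can be scripted. The sketch below assumes a modern Perl 5, whose built-in glob no longer shells out to csh, and recreates the file names from the `ls -l` listing above in a throwaway directory; `count_quoted_matches` is a hypothetical helper.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# File names from the session above; two contain quotes and backslashes.
my @names = (
    "foo bar blarch",
    "foo ' stuff ' can be a problem",
    "foo \\' stuff \\\\' can be a problem",   # i.e. foo \' stuff \\' can be a problem
);

# Create all three files in a scratch directory and glob for foo*.
sub count_quoted_matches {
    my $dir = tempdir(CLEANUP => 1);
    my $old = getcwd();
    chdir $dir or die "chdir $dir: $!";
    for my $name (@names) {
        open(my $fh, '>', $name) or die "create $name: $!";
        close $fh;
    }
    my @got = <foo*>;
    chdir $old or die "chdir $old: $!";
    return @got;
}

my @got = count_quoted_matches();
print "got ", scalar(@got), "\n";
```

Because no shell parses the results, quote characters inside filenames cannot split or merge entries the way they could with the csh-based glob.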
Interestingly, I cannot, using the same Linux-freaking sequence (or vice versa :-), confuse even older Perl releases (5.004) under OpenBSD, even older OpenBSDs.
openbsd% ktrace -i perl5.00404 -le '@got = <foo*>; print "got ", scalar @got'
got 3
openbsd% kdump | grep foo
"@got = <foo*>; print "got ", scalar @got
"@got = <foo*>; print "got ", scalar @got
13014 csh NAMI "foo bar blarch"
13014 csh NAMI "foo ' stuff ' can be a problem"
13014 csh NAMI "foo \' stuff \\' can be a problem"
"foo ' stuff ' can be a problem\0foo \\' stuff \\\\' can be a problem\0foo ba\
"foo ' stuff ' can be a problem\0foo \\' stuff \\\\' can be a problem\0foo ba\
19417 perl5.00404 NAMI "foo ' stuff ' can be a problem"
19417 perl5.00404 NAMI "foo \' stuff \\' can be a problem"
I leave you to draw your own conclusions on these various matters.
--tom
On Sat, 4 Mar 2000, M.J.T. Guy wrote:
Tom Christiansen <tchrist@chthon.perl.com> wrote
So it seems that HP-UX\, Solaris 2.6 and SunOS4 do the right thing here\, while Linux gets it wrong.
Something to do with the Linux shell?
I'm running bash under Linux Mandrake 7.0-2, if that helps any (also running bash on HP-UX). I'm pretty sure it's not the shell, though, cuz globbing in shell commands works fine (e.g. rm *). I tried Tom's suggestion, and neither interpreter (Linux or HP-UX) could find Glob.pm (don't have enough experience to know how the heck to fix that), so I couldn't try it to see if it works.
I'll be outa town for a few days and check back on this Thursday.
Mike Guy
David van Balen      mailto: vanbalen@mc.edu
Box 5054                     vanbalen@rocketmail.com
Clinton, MS 39058    http://www.mc.edu/~vanbalen
This issue has been resolved in perl5.6 with the new internal glob stuff.
Migrated from rt.perl.org#2259 (status was 'resolved')