Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

File::Glob behaviour using GLOB_BRACE | GLOB_NOCHECK #7399

Closed p5pRT closed 11 years ago

p5pRT commented 20 years ago

Migrated from rt.perl.org#30553 (status was 'resolved')

Searchable as RT30553$

p5pRT commented 20 years ago

From j.g.karssenberg@student.utwente.nl

Created by pardus@cpan

The following script:

    use File::Glob ':glob';

    my @test = (
        '{random string}',
        '\\{random string\\}',
        '{random,string}'
    );

    for (@test) {
        print "glob: $_ ==> ";
        print join ', ', bsd_glob($_, GLOB_BRACE | GLOB_NOCHECK);
        print "\n";
    }

Produces the following output:

    glob: {random string} ==> random string
    glob: \{random string\} ==> \random string\
    glob: {random,string} ==> random, string

I believe this to be wrong, because I have no file called 'random string' and the GLOB_NOCHECK option is in effect, so the _original_ pattern should be returned. But I realise some people might consider it a feature. The point is that it results in unexpected behaviour. I understand how this "expansion" happens, but I'm afraid my users will not.

I noticed a similar behaviour in bash (suggesting that this is a feature of the underlying C library?), but there it only happens when there is at least one ',' between the braces. This is an arbitrary rule, but it already feels more heuristic than the current behaviour of File::Glob.

Please let me know if this is considered a feature instead of a bug; in that case I will write a high-level wrapper/workaround and put it on CPAN as a subclass of File::Glob.

--
Jaap Karssenberg || Pardus [Larus]
http://pardus-larus.student.utwente.nl/~pardus
Proud owner of "Perl6 Essentials" 1st edition :) wannabe

Perl Info

```
Flags:
    category=library
    severity=low

Site configuration information for perl v5.8.2:

Configured by root at Thu Dec 11 01:59:13 CET 2003.

Summary of my perl5 (revision 5.0 version 8 subversion 2) configuration:
  Platform:
    osname=linux, osvers=2.4.20-gentoo-r8, archname=i686-linux
    uname='linux captain 2.4.20-gentoo-r8 #4 sun nov 16 19:43:54 cet 2003 i686 amd duron(tm) authenticamd gnulinux '
    config_args='-des -Darchname=i686-linux -Dcccdlflags=-fPIC -Dccdlflags=-rdynamic -Dcc=gcc -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr -Dlocincpth= -Doptimize=-march=athlon -O3 -pipe -mmmx -m3dnow -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Dscriptdir=/usr/bin -Dman3ext=3pm -Dcf_by=Gentoo -Ud_csh -Di_gdbm -Di_db -Di_ndbm'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-fno-strict-aliasing -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-march=athlon -O3 -pipe -mmmx -m3dnow',
    cppflags='-DPERL5 -fno-strict-aliasing'
    ccversion='', gccversion='3.3.2 20031022 (Gentoo Linux 3.3.2-r2, propolice)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lpthread -lnsl -lndbm -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc
    perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:

@INC for perl v5.8.2:
    /etc/perl
    /usr/lib/perl5/site_perl/5.8.2/i686-linux
    /usr/lib/perl5/site_perl/5.8.2
    /usr/lib/perl5/site_perl/5.8.1/i686-linux
    /usr/lib/perl5/site_perl/5.8.1
    /usr/lib/perl5/site_perl/5.8.0/i686-linux
    /usr/lib/perl5/site_perl/5.8.0
    /usr/lib/perl5/site_perl
    /usr/lib/perl5/vendor_perl/5.8.2/i686-linux
    /usr/lib/perl5/vendor_perl/5.8.2
    /usr/lib/perl5/vendor_perl
    /usr/lib/perl5/5.8.2/i686-linux
    /usr/lib/perl5/5.8.2
    /usr/local/lib/site_perl
    /usr/lib/perl5/site_perl/5.8.1/i686-linux
    /usr/lib/perl5/site_perl/5.8.1
    /usr/lib/perl5/site_perl/5.8.0/i686-linux
    /usr/lib/perl5/site_perl/5.8.0
    .

Environment for perl v5.8.2:
    HOME=/home/pardus
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/bin:/usr/bin:/usr/local/bin:/opt/bin:/usr/i686-pc-linux-gnu/gcc-bin/3.3:/usr/X11R6/bin:/opt/blackdown-jdk-1.4.1/bin:/opt/blackdown-jdk-1.4.1/jre/bin:/usr/games/bin:/usr/local/apps:apps:apps
    PERL_BADLANG (unset)
    PERL_RL=Zoid
    SHELL=/bin/sh
```

p5pRT commented 20 years ago

From @rgs

Jaap Karssenberg (via RT) wrote:

> The following script:
>
>     use File::Glob ':glob';
>
>     my @test = (
>         '{random string}',
>         '\\{random string\\}',
>         '{random,string}'
>     );
>
>     for (@test) {
>         print "glob: $_ ==> ";
>         print join ', ', bsd_glob($_, GLOB_BRACE | GLOB_NOCHECK);
>         print "\n";
>     }
>
> Produces the following output:
>
>     glob: {random string} ==> random string
>     glob: \{random string\} ==> \random string\
>     glob: {random,string} ==> random, string
>
> I believe this to be wrong, because since I have no file called 'random string'

Brace expansion (GLOB_BRACE) is not related to file existence.

> and the GLOB_NOCHECK option is in effect the _original_ pattern should be returned.

GLOB_NOCHECK deals with *, ? and [], not with {} (as I understand it).

> But I realise some people might consider it a feature. The point is that it results in unexpected behaviour, I understand how this "expansion" happens, but I'm afraid my users will not.

That's a good point.

> I noticed a similar behaviour in bash (suggesting that this is a feature of the underlying C library?) but there it only happens when there is at least one ',' between the braces. This is an arbitrary rule, but it already feels more heuristic than the current behaviour of File::Glob.

Actually File::Glob uses an implementation of glob() grabbed from OpenBSD. I don't know what bash uses.

> Please let me know if this is considered a feature instead of a bug, in that case I will write a high-level wrapper/workaround and put it on CPAN as a subclass of File::Glob.

Given that the current behaviour matches my mental model of globbing and that it's consistent with what the BSDs do, I think it's considered a feature. (Not to mention backward compatibility reasons :)

In this case I think the docs can be made clearer. Apparently the C-level documentation of glob() is even more cryptic than that of File::Glob; if anybody can suggest improvements...

p5pRT commented 20 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 20 years ago

From j.g.karssenberg@student.utwente.nl

On 1 Jul 2004 12:57:22 -0000 Rafael Garcia-Suarez via RT wrote:

> Given that the current behaviour matches my mental model of globbing and that it's consistent with what the BSDs do, I think it's considered a feature. (Not to mention backward compatibility reasons :)
>
> In this case I think the docs can be made clearer. Apparently the C-level documentation of glob() is even more cryptic than that of File::Glob; if anybody can suggest improvements...

What would it take to have a GLOB_BRACE_NOCHECK option? The functionality is rather straightforward in Perl code using the current bsd_glob function. It only needs an extra constant and one subroutine to wrap the bsd_glob function when the option is in effect. I was thinking about writing a File::Glob::BRACE_NOCHECK module, but the feature seems too small for a separate module. If you feel comfortable with an extra option I'll submit the wrapper code.

The documentation at least could say something like "This option has no effect on the brace expansion done when GLOB_BRACE is used." in the text about GLOB_NOCHECK.
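The wrapper idea described above could look roughly like the following. This is a minimal sketch, not code from File::Glob or from the ticket: the name `glob_brace_nocheck` is hypothetical, and the semantics are an assumption about what was intended (expand braces as usual, but hand back the original pattern when none of the expanded names matches an existing file).

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob GLOB_BRACE GLOB_NOCHECK);

# Hypothetical wrapper, not part of File::Glob: expand braces, but fall
# back to the untouched pattern when nothing on disk matches.
sub glob_brace_nocheck {
    my ($pattern, $flags) = @_;
    $flags = 0 unless defined $flags;

    # Force brace expansion on and GLOB_NOCHECK off, so that bsd_glob
    # returns an empty list when no expanded name matches a file.
    my @matches = bsd_glob($pattern, ($flags | GLOB_BRACE) & ~GLOB_NOCHECK);

    return @matches ? @matches : ($pattern);
}

print join(', ', glob_brace_nocheck('{random string}')), "\n";
```

With this sketch the first test pattern from the report comes back unchanged unless a file called `random string` happens to exist in the current directory. When only some alternatives match, only those are returned.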


p5pRT commented 20 years ago

From @rgs

Jaap Karssenberg wrote:

> On 1 Jul 2004 12:57:22 -0000 Rafael Garcia-Suarez via RT wrote:
>
> > In this case I think the docs can be made clearer. Apparently the C-level documentation of glob() is even more cryptic than that of File::Glob; if anybody can suggest improvements...
>
> What would it take to have a GLOB_BRACE_NOCHECK option? The functionality is rather straightforward in Perl code using the current bsd_glob function. It only needs an extra constant and one subroutine to wrap the bsd_glob function when the option is in effect. I was thinking about writing a File::Glob::BRACE_NOCHECK module, but the feature seems too small for a separate module. If you feel comfortable with an extra option I'll submit the wrapper code.

Well, since File::Glob is merely a wrapper around the BSD glob, taken from their sources, I am reluctant to modify it. Standards, and all that.

> The documentation at least could say something like "This option has no effect on the brace expansion done when GLOB_BRACE is used." in the text about GLOB_NOCHECK.

I prefer this solution.

p5pRT commented 20 years ago

From lupe@lupe-christoph.de

On Friday, 2004-07-02 at 15:27:57 +0200, Rafael Garcia-Suarez wrote:

> Well, since File::Glob is merely a wrapper around the BSD glob, taken from their sources, I am reluctant to modify it. Standards, and all that.

Which also means that glob in Perl behaves exactly like glob in C, which is the way one would expect it to behave. I would be rather surprised if this in bash or csh

    echo {foo,FOO}{bar,BAR}

would not echo

    foobar fooBAR FOObar FOOBAR

but told me that files by that name did not exist. Imagine I want to do

    mkdir {foo,FOO}{bar,BAR}

I admit that glob in Perl is not used for this, but still, why should it behave differently?
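For comparison, the same expansion can be requested from File::Glob. A small sketch, assuming a Perl where `bsd_glob` and the `GLOB_*` constants can be imported by name; the helper name `brace_names` is made up for illustration. With `GLOB_BRACE | GLOB_NOCHECK`, each brace alternative is returned whether or not a file of that name exists, much as the shell would hand all four names to `mkdir`.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob GLOB_BRACE GLOB_NOCHECK);

# Hypothetical helper: list every name a brace pattern stands for,
# regardless of which of them exist on disk.
sub brace_names {
    my ($pattern) = @_;
    return bsd_glob($pattern, GLOB_BRACE | GLOB_NOCHECK);
}

print join(' ', brace_names('{foo,FOO}{bar,BAR}')), "\n";
```

Note that any `*`, `?` or `[]` in the pattern would still be matched against the filesystem; only the brace part is independent of file existence.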

> > The documentation at least could say something like "This option has no effect on the brace expansion done when GLOB_BRACE is used." in the text about GLOB_NOCHECK.
>
> I prefer this solution.

"To maintain backwards compatibility with the globbing done by shells, Perl glob will treat braces in the same way." or somesuch.

Lupe Christoph
--
| lupe@lupe-christoph.de | http://www.lupe-christoph.de/ |
| "... putting a mail server on the Internet without filtering is like |
| covering yourself with barbecue sauce and breaking into the Charity |
| Home for Badgers with Rabies. Michael Lucas |

p5pRT commented 20 years ago

From @rgs

Lupe Christoph wrote:

> "To maintain backwards compatibility with the globbing done by shells, Perl glob will treat braces in the same way." or somesuch.

That sounds a bit like saying that humans maintain backward compatibility with trilobites. There has been quite a lot of evolution since then ;)

p5pRT commented 20 years ago

From j.g.karssenberg@student.utwente.nl

On 2 Jul 2004 14:16:49 -0000 Lupe Christoph via RT wrote:

> I admit that glob in Perl is not used for this, but still, why should it behave differently?

Yes it is, that's why I'm bugging you in the first place :) I use the Zoidberg shell (http://zoidberg.sf.net) for my login shell.

> "To maintain backwards compatibility with the globbing done by shells, Perl glob will treat braces in the same way." or somesuch.

Yeah, but may I ask: compatibility with which shell? As I said, the bash behaviour is slightly different from the current Perl implementation.

    tcsh> echo {foobar}
    foobar
    tcsh> echo {foo,bar}
    foo bar
    bash> echo {foobar}
    {foobar}
    bash> echo {foo,bar}
    foo bar

I suppose tcsh is closest to the original implementation, and it is a more consistent implementation, but the hack bash introduces seems more useful in daily life. Also, brace expansion is _not_ specified by POSIX, so *real* compatible shells don't even support it :S Anyway, the point is that you can't really claim compatibility, since there is no real standard.

But since it is understood to be a feature, I will leave it to you and I'll make Zoidberg comply with the bash behaviour, because that feels the most heuristic.
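The bash rule described above (expand a brace group only when it contains a comma) can be approximated with a small check made before deciding whether to pass GLOB_BRACE. This is an illustrative sketch only, not Zoidberg's or bash's actual code: `has_comma_brace_group` is a hypothetical name, and the check ignores backslash escapes and bash's `{1..3}` sequence form.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $pattern contains at least one closed
# {...} group with a comma at that group's own nesting level.
sub has_comma_brace_group {
    my ($pattern) = @_;
    my @commas;    # one comma counter per currently open brace

    for my $ch (split //, $pattern) {
        if ($ch eq '{') {
            push @commas, 0;
        }
        elsif ($ch eq ',' && @commas) {
            $commas[-1]++;
        }
        elsif ($ch eq '}' && @commas) {
            # Closing a group: report success if it held a comma.
            return 1 if pop @commas;
        }
    }
    return 0;
}

for my $pattern ('{foobar}', '{foo,bar}') {
    printf "%-10s %s\n", $pattern,
        has_comma_brace_group($pattern) ? 'expand' : 'leave as is';
}
```

A caller could then add GLOB_BRACE to the flags only when the check returns true, so that `{foobar}` is left alone while `{foo,bar}` is expanded. Patterns that mix both kinds of group would need finer handling than this.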


p5pRT commented 20 years ago

From lupe@lupe-christoph.de

On Friday, 2004-07-02 at 16:16:50 +0200, Rafael Garcia-Suarez wrote:

> Lupe Christoph wrote:
>
> > "To maintain backwards compatibility with the globbing done by shells, Perl glob will treat braces in the same way." or somesuch.
>
> That sounds a bit like saying that humans maintain backward compatibility with trilobites. There has been quite a lot of evolution since then ;)

IIRC trilobites *had* shells... And bash/csh still behave that way. So at least I, being a trilobyte ;-) shell user, expect that behaviour. Law of least astonishment. Don't surprise your users.


p5pRT commented 20 years ago

From lupe@lupe-christoph.de

On Friday, 2004-07-02 at 16:53:44 +0200, Jaap Karssenberg wrote:

> On 2 Jul 2004 14:16:49 -0000 Lupe Christoph via RT wrote:
>
> > I admit that glob in Perl is not used for this, but still, why should it behave differently?
>
> Yes it is, that's why I'm bugging you in the first place :) I use the Zoidberg shell (http://zoidberg.sf.net) for my login shell.

Maybe we need a tuneable that tells glob which shell it shall simulate ;-)

> > "To maintain backwards compatibility with the globbing done by shells, Perl glob will treat braces in the same way." or somesuch.
>
> Yeah, but may I ask: compatibility with which shell? As I said, the bash behaviour is slightly different from the current Perl implementation.
>
>     tcsh> echo {foobar}
>     foobar
>     tcsh> echo {foo,bar}
>     foo bar
>     bash> echo {foobar}
>     {foobar}
>     bash> echo {foo,bar}
>     foo bar
>
> I suppose tcsh is closest to the original implementation, and it is a more consistent implementation, but the hack bash introduces seems more useful in daily life. Also, brace expansion is _not_ specified by POSIX, so *real* compatible shells don't even support it :S Anyway, the point is that you can't really claim compatibility, since there is no real standard.

IIRC, only the Bourne shell has been spec'd by POSIX. And tcsh, being a derivative of the original csh (again IIRC), behaves like it. I just tested that on Solaris with its csh.

Anyway, your example above is a borderline case because the construct is *meant* to take lists. And none of them takes existing files into account.

> But since it is understood to be a feature, I will leave it to you and I'll make Zoidberg comply with the bash behaviour, because that feels the most heuristic.

Whatever you feel like. I can't decide what behaviour with single-element lists is better. I'd have expected the t?csh behaviour, though.

My point was that taking existing files into account makes glob behave less like I would expect it to. But then I never use glob. Always something like

    grep /^(?:foo|FOO)(?:bar|BAR)$/, readdir $dh


p5pRT commented 11 years ago

From @jkeenan

On Thu Jul 01 03:59:50 2004, j.g.karssenberg@student.utwente.nl wrote:

> This is a bug report for perl from pardus@cpan, generated with the help of perlbug 1.34 running under perl v5.8.2.
>
> -----------------------------------------------------------------
> [Please enter your report here]

There was a fair amount of back-and-forth about this eight years ago. My sense from Rafael's comments is that he thought no change from current behavior was needed.

Is there anyone who could review the issues and make a recommendation either for a behavior change, for a documentation change or for closing the ticket?

Thank you very much. Jim Keenan

p5pRT commented 11 years ago

From @jkeenan

On Sat Sep 29 19:06:01 2012, jkeenan wrote:

> There was a fair amount of back-and-forth about this eight years ago. My sense from Rafael's comments is that he thought no change from current behavior was needed.
>
> Is there anyone who could review the issues and make a recommendation either for a behavior change, for a documentation change or for closing the ticket?
>
> Thank you very much. Jim Keenan

Since there has been no response since September\, I am closing the ticket.

Thank you very much. Jim Keenan

p5pRT commented 11 years ago

@jkeenan - Status changed from 'open' to 'resolved'