Perl / perl5

various glob() bugs #1496

Open p5pRT opened 24 years ago

p5pRT commented 24 years ago

Migrated from rt.perl.org#2707 (status was 'open')

Searchable as RT2707$

p5pRT commented 24 years ago

From tchrist@chthon.perl.com

Summary:

  1. Prototype change breaks documented examples.
  2. :globally problems
  3. Incompatible breakage
  4. Missing docs

+--------------------------------------------------+
| 1. Prototype change breaks documented examples. |
+--------------------------------------------------+

I seem to recall that this is my own fault:

  [ 5163] By: gsar on 2000/02/20 16:34:33
          Log: glob() takes one or no user arguments and a non-user-visible second
               hidden argument, fix its prototype-checking accordingly
       Branch: perl

That made this:

  glob            glob            ck_glob         t@      S? S?

become this

  glob            glob            ck_glob         t@      S?

And that breaks this:

  use File::Glob ':glob';
  @list = glob('*.[ch]');
  $homedir = glob('~gnat', GLOB_TILDE | GLOB_ERR);
  if (GLOB_ERROR) {
      print "can't glob ~gnat: $!\n";
  } else {
      print "Gnat lives in $homedir\n";
  }

Which used to say

  Gnat lives in /home/gnat

But now says

  Too many arguments for glob at - line 3, near "GLOB_ERR)"

So even though you import it, and even though it goes into <*> fileglobs, you have to say this:

  use File::Glob ':glob';
  @list = glob('*.[ch]');
  $homedir = &glob('~gnat', GLOB_TILDE | GLOB_ERR);
  if (GLOB_ERROR) {
      print "can't glob ~gnat: $!\n";
  } else {
      print "Gnat lives in $homedir\n";
  }

Yes, you have to use &glob to give another argument.

I don't know a perfect solution here, but right now, there's a problem. Well, yes, I do know a perfect solution: if one could (effectively) frob the opcode.pl output so that

  *CORE::GLOBAL::glob = \&File::Glob::csh_glob;

would make the parser tolerate 0 or 1 arguments, but with

  *CORE::GLOBAL::glob = \&File::Glob::glob;

it would tolerate 0, 1, or 2 arguments.
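With a current File::Glob, the same lookup can be written without the &-call by calling bsd_glob(). The module documents that function as taking a pattern and an optional flags argument. The following is a minimal sketch that assumes a modern File::Glob. `home_of` is a hypothetical helper, and it looks up the current user's home (plain `~`) instead of ~gnat so that it runs on any machine:

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob GLOB_TILDE GLOB_ERR GLOB_ERROR);

# Hypothetical helper: return the home directory of $who ('' means the
# current user), or undef if nothing matched or the glob failed.
# bsd_glob() takes the flags word as its second argument, so no &-call is
# needed to get past the one-argument prototype of glob().
sub home_of {
    my ($who) = @_;
    my ($dir) = bsd_glob("~$who", GLOB_TILDE | GLOB_ERR);
    # GLOB_ERROR() is non-zero when the most recent bsd_glob() call failed.
    return if GLOB_ERROR() || !defined $dir;
    return $dir;
}

my $home = home_of('');
if (defined $home) {
    print "I live in $home\n";
} else {
    print "can't glob ~: $!\n";
}
```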

+------------------------+
| 2. :globally problems |
+------------------------+

There are other problems with this module. The import

  use File::Glob ':globally';

is a silent no-op on systems compiled with -DPERL_EXTERNAL_GLOB. That's because all it does is

  *CORE::GLOBAL::glob = \&File::Glob::csh_glob;

but that's what you have already. If you want to have space-sensitive globbing, then you use

  use File::Glob ':glob';

But that's just that package. You can't use ':globally' to do a

  *CORE::GLOBAL::glob = \&File::Glob::glob;

So you have to do it yourself, which feels sleazy. Here's the demo. You can't get qw/:glob :globally/ or qw/:globally :glob/ to do what you need done. (Yeah, I know, CORE::GLOBAL is "evil".)

  #!/bin/sh -x
  rm -rf /tmp/fred "/tmp/fred stuff"
  mkdir "/tmp/fred stuff"
  touch "/tmp/fred stuff/a"
  touch "/tmp/fred stuff/b"
  perl -We '
  use File::Glob qw/:glob :globally/;
  #use File::Glob qw/:globally :glob/;
  # BEGIN { *CORE::GLOBAL::glob = \&File::Glob::glob };
  package bad;
  @a = </tmp/fred stu*>;
  print "File Glob globbed @a\n"
  ';

I think I prefer ":everywhere" to ":globally". It's far, far too close to ":glob", and means something completely different.

It would also be nice to get at File::Glob::glob without overriding the built-in. But then you don't get the flags and such you need. Too bad it's not "POSIX::glob" -- less typing.
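Importing bsd_glob() and the flag constants by name, instead of through a tag such as :glob, gives access to the flags while leaving the built-in glob() alone. The following is a minimal sketch that assumes a modern File::Glob. The temporary directory and the names `sub` and `file` are made up for the demo:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
# Named imports only: no tag such as :glob or :bsd_glob, so glob() is not replaced.
use File::Glob qw(bsd_glob GLOB_MARK);

my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/sub" or die "mkdir: $!";
open my $fh, '>', "$dir/file" or die "open: $!";
close $fh;

# GLOB_MARK appends a slash to each match that is a directory.
my @marked = bsd_glob("$dir/*", GLOB_MARK);
print "$_\n" for @marked;

# Nothing named glob was imported into this package, so the built-in is still in charge.
my $overridden = defined &main::glob;
print $overridden ? "glob() overridden\n" : "built-in glob() untouched\n";
```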

I think you should be able to say whether

  This package uses POSIX glob for the Perl fileglob operator.
  This package uses csh glob for the Perl fileglob operator.
  All packages use POSIX glob for the Perl fileglob operator.
  All packages use csh glob for the Perl fileglob operator.
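Newer Perls cover the per-package half of that list. File::Glob gained a :bsd_glob tag around Perl 5.16, which assumes a Perl at least that new. The tag replaces glob() only in the importing package, and there the whitespace stays part of the pattern. The following is a minimal sketch. The package names PosixGlob and CshGlob are hypothetical, and the directory mirrors the "/tmp/fred stuff" demo:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Mirror the shell demo: a directory whose name contains a space.
our $top = tempdir(CLEANUP => 1);
mkdir "$top/fred stuff" or die "mkdir: $!";

{
    package PosixGlob;
    # :bsd_glob replaces glob() in this package only; the space stays
    # part of one single pattern.
    use File::Glob qw(:bsd_glob);
    our @found = glob("$main::top/fred stu*");
}

{
    package CshGlob;
    # No import here, so the core glob() splits the argument on whitespace
    # into the two patterns "$main::top/fred" and "stu*".
    our @found = glob("$main::top/fred stu*");
}

print "PosixGlob: @PosixGlob::found\n";
print "CshGlob:   @CshGlob::found\n";
```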

+----------------------------+
| 3. Incompatible breakage |
+----------------------------+

  % perl5.004 -le 'print glob("*.[^x]")'

That gets all the files that end in a dot followed by anything not an x. But it is silently broken now:

  % perl -le 'print glob("*.[^x]")'

Because of this *.[!x] thing. And there's no way to get back what it used to do.
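In the bsd_glob() syntax, as in POSIX pattern matching, a leading `!` inside brackets negates the set. POSIX leaves a leading `^` unspecified, and that gap is exactly the incompatibility described above. The following is a minimal sketch of the spelling that does negate. The file names are made up, and the sketch deliberately does not rely on what `[^x]` does:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Glob qw(bsd_glob);

my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.c a.h a.x)) {
    open my $fh, '>', "$dir/$name" or die "open $name: $!";
    close $fh;
}

# '!' right after '[' negates the bracket: one character that is not 'x'.
my @not_x = bsd_glob("$dir/*.[!x]");
print "$_\n" for @not_x;
```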

+-----------------+
| 4. Missing docs |
+-----------------+

I notice in passing that csh_glob is not documented. And when it is, the space bug should be explained. Also, this $^O oddity isn't explained, either. Iff you call glob with only one argument, then iff you're on a case-screwed system (happy unicode, mate!), then you get the default weirdness.

All of these things need doc'ing\, including the various breakages. What you import to get what to happen is highly unclear. It could really use some work.

--tom

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Here's more.

  % touch foo.x foo.X

  % perl -le 'use File::Glob qw/:nocase glob/; print join(" ", glob("*.x"))'
  foo.x

  % perl -le 'use File::Glob qw/:nocase glob/; print join(" ", &glob("*.x"))'
  foo.x foo.X

It is hard to see how that is an expected feature.

  % perl -le 'use File::Glob qw/:nocase :globally/; print join(" ", glob("*.x"))'
  foo.x foo.X

  % perl -le 'use File::Glob qw/:nocase :globally glob/; print join(" ", glob("*.x"))'
  foo.x

  % perl -le '{package XXX; use File::Glob qw/:nocase :globally/} print join(" ", glob("*.x"))'
  foo.x foo.X

--tom
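One way to take the import-dependent case handling out of the picture is to pass the flags word explicitly. bsd_glob() only falls back on its default flags, which :nocase alters, when it is called with a single argument. The following is a minimal sketch that assumes a modern File::Glob. It uses a single made-up file foo.X, instead of both foo.x and foo.X, so that it behaves the same on case-insensitive filesystems:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Glob qw(bsd_glob GLOB_NOCASE);

my $dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$dir/foo.X" or die "open: $!";
close $fh;

# An explicit flags word replaces the defaults, so the result no longer
# depends on :case/:nocase imports or on the platform.
my @exact  = bsd_glob("$dir/*.x", 0);            # case-sensitive match
my @nocase = bsd_glob("$dir/*.x", GLOB_NOCASE);  # case-insensitive match
print "exact:  @exact\n";
print "nocase: @nocase\n";
```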

p5pRT commented 18 years ago

From @smpeters

This is an old bug, but much of it appears to be fixed. My comments are interspersed within what's below.

[tchrist@chthon.perl.com - Fri Mar 24 09:13:34 2000]:

Summary:

  1. Prototype change breaks documented examples.
  2. :globally problems
  3. Incompatible breakage
  4. Missing docs

+--------------------------------------------------+
| 1. Prototype change breaks documented examples. |
+--------------------------------------------------+

I seem to recall that this is my own fault:

  [ 5163] By: gsar on 2000/02/20 16:34:33
          Log: glob() takes one or no user arguments and a non-user-visible second
               hidden argument, fix its prototype-checking accordingly
       Branch: perl

That made this:

  glob            glob            ck_glob         t@      S? S?

become this

  glob            glob            ck_glob         t@      S?

And that breaks this:

  use File::Glob ':glob';
  @list = glob('*.[ch]');
  $homedir = glob('~gnat', GLOB_TILDE | GLOB_ERR);
  if (GLOB_ERROR) {
      print "can't glob ~gnat: $!\n";
  } else {
      print "Gnat lives in $homedir\n";
  }

Which used to say

Gnat lives in /home/gnat

But now says

  Too many arguments for glob at - line 3, near "GLOB_ERR)"

So even though you import it, and even though it goes into <*> fileglobs, you have to say this:

  use File::Glob ':glob';
  @list = glob('*.[ch]');
  $homedir = &glob('~gnat', GLOB_TILDE | GLOB_ERR);
  if (GLOB_ERROR) {
      print "can't glob ~gnat: $!\n";
  } else {
      print "Gnat lives in $homedir\n";
  }

Yes, you have to use &glob to give another argument.

I don't know a perfect solution here, but right now, there's a problem. Well, yes, I do know a perfect solution: if one could (effectively) frob the opcode.pl output so that

  *CORE::GLOBAL::glob = \&File::Glob::csh_glob;

would make the parser tolerate 0 or 1 arguments, but with

  *CORE::GLOBAL::glob = \&File::Glob::glob;

it would tolerate 0, 1, or 2 arguments.

Magic seems to have been performed somewhere within the bowels. With Perl 5.8.6, I get...

  perl rt_2707_part1.pl
  Steve lives in /home/steve

+------------------------+
| 2. :globally problems |
+------------------------+

There are other problems with this module. The import

  use File::Glob ':globally';

is a silent no-op on systems compiled with -DPERL_EXTERNAL_GLOB. That's because all it does is

  *CORE::GLOBAL::glob = \&File::Glob::csh_glob;

but that's what you have already. If you want to have space-sensitive globbing, then you use

  use File::Glob ':glob';

But that's just that package. You can't use ':globally' to do a

  *CORE::GLOBAL::glob = \&File::Glob::glob;

So you have to do it yourself, which feels sleazy. Here's the demo. You can't get qw/:glob :globally/ or qw/:globally :glob/ to do what you need done. (Yeah, I know, CORE::GLOBAL is "evil".)

  #!/bin/sh -x
  rm -rf /tmp/fred "/tmp/fred stuff"
  mkdir "/tmp/fred stuff"
  touch "/tmp/fred stuff/a"
  touch "/tmp/fred stuff/b"
  perl -We '
  use File::Glob qw/:glob :globally/;
  #use File::Glob qw/:globally :glob/;
  # BEGIN { *CORE::GLOBAL::glob = \&File::Glob::glob };
  package bad;
  @a = </tmp/fred stu*>;
  print "File Glob globbed @a\n"
  ';

I think I prefer ":everywhere" to ":globally". It's far, far too close to ":glob", and means something completely different.

It would also be nice to get at File::Glob::glob without overriding the built-in. But then you don't get the flags and such you need. Too bad it's not "POSIX::glob" -- less typing.

I think you should be able to say whether

  This package uses POSIX glob for the Perl fileglob operator.
  This package uses csh glob for the Perl fileglob operator.
  All packages use POSIX glob for the Perl fileglob operator.
  All packages use csh glob for the Perl fileglob operator.

File::Glob and its funkiness have been embedded within Perl like this for long enough that changing it would break backwards compatibility. Maybe one solution would be to implement a pragma or hint to clean it up. So,

  use glob 'POSIX';   # Explicitly take the POSIX behavior, or
  use glob 'csh';     # Explicitly take the csh behavior

For the global usage (although I personally believe this could break things)...

  use glob qw(POSIX everywhere);   # Explicitly take the POSIX behavior globally, or
  use glob qw(csh everywhere);     # Explicitly take the csh behavior globally

This is mostly an incomplete thought, but I'd be happy to listen to suggestions, criticisms, complaints, etc. But this looks like even less typing :)

+----------------------------+
| 3. Incompatible breakage |
+----------------------------+

  % perl5.004 -le 'print glob("*.[^x]")'

That gets all the files that end in a dot followed by anything not an x. But it is silently broken now:

  % perl -le 'print glob("*.[^x]")'

Because of this *.[!x] thing. And there's no way to get back what it used to do.

That's due to a POSIX change (see http://www.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_13_01 for an explanation). Maybe we could support both forms and have them do the same thing, but since it's been this way for five years now, do we need to?

+-----------------+
| 4. Missing docs |
+-----------------+

I notice in passing that csh_glob is not documented. And when it is, the space bug should be explained. Also, this $^O oddity isn't explained, either. Iff you call glob with only one argument, then iff you're on a case-screwed system (happy unicode, mate!), then you get the default weirdness.

All of these things need doc'ing\, including the various breakages. What you import to get what to happen is highly unclear. It could really use some work.

The current description contains the following.

"The glob angle-bracket operator <> is a pathname generator that implements the rules for file name pattern matching used by Unix-like shells such as the Bourne shell or C shell.

File::Glob::bsd_glob() implements the FreeBSD glob(3) routine, which is a superset of the POSIX glob() (described in IEEE Std 1003.2 "POSIX.2"). bsd_glob() takes a mandatory pattern argument, and an optional flags argument, and returns a list of filenames matching the pattern, with interpretation of the pattern modified by the flags variable.

Since v5.6.0, Perl's CORE::glob() is implemented in terms of bsd_glob(). Note that they don't share the same prototype--CORE::glob() only accepts a single argument. Due to historical reasons, CORE::glob() will also split its argument on whitespace, treating it as multiple patterns, whereas bsd_glob() considers them as one pattern."
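The whitespace splitting described in that excerpt can be set beside the two usual ways around it. One is to wrap the pattern in double quotes, which perlfunc's glob entry suggests for filenames containing spaces. The other is to call bsd_glob() directly. The following is a minimal sketch with a made-up directory name:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Glob qw(bsd_glob);

my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/fred stuff" or die "mkdir: $!";

# Core glob() splits on whitespace: two patterns, "$dir/fred" and "stu*".
my @split  = glob("$dir/fred stu*");
# Double quotes inside the argument keep it as one pattern for the core glob().
my @quoted = glob(qq{"$dir/fred stu*"});
# bsd_glob() never splits, so no extra quoting is needed.
my @bsd    = bsd_glob("$dir/fred stu*");

print "split:  @split\n";
print "quoted: @quoted\n";
print "bsd:    @bsd\n";
```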

As far as documenting csh_glob goes, it's documented internally with the following comment.

  "csh_glob() should not be used directly\, unless you know what you're doing."

I'm not sure why this is, but it makes it difficult for me to suggest documenting it. There is also a note that a nice thing to do would be to create a flag to avoid the default space-handling behavior (reading this, though, is giving me what I need to close one more bug). That will take a bit of thought on how best to implement it (a hint, perhaps? Sorry, I wrote this before what I wrote above, but it could be a separate tweak all on its own). Overall, I think I'll take a deeper look into the docs to see if more clarity is needed. Perhaps a quick example in the synopsis would help.

The problems encountered by Mac OS Classic users have been documented, as have the issues for Win32 users. Mac OS X is also case-insensitive. A small note that essentially says "we do what your libc glob wants us to do" would probably be nice.

Overall, though, things seem better than the world of File::Glob at the time you opened this ticket, but I agree that some things still need to be looked into further.