Cc: strawberry-perl@project
Subject: Abbreviation of `glob` and `readline`
Message-Id: 5.16.2_16700_1394233908@Samurai
Reply-To: the.rob.dixon@gmail.com
To: perlbug@perl.org
From: the.rob.dixon@gmail.com
This is a bug report for perl from the.rob.dixon@gmail.com, generated with the help of perlbug 1.39 running under perl 5.16.2.
-----------------------------------------------------------------
I believe that it is long-overdue for Perl's `<>`, `<…>`, `<*globtext*>` to be discouraged. There is nothing wrong with `readline` and `glob`, and programs are better written that way.
There is clearly the issue of backward compatibility, and of the sticklers that will want to pretend that Perl is really their favourite shell language. But I believe we should start to discourage the shorthand, and that is why I have submitted this bug under the documentation category.
On Fri, 7 Mar 2014 15:21:41 -0800, Rob Dixon (via RT) <perlbug-followup@perl.org> wrote:
I believe that it is long-overdue for Perl's `<>`, `<…>`, `<*globtext*>` to be discouraged. There is nothing wrong with `readline` and `glob`, and programs are better written that way.
I agree with <*globtext*> better be written as glob ("globtext"), though writing
<foo.* bar.* *.[ao] >
reads a whole lot easier than the alternatives
(glob ("foo.*"), glob ("bar.*"), glob ("*.[ao]")) or map { glob ($_) } qw( foo.* bar.* *.[ao] )
I partly agree with […]
I completely disagree with deprecation of <> which is what makes perl perl
There is clearly the issue of backward compatibility, and of the sticklers that will want to pretend that Perl is really their favourite shell language. But I believe we should start to discourage the shorthand, and that is why I have submitted this bug under the documentation category
-- H.Merijn Brand http://tux.nl Perl Monger http://amsterdam.pm.org/ using perl5.00307 .. 5.19 porting perl5 on HP-UX, AIX, and openSUSE http://mirrors.develooper.com/hpux/ http://www.test-smoke.org/ http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/
The RT System itself - Status changed from 'new' to 'open'
On Mon, Mar 10, 2014 at 12:12 PM, H.Merijn Brand <h.m.brand@xs4all.nl> wrote:
On Fri, 7 Mar 2014 15:21:41 -0800, Rob Dixon (via RT) <perlbug-followup@perl.org> wrote:
I believe that it is long-overdue for Perl's `<>`, `<…>`, `<*globtext*>` to be discouraged. There is nothing wrong with `readline` and `glob`, and programs are better written that way.
I agree with <*globtext*> better be written as glob ("globtext"), though writing
<foo.* bar.* *.[ao] >
reads a whole lot easier than the alternatives
(glob ("foo.*"), glob ("bar.*"), glob ("*.[ao]")) or map { glob ($_) } qw( foo.* bar.* *.[ao] )
glob q<foo.* bar.* *.[ao]>
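The spellings compared above can be run side by side. The following is only a minimal sketch, assuming a Unix-like system; the scratch directory and the file names `foo.c`, `bar.h` and `x.o` are invented for illustration and do not come from the thread.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Scratch directory with a few made-up files (names are illustrative only).
my $start = getcwd();
my $tmp   = tempdir(CLEANUP => 1);
chdir $tmp or die "chdir: $!";
for my $name (qw(foo.c bar.h x.o)) {
    open my $fh, '>', $name or die "open $name: $!";
    close $fh;
}

# Three spellings of the same expansion.
my @angle  = <foo.* bar.* *.[ao]>;
my @quoted = glob q<foo.* bar.* *.[ao]>;
my @mapped = map { glob($_) } qw( foo.* bar.* *.[ao] );

print "angle:  @angle\n";
print "quoted: @quoted\n";
print "mapped: @mapped\n";

# Leave the scratch directory so it can be cleaned up at exit.
chdir $start or die "chdir: $!";
```

All three forms should produce the same list; which one reads best is the matter of taste being argued here.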
On Fri Mar 07 15:21:41 2014, the.rob.dixon@gmail.com wrote: [snip]
I believe that it is long-overdue for Perl's `<>`, `<…>`, `<*globtext*>` to be discouraged. There is nothing wrong with `readline` and `glob`, and programs are better written that way.
-1
I sympathize\, but not enough to agree.
-- rjbs
@rjbs - Status changed from 'open' to 'rejected'
H.Merijn Brand <h.m.brand@xs4all.nl> writes:
writing
<foo.* bar.* *.[ao] >
reads a whole lot easier than the alternatives
(glob ("foo.*"), glob ("bar.*"), glob ("*.[ao]"))
Actually, glob('*.c *.h') will split on spaces and glob as two patterns. I think that is a bug. It causes all sorts of code like
glob("$dir/*.txt")
to be buggy and break whenever $dir contains spaces - which even the most wizened Unix greybeard must admit are nowadays common. But that's what it currently does.
I would certainly support that glob(pattern) change to not split at spaces, perhaps keeping the older <*.c *.h> syntax splitting for compatibility.
-- Ed Avis <eda@waniasset.com>
On Tue, 11 Mar 2014 09:00:31 +0000 (UTC), Ed Avis <eda@waniasset.com> wrote:
H.Merijn Brand <h.m.brand@xs4all.nl> writes:
writing
<foo.* bar.* *.[ao] >
reads a whole lot easier than the alternatives
(glob ("foo.*"), glob ("bar.*"), glob ("*.[ao]"))
Actually, glob('*.c *.h') will split on spaces and glob as two patterns. I think that is a bug. It causes all sorts of code like
glob("$dir/*.txt")
to be buggy and break whenever $dir contains spaces - which even the most wizened Unix greybeard must admit are nowadays common. But that's what it currently does.
I would certainly support that glob(pattern) change to not split at spaces, perhaps keeping the older <*.c *.h> syntax splitting for compatibility.
+1
-- H.Merijn Brand
Ed Avis writes:
glob('*.c *.h') will split on spaces and glob as two patterns. I think that is a bug. It causes all sorts of code like
glob("$dir/*.txt")
to be buggy and break whenever $dir contains spaces
I agree it's turned out to be a poor interface, having been caught out by exactly this feature recently.
I would certainly support that glob(pattern) change to not split at spaces
That would break programs relying on the current behaviour. That seems unnecessarily harsh, swapping an irritation for current developers for users of programs written years ago suddenly suffering breakage.
perldoc -f glob already has a note pointing this behaviour out, and points out that File::Glob provides bsd_glob without this misfeature.
Smylers -- http://twitter.com/Smylers2
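To make the difference concrete, here is a small sketch comparing the built-in `glob` with `File::Glob::bsd_glob` on a directory whose name contains a space. It assumes a Unix-like system; the directory name `my docs` and the file names are invented for illustration.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Made-up directory whose name contains a space.
my $dir = tempdir(CLEANUP => 1) . '/my docs';
mkdir $dir or die "mkdir: $!";
for my $name (qw(a.txt b.txt)) {
    open my $fh, '>', "$dir/$name" or die "open: $!";
    close $fh;
}

# The built-in splits the pattern at the space and treats
# ".../my" and "docs/*.txt" as two separate patterns.
my @split = glob("$dir/*.txt");

# bsd_glob treats the whole string as a single pattern.
my @whole = bsd_glob("$dir/*.txt");

printf "glob:     %d .txt file(s)\n", scalar grep { /\.txt\z/ } @split;
printf "bsd_glob: %d .txt file(s)\n", scalar grep { /\.txt\z/ } @whole;
```

Only the `bsd_glob` call is expected to find the two files; the built-in silently looks in the wrong place, which is the bug pattern described above.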
Perhaps we can agree that a warning should be issued when the pattern string given to glob() contains a space?
The 'weak' warning would be to warn on glob("$dir/foo.*") if, at runtime, the pattern string contains a space. But not to warn for glob('*.c *.h') where the pattern string is known at compile time to contain a space - showing the programmer's deliberate intention to match two patterns.
The alternative is a 'strong' warning which fires whenever the pattern contains spaces.
The intention of the 'weak' warning is that the perl developers intend to keep the current semantics of glob(), but flag cases where it may be used wrongly.
The intention of the 'strong' warning is to flag buggy code, but also to prepare for a future version of perl where the semantics of glob() change.
Which would folk here prefer?
I also think that glob() should change to accept multiple arguments, so you can say glob('*.c', '*.h'). This would not break any compatibility since they are currently disallowed. Most likely, the multi-arg form would never split at spaces. That means you could safely say
glob("$dir/foo.*", '')
Ugly, but workable until such future date (if any) when single-arg glob() changes to not split at spaces.
-- Ed Avis <eda@waniasset.com>
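Until something like a multi-argument glob() exists in core, the idea can be approximated in user code. The sketch below is hypothetical: `glob_list` is an invented helper name, not a core or CPAN function, and it simply applies `File::Glob::bsd_glob` to each argument so that no argument is ever split at spaces. The directory and file names are made up.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Hypothetical helper (not a core function): each argument is expanded as
# exactly one pattern, so embedded spaces are never treated as separators.
sub glob_list {
    return map { bsd_glob($_) } @_;
}

# Made-up directory with a space in its name.
my $dir = tempdir(CLEANUP => 1) . '/my project';
mkdir $dir or die "mkdir: $!";
for my $name (qw(foo.c foo.h)) {
    open my $fh, '>', "$dir/$name" or die "open: $!";
    close $fh;
}

my @files = glob_list("$dir/*.c", "$dir/*.h");
print "$_\n" for @files;
```

With a helper of this shape there is no need for the trailing empty-string argument, since even a single argument is treated as one pattern.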
Smylers writes:
I would certainly support that glob(pattern) change to not split at spaces
That would break programs relying on the current behaviour. That seems unnecessarily harsh, swapping an irritation for current developers for users of programs written years ago suddenly suffering breakage.
On the other hand, there are plenty of programs written years ago with unsafe constructs like glob("$dir/*.txt"). These are buggy now, but would start to work properly if the semantics of glob() were changed. So weighed against the programs suddenly suffering breakage are many suffering 'mendage'.
Perhaps as a halfway house the glob() semantics could still split if the pattern is a fixed string at compile time, but not if it is pasted together at run time. So glob('*.c *.h') would still split but glob($x) would not. Unpleasant but perhaps the best way forward.
-- Ed Avis <eda@waniasset.com>
On Tue, 11 Mar 2014 11:20:02 +0000 (UTC), Ed Avis <eda@waniasset.com> wrote:
Perhaps we can agree that a warning should be issued when the pattern string given to glob() contains a space?
The 'weak' warning would be to warn on glob("$dir/foo.*") if, at runtime, the pattern string contains a space. But not to warn for glob('*.c *.h') where the pattern string is known at compile time to contain a space - showing the programmer's deliberate intention to match two patterns.
do you also want a distinction between
glob ($pattern) and glob ("$pattern") ?
The alternative is a 'strong' warning which fires whenever the pattern contains spaces.
The intention of the 'weak' warning is that the perl developers intend to keep the current semantics of glob(), but flag cases where it may be used wrongly.
+1
The intention of the 'strong' warning is to flag buggy code, but also to prepare for a future version of perl where the semantics of glob() change.
Which would folk here prefer?
That people would never use file/folder names with spaces :) (or database table/field names with spaces, slashes, colons, semicolons, minus signs, dollars or whatever some imbecile database allows them to use)
I also think that glob() should change to accept multiple arguments, so you can say glob('*.c', '*.h').
+10!
This would not break any compatibility since they are currently disallowed. Most likely, the multi-arg form would never split at spaces. That means you could safely say
glob("$dir/foo.*", '')
undef would be better than '' here
Ugly, but workable until such future date (if any) when single-arg glob() changes to not split at spaces.
-- H.Merijn Brand
2014-03-11 17:23 GMT+04:00 H.Merijn Brand <h.m.brand@xs4all.nl>:
That people would never use file/folder names with spaces :) (or database table/field names with spaces, slashes, colons, semicolons, minus signs, dollars or whatever some imbecile database allows them to use)
I think we need to distinguish between programmers and application users here. Applications might need to work with user-supplied paths, and users will always have pathnames with spaces. The database example is a somewhat different matter.
On Tue, 11 Mar 2014 17:28:51 +0400, Victor Efimov <victor@vsespb.ru> wrote:
2014-03-11 17:23 GMT+04:00 H.Merijn Brand <h.m.brand@xs4all.nl>:
That people would never use file/folder names with spaces :) (or database table/field names with spaces, slashes, colons, semicolons, minus signs, dollars or whatever some imbecile database allows them to use)
I think we need to distinguish between programmers and application users here. Applications might need to work with user-supplied paths, and users will always have pathnames with spaces. The database example is a somewhat different matter.
you stripped the context:
Which would folk here prefer?
seen the smiley? Yes I know that people will make our lives miserable by having default setups doing stuff we don't like. That includes allowing spaces in folder and document names or having case-aware file systems.
But all of that is not what the focus of glob () is.
-- H.Merijn Brand
Filenames with spaces are a fact of life on the majority of systems. In 1995 Windows started using "C:\Program Files", and ActivePerl at least started installing Perl under that location. It is shocking that nearly two decades later there are still straightforward bugs in handling such filenames.
That doesn't detract from the need for backwards compatibility and not to break code - but please let's not go off on a side track of "get users to stop putting spaces in filenames". Spaces in filenames exist. Perl has to deal with them. Easy things should be easy.
-- Ed Avis <eda@waniasset.com>
H.Merijn Brand <h.m.brand@xs4all.nl> writes:
do you also want a distinction between
glob ($pattern) and glob ("$pattern")
No I don't think so. What I suggest is two rules:
- At run time, if the pattern contains a space, warn.
- *But* if at compile time the pattern is a fixed string (not a double-quoted string with variables) and that fixed string contains spaces, then suppress the runtime warning.
That gives the 'weak' setup. The 'strong' setup is to warn every time a space in pattern is found, no matter what.
I admit that peculiar cases such as
glob("$basename.c $basename.h")
will be mishandled - but these are surely few, and we are only talking about a warning.
glob("$dir/foo.*", '')
undef would be better than '' here
If you want to declare that the special glob pattern undef is one that matches no files, then yes. But I think that since there is already a glob pattern that never matches, namely the empty string, then we can use that and keep undef reserved for some future use. (For now, passing undef to glob would give a warning.)
-- Ed Avis <eda@waniasset.com>
On Tue, Mar 11, 2014 at 5:00 AM, Ed Avis <eda@waniasset.com> wrote:
H.Merijn Brand <h.m.brand@xs4all.nl> writes:
writing
<foo.* bar.* *.[ao] >
reads a whole lot easier than the alternatives
(glob ("foo.*"), glob ("bar.*"), glob ("*.[ao]"))
Actually, glob('*.c *.h') will split on spaces and glob as two patterns. I think that is a bug. It causes all sorts of code like
glob("$dir/*.txt")
to be buggy and break whenever $dir contains spaces - which even the most wizened Unix greybeard must admit are nowadays common. But that's what it currently does.
That would still be buggy even if glob didn't split on spaces. Space isn't the only glob metacharacter.
It never made sense to me that an escaping function isn't provided, especially since glob() requires different (incompatible) escaping rules on different systems.
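For illustration only, an escaping helper of the kind asked for might look like the sketch below. `glob_escape` is a hypothetical name, not an existing core or File::Glob function, and the sketch assumes a Unix-like system where `bsd_glob` runs with its default flags (which include `GLOB_QUOTE`, so a backslash quotes the following character). As noted above, the rules differ on other systems, so this is not portable.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Hypothetical helper: backslash-quote characters that bsd_glob treats
# specially under its default flags. Unix-only assumption; on Windows the
# backslash is a path separator and this would not be safe.
sub glob_escape {
    my ($literal) = @_;
    (my $escaped = $literal) =~ s/([\\\[\]{}*?~])/\\$1/g;
    return $escaped;
}

# Made-up directory name containing spaces and brackets.
my $dir = tempdir(CLEANUP => 1) . '/odd [dir] name';
mkdir $dir or die "mkdir: $!";
open my $fh, '>', "$dir/notes.txt" or die "open: $!";
close $fh;

# Escape the literal directory part, then append the real wildcard.
my @found = bsd_glob(glob_escape($dir) . '/*.txt');
print "$_\n" for @found;
```

The design choice is to escape only the literal part of the pattern and to use `bsd_glob` rather than the built-in, so that spaces need no special treatment.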
Eric Brine writes:
  glob("$dir/*.txt")
That would still be buggy even if glob didn't split on spaces. Space isn't the only glob metacharacter.
True; on Unix any character except \0 can appear in a filename. If a single character for newline was the best design decision in Unix, then the unrestricted filenames were the worst. However, in practice you can draw a distinction between space, which appears in commonly-used directories on hundreds of millions of systems worldwide, and other exotic characters like * and {, which are unlikely to be found in any ordinary environment.
So while not splitting on space would not cause glob("$dir/*") to become entirely bug-free, it would certainly fix the most common problem with it.
-- Ed Avis <eda@waniasset.com>
* Ed Avis <eda@waniasset.com> [2014-03-17 09:10]:
True; on Unix any character except \0 can appear in a filename.
Close. You forgot the slash.
Aristotle Pagaltzis <pagaltzis@gmx.de> writes:
True; on Unix any character except \0 can appear in a filename.
Close. You forgot the slash.
Au contraire. A slash cannot appear in a directory entry, but it can appear in a filename as commonly thought of; "/etc/passwd" is a filename.
I have a CD-ROM at home that has directory entries containing / characters. Linux essentially ignored those files. I don't know whether this is legal according to ISO 9660. The disc was intended for Acorn computers.
-- Ed Avis <eda@waniasset.com>
On Mon, Mar 17, 2014 at 10:26 AM, Ed Avis <eda@waniasset.com> wrote:
Aristotle Pagaltzis <pagaltzis@gmx.de> writes:
True; on Unix any character except \0 can appear in a filename.
Close. You forgot the slash.
Au contraire. A slash cannot appear in a directory entry, but it can appear in a filename as commonly thought of; "/etc/passwd" is a filename.
A path yes, a filename no. According to POSIX:
http://pubs.opengroup.org/onlinepubs/7908799/xbd/glossary.html#tag_004_000_114
which says, under the glossary entry for "filename": "The characters composing the name may be selected from the set of all character values excluding the slash character and the null byte."
It should probably say all single-byte characters, and I have doubts about the usefulness of filenames that have ASCII control characters embedded in them, but it sounds like they are legal.
Right, I should have said that / cannot appear in a filename, also called a 'pathname component'. And used 'pathname' for the string such as "/etc/passwd" which you pass to open().
But nobody does that, least of all perlfunc:
open FILEHANDLE Opens the file whose filename is given by EXPR...
-- Ed Avis <eda@waniasset.com>
On Tue, Mar 11, 2014 at 02:17:16PM +0000, Ed Avis wrote:
H.Merijn Brand <h.m.brand@xs4all.nl> writes:
do you also want a distinction between
glob ($pattern) and glob ("$pattern")
No I don't think so. What I suggest is two rules:
- At run time, if the pattern contains a space, warn.
It's suggestions like this that make me hesitant to suggest to people to use "use warnings" in their programs.
Because there's no guarantee that a program that was fine and warning-free for a long time won't start spouting warnings.
Why not put a module on CPAN, which exports a glob method that warns on anything you think people will error on? If it turns out to be hugely successful, one has a much better argument for changing the current behaviour.
Abigail
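As a rough idea of what such a module's core could look like, here is a minimal sketch of the 'strong' variant discussed earlier in the thread. `warn_glob` is a hypothetical name and not an existing CPAN function; it warns on any pattern containing whitespace and otherwise leaves the built-in behaviour untouched. The 'weak' variant is not attempted, since it would need to know at compile time whether the pattern was a fixed string.

```perl
use strict;
use warnings;
use Carp qw(carp);

# Hypothetical wrapper (not an existing CPAN module): warn on every pattern
# that contains whitespace, then defer to the built-in glob unchanged.
sub warn_glob {
    my ($pattern) = @_;
    carp "glob pattern '$pattern' contains whitespace and will be split"
        if $pattern =~ /\s/;
    my @matches = CORE::glob($pattern);
    return @matches;
}

# Collect warnings so the behaviour can be inspected.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

warn_glob('*.c');        # no whitespace: silent
warn_glob('*.c *.h');    # whitespace: one warning

print scalar(@warnings), " warning(s) raised\n";
```

Because the wrapper is opt-in, existing programs that run under "use warnings" stay silent unless they choose to call it.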
"Craig A. Berry" \craig\.a\.berry@​gmail\.com writes:
which says, under the glossary entry for "filename": "The characters composing the name may be selected from the set of all character values excluding the slash character and the null byte."
It should probably say all single-byte characters,
Why? In the definition of "character" they explicitly mention "A sequence of one or more bytes ...".
(Although it would be interesting to use a filename that contains a valid multi-byte sequence that includes a byte corresponding to the ASCII interpretation of a slash.)
-- Johan
On Mon, Mar 17, 2014 at 11:26 AM, Johan Vromans <jvromans@squirrel.nl> wrote:
"Craig A. Berry" <craig.a.berry@gmail.com> writes:
which says, under the glossary entry for "filename": "The characters composing the name may be selected from the set of all character values excluding the slash character and the null byte."
It should probably say all single-byte characters,
Why? In the definition of "character" they explicitly mention "A sequence of one or more bytes ...".
Sorry, I guess I should've said single-byte encoding. Wide characters obviously don't meet the spec because of the null bytes.
On Mon, Mar 17, 2014 at 05:26:11PM +0100, Johan Vromans wrote:
"Craig A. Berry" <craig.a.berry@gmail.com> writes:
which says, under the glossary entry for "filename": "The characters composing the name may be selected from the set of all character values excluding the slash character and the null byte."
It should probably say all single-byte characters,
Why? In the definition of "character" they explicitly mention "A sequence of one or more bytes ...".
(Although it would be interesting to use a filename that contains a valid multi-byte sequence that includes a byte corresponding to the ASCII interpretation of a slash.)
Luckily, that isn't possible with UTF-8.
<irrelevant story> […]
I can't remember whether we ever got rid of said file. </irrelevant story>
Abigail
* Ed Avis <eda@waniasset.com> [2014-03-17 16:45]:
Right, I should have said that / cannot appear in a filename, also called a 'pathname component'. And used 'pathname' for the string such as "/etc/passwd" which you pass to open().
But nobody does that, least of all perlfunc:
open FILEHANDLE Opens the file whose filename is given by EXPR...
Well nobody usually has to. I can't think of any syscall that expects a filename specifically and will barf when given a path instead, and terminological precision is then unnecessary. But when it comes to the legal characters in each, there *is* a difference, and so that precision is called for.
Regards, -- Aristotle Pagaltzis // <http://plasmasturm.org/>
On Mon, Mar 17, 2014 at 11:42 AM, Abigail <abigail@abigail.be> wrote:
On Mon, Mar 17, 2014 at 05:26:11PM +0100, Johan Vromans wrote:
"Craig A. Berry" <craig.a.berry@gmail.com> writes:
which says, under the glossary entry for "filename": "The characters composing the name may be selected from the set of all character values excluding the slash character and the null byte."
It should probably say all single-byte characters,
Why? In the definition of "character" they explicitly mention "A sequence of one or more bytes ...".
(Although it would be interesting to use a filename that contains a valid multi-byte sequence that includes a byte corresponding to the ASCII interpretation of a slash.)
Luckily, that isn't possible with UTF-8.
Actually it is possible if you use a malformed UTF-8 sequence.
$ perl -E'say qq[/];' | perl -CS -E'$_=<>;chomp;say unpack q[H*], $_'
2f
$ perl -E'say qq[\057];' | perl -CS -E'$_=<>;chomp;say unpack q[H*], $_'
2f
$ perl -E'say qq[\300\257];' | perl -CS -E'$_=<>;chomp;say unpack q[H*], $_'
2f
Migrated from rt.perl.org#121398 (status was 'rejected')
Searchable as RT121398$