Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

​Abbreviation of `glob` and `readline` #13648

Closed p5pRT closed 10 years ago

p5pRT commented 10 years ago

Migrated from rt.perl.org#121398 (status was 'rejected')

Searchable as RT121398$

p5pRT commented 10 years ago

From the.rob.dixon@gmail.com

Created by the.rob.dixon@gmail.com

Cc: strawberry-perl@project
Subject: Abbreviation of `glob` and `readline`
Message-Id: 5.16.2_16700_1394233908@Samurai
Reply-To: the.rob.dixon@gmail.com
To: perlbug@perl.org
From: the.rob.dixon@gmail.com

This is a bug report for perl from the.rob.dixon@gmail.com, generated with the help of perlbug 1.39 running under perl 5.16.2.

I believe that it is long-overdue for Perl's `<>`, `<FILEHANDLE>`, `<*globtext*>` to be discouraged. There is nothing wrong with `readline` and `glob`, and programs are better written that way.

There is clearly the issue of backward compatibility, and of the sticklers that will want to pretend that Perl is really their favourite shell language. But I believe we should start to discourage the shorthand, and that is why I have submitted this bug under the documentation category.
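For readers comparing the spellings under discussion, here is a minimal sketch (not part of the original report) that puts each shorthand next to its named function. The scratch directory and the file names `a.txt` and `b.txt` are assumptions made purely for illustration.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Scratch directory with two small files (names are arbitrary).
my $orig_dir = getcwd();
my $tmp      = tempdir(CLEANUP => 1);
chdir $tmp or die "chdir $tmp: $!";
for my $name (qw(a.txt b.txt)) {
    open my $out, '>', $name or die "open $name: $!";
    print {$out} "one\ntwo\n";
    close $out or die "close $name: $!";
}

# Reading a line: operator shorthand vs. the named function.
open my $fh, '<', 'a.txt' or die "open a.txt: $!";
my $line_short = <$fh>;          # reads "one\n"
my $line_long  = readline($fh);  # reads "two\n"
close $fh or die "close a.txt: $!";

# Expanding a pattern: operator shorthand vs. the named function.
my @files_short = <*.txt>;
my @files_long  = glob('*.txt');

chdir $orig_dir or die "chdir $orig_dir: $!";
print "short: @files_short\n";
print "long:  @files_long\n";
```

Both lists contain the same two names; the only difference is how the call is spelled.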

Perl Info ``` Flags: category=docs severity=low Site configuration information for perl 5.16.2: Configured by strawberry-perl at Fri Nov 2 00:34:53 2012. Summary of my perl5 (revision 5 version 16 subversion 2) configuration: Platform: osname=MSWin32, osvers=4.0, archname=MSWin32-x86-multi-thread uname='Win32 strawberry-perl 5.16.2.1 #1 Fri Nov 2 00:33:54 2012 i386' config_args='undef' hint=recommended, useposix=true, d_sigaction=undef useithreads=define, usemultiplicity=define useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef use64bitint=undef, use64bitall=undef, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='gcc', ccflags =' -s -O2 -DWIN32 -DPERL_TEXTMODE_SCRIPTS -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -fno-strict-aliasing -mms-bitfields', optimize='-s -O2', cppflags='-DWIN32' ccversion='', gccversion='4.6.3', gccosandvers='' intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234 d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=12 ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='long long', lseeksize=8 alignbytes=8, prototype=define Linker and Libraries: ld='g++', ldflags ='-s -L"C:\strawberry\perl\lib\CORE" -L"C:\strawberry\c\lib"' libpth=C:\strawberry\c\lib C:\strawberry\c\i686-w64-mingw32\lib libs=-lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32 perllibs=-lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32 libc=, so=dll, useshrplib=true, libperl=libperl516.a gnulibc_version='' Dynamic Linking: dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' ' cccdlflags=' ', lddlflags='-mdll -s -L"C:\strawberry\perl\lib\CORE" -L"C:\strawberry\c\lib"' Locally applied patches: @INC for perl 5.16.2: 
C:/strawberry/perl/site/lib/MSWin32-x86-multi-thread C:/strawberry/perl/site/lib C:/strawberry/perl/vendor/lib C:/strawberry/perl/lib . Environment for perl 5.16.2: HOME (unset) LANG (unset) LANGUAGE (unset) LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=C:\Program Files (x86)\ImageMagick-6.8.3-Q16;C:\Program Files\ImageMagick-6.8.3-Q16;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files (x86)\PC Connectivity Solution\;C:\Program Files\Common Files\Microsoft Shared\Windows Live;C:\Program Files (x86)\PHP\;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files\jEdit;C:\usr\local\ppt\bin;C:\Program Files (x86)\GnuWin32\bin;C:\Program Files (x86)\Smart Projects\IsoBuster;C:\Program Files (x86)\Oracle\Berkeley DB 11gR2 5.3.15\bin;C:\Program Files (x86)\Git\cmd;C:\Program Files (x86)\Bazaar;C:\Program Files (x86)\Lua\5.1;C:\Program Files (x86)\Lua\5.1\clibs;C:\strawberry\c\bin;C:\strawberry\perl\site\bin;C:\strawberry\perl\bin;C:\Program Files\TortoiseSVN\bin;C:\MediaInfoCLI;C:\Program Files (x86)\MKVToolNix;C:\Program Files (x86)\Subversion\bin;C:\Program Files (x86)\Common Files\Ulead Systems\MPEG;C:\Program Files (x86)\QuickTime\QTSystem\;C:\Program Files\Microsoft Network Monitor 3\;C:\Program Files\Calibre2\;C:\Program Files (x86)\MySQL\MySQL Utilities 1.3.4\;C:\Program Files (x86)\Common Files\Acronis\SnapAPI\;C:\Program Files\WinRAR;C:\Program Files\Common Files\Microsoft Shared\Windows Live;C:\Program Files (x86)\Common Files\Hackety Hack\0.r1529\..;C:\Program Files (x86)\IDM Computer Solutions\UltraCompare\;C:\ffmpeg\bin;C:\Program Files (x86)\Serviio\lib;C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\x86_amd64;C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\amd64;C:\SWIG;C:\MinGW\bin;C:\MinGW\MSYS\1.0\bin;C:\Python27;C:\Curl;C:\pkg-config\bin;C:\glib\bin;C:\gettext-runtime\bin;C:\Program Files (x86)\IDM Computer Solutions\UltraEdit\;C:\LuaRocks\2.0;C:\Program 
Files (x86)\Android\android-sdk\platform-tools;C:\Ruby193\bin;C:\Ruby193.DevKit\bin;C:\Program Files (x86)\Nmap;C:\MobiPerl;C:\Program Files (x86)\EaseUS\Todo Backup\bin\x64\ PERL_BADLANG (unset) SHELL (unset) ```
p5pRT commented 10 years ago

From @tux

On Fri, 7 Mar 2014 15:21:41 -0800, Rob Dixon (via RT) <perlbug-followup@perl.org> wrote:

> I believe that it is long-overdue for Perl's `<>`, `<FILEHANDLE>`, `<*globtext*>` to be discouraged. There is nothing wrong with `readline` and `glob`, and programs are better written that way.

I agree that `<globtext>` is better written as glob ("globtext"), though writing

    <foo.* bar.* *.[ao]>

reads a whole lot easier than the alternatives

    (glob ("foo.*"), glob ("bar.*"), glob ("*.[ao]"))

or

    map { glob ($_) } qw( foo.* bar.* *.[ao] )

I partly agree about `<FILEHANDLE>`: not that it should be deprecated, but in that I disapprove of global filehandles and think `<$fh>` should be promoted (for all but `*STDIN`, `*DATA` and `*ARGV`).

I completely disagree with deprecation of `<>`, which is what makes perl perl.

> There is clearly the issue of backward compatibility, and of the sticklers that will want to pretend that Perl is really their favourite shell language. But I believe we should start to discourage the shorthand, and that is why I have submitted this bug under the documentation category

-- H.Merijn Brand http://tux.nl Perl Monger http://amsterdam.pm.org/ using perl5.00307 .. 5.19 porting perl5 on HP-UX, AIX, and openSUSE http://mirrors.develooper.com/hpux/ http://www.test-smoke.org/ http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/

p5pRT commented 10 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 10 years ago

From @ikegami

On Mon, Mar 10, 2014 at 12:12 PM, H.Merijn Brand <h.m.brand@xs4all.nl> wrote:

> On Fri, 7 Mar 2014 15:21:41 -0800, Rob Dixon (via RT) <perlbug-followup@perl.org> wrote:
>
> > I believe that it is long-overdue for Perl's `<>`, `<FILEHANDLE>`, `<*globtext*>` to be discouraged. There is nothing wrong with `readline` and `glob`, and programs are better written that way.
>
> I agree that `<globtext>` is better written as glob ("globtext"), though writing
>
>     <foo.* bar.* *.[ao]>
>
> reads a whole lot easier than the alternatives
>
>     (glob ("foo.*"), glob ("bar.*"), glob ("*.[ao]"))
>
> or
>
>     map { glob ($_) } qw( foo.* bar.* *.[ao] )

    glob q<foo.* bar.* *.[ao]>
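As a rough check that the spellings traded above really are interchangeable, the sketch below (an illustration, not code from the thread) runs all four in a scratch directory. The file names are assumptions chosen so that every pattern matches something.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Scratch directory populated with hypothetical file names.
my $orig_dir = getcwd();
my $tmp      = tempdir(CLEANUP => 1);
chdir $tmp or die "chdir $tmp: $!";
for my $name (qw(foo.c foo.h bar.c main.o lib.a)) {
    open my $out, '>', $name or die "open $name: $!";
    close $out or die "close $name: $!";
}

# Four spellings of "expand these three patterns".
my @angle  = <foo.* bar.* *.[ao]>;
my @quoted = glob q<foo.* bar.* *.[ao]>;
my @mapped = map { glob($_) } qw( foo.* bar.* *.[ao] );
my @listed = (glob("foo.*"), glob("bar.*"), glob("*.[ao]"));

chdir $orig_dir or die "chdir $orig_dir: $!";
print "$_\n" for "@angle", "@quoted", "@mapped", "@listed";
```

Each form yields the matches for `foo.*`, then `bar.*`, then `*.[ao]`, so the four printed lines are identical.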

p5pRT commented 10 years ago

From @jkeenan

On Fri Mar 07 15:21:41 2014, the.rob.dixon@gmail.com wrote: [snip]

> I believe that it is long-overdue for Perl's `<>`, `<FILEHANDLE>`, `<*globtext*>` to be discouraged. There is nothing wrong with `readline` and `glob`, and programs are better written that way.

-1

p5pRT commented 10 years ago

From @rjbs

I sympathize, but not enough to agree.

-- rjbs

p5pRT commented 10 years ago

@rjbs - Status changed from 'open' to 'rejected'

p5pRT commented 10 years ago

From @epa

H.Merijn Brand <h.m.brand@xs4all.nl> writes:

> writing
>
>     <foo.* bar.* *.[ao]>
>
> reads a whole lot easier than the alternatives
>
>     (glob ("foo.*"), glob ("bar.*"), glob ("*.[ao]"))

Actually, `glob('*.c *.h')` will split on spaces and glob as two patterns. I think that is a bug. It causes all sorts of code like

    glob("$dir/*.txt")

to be buggy and break whenever $dir contains spaces - which even the most wizened Unix greybeard must admit are nowadays common. But that's what it currently does.

I would certainly support that glob(pattern) change to not split at spaces, perhaps keeping the older `<*.c *.h>` syntax splitting for compatibility.

-- Ed Avis <eda@waniasset.com>
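The failure described above can be reproduced with a short sketch. This is an illustration rather than code from the thread: the directory name `my docs` and the file `notes.txt` are assumptions, and the second call uses the extra-quotes workaround described in `perldoc -f glob`.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

my $orig_dir = getcwd();
my $tmp      = tempdir(CLEANUP => 1);
chdir $tmp or die "chdir $tmp: $!";

my $dir = 'my docs';    # hypothetical directory name containing a space
mkdir $dir or die "mkdir $dir: $!";
open my $out, '>', "$dir/notes.txt" or die "open $dir/notes.txt: $!";
close $out or die "close: $!";

# Interpolated pattern: glob sees two patterns, "my" and "docs/*.txt",
# so the file inside "my docs" is never matched.
my @naive = glob("$dir/*.txt");

# Inner double quotes keep the space as part of a single pattern.
my @quoted = glob(qq{"$dir/*.txt"});

chdir $orig_dir or die "chdir $orig_dir: $!";
print "naive:  @naive\n";
print "quoted: @quoted\n";
```

The quoting workaround only protects against whitespace; other metacharacters in `$dir` are still interpreted.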

p5pRT commented 10 years ago

From @tux

On Tue, 11 Mar 2014 09:00:31 +0000 (UTC), Ed Avis <eda@waniasset.com> wrote:

> H.Merijn Brand <h.m.brand@xs4all.nl> writes:
>
> > writing
> >
> >     <foo.* bar.* *.[ao]>
> >
> > reads a whole lot easier than the alternatives
> >
> >     (glob ("foo.*"), glob ("bar.*"), glob ("*.[ao]"))
>
> Actually, `glob('*.c *.h')` will split on spaces and glob as two patterns. I think that is a bug. It causes all sorts of code like
>
>     glob("$dir/*.txt")
>
> to be buggy and break whenever $dir contains spaces - which even the most wizened Unix greybeard must admit are nowadays common. But that's what it currently does.
>
> I would certainly support that glob(pattern) change to not split at spaces, perhaps keeping the older `<*.c *.h>` syntax splitting for compatibility.

+1

-- H.Merijn Brand

p5pRT commented 10 years ago

From @Smylers

Ed Avis writes:

> `glob('*.c *.h')` will split on spaces and glob as two patterns. I think that is a bug. It causes all sorts of code like
>
>     glob("$dir/*.txt")
>
> to be buggy and break whenever $dir contains spaces

I agree it's turned out to be a poor interface, having been caught out by exactly this feature recently.

> I would certainly support that glob(pattern) change to not split at spaces

That would break programs relying on the current behaviour. That seems unnecessarily harsh: it swaps an irritation for current developers for sudden breakage suffered by users of programs written years ago.

`perldoc -f glob` already has a note pointing this behaviour out, and points out that File::Glob provides bsd_glob without this misfeature.

Smylers -- http://twitter.com/Smylers2
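A minimal sketch of the `bsd_glob` alternative mentioned above, assuming the same hypothetical `my docs` directory used for illustration elsewhere. `bsd_glob` takes a single pattern and does not treat whitespace as a separator.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

my $orig_dir = getcwd();
my $tmp      = tempdir(CLEANUP => 1);
chdir $tmp or die "chdir $tmp: $!";

my $dir = 'my docs';    # hypothetical directory name containing a space
mkdir $dir or die "mkdir $dir: $!";
open my $out, '>', "$dir/notes.txt" or die "open $dir/notes.txt: $!";
close $out or die "close: $!";

# One pattern in, matches out; the space is just another character.
my @found = bsd_glob("$dir/*.txt");

chdir $orig_dir or die "chdir $orig_dir: $!";
print "$_\n" for @found;
```

The trade-off is that several patterns now need several calls (or a brace pattern), which is the inconvenience the space-splitting was meant to avoid.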

p5pRT commented 10 years ago

From @epa

Perhaps we can agree that a warning should be issued when the pattern string given to glob() contains a space?

The 'weak' warning would be to warn on `glob("$dir/foo.*")` if, at runtime, the pattern string contains a space. But not to warn for `glob('*.c *.h')` where the pattern string is known at compile time to contain a space - showing the programmer's deliberate intention to match two patterns.

The alternative is a 'strong' warning which fires whenever the pattern contains spaces.

The intention of the 'weak' warning is that the perl developers intend to keep the current semantics of glob(), but flag cases where it may be used wrongly.

The intention of the 'strong' warning is to flag buggy code, but also to prepare for a future version of perl where the semantics of glob() change.

Which would folk here prefer?

I also think that glob() should change to accept multiple arguments, so you can say `glob('*.c', '*.h')`. This would not break any compatibility since they are currently disallowed. Most likely, the multi-arg form would never split at spaces. That means you could safely say

    glob("$dir/foo.*", '')

Ugly, but workable until such future date (if any) when single-arg glob() changes to not split at spaces.

-- Ed Avis <eda@waniasset.com>
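The multi-argument form proposed above does not exist in core, but its intent can be approximated in user code. The sketch below is an assumption-laden illustration: `glob_each` is a hypothetical helper name, it delegates to `File::Glob::bsd_glob`, and it simply skips empty or undefined patterns rather than assigning them any meaning.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Hypothetical helper: one argument per pattern, never split on whitespace.
sub glob_each {
    my @patterns = @_;
    return map { bsd_glob($_) }
           grep { defined($_) && length($_) } @patterns;
}

my $orig_dir = getcwd();
my $tmp      = tempdir(CLEANUP => 1);
chdir $tmp or die "chdir $tmp: $!";

my $dir = 'my docs';    # hypothetical directory name containing a space
mkdir $dir or die "mkdir $dir: $!";
for my $name (qw(a.c a.h a.o)) {
    open my $out, '>', "$dir/$name" or die "open $dir/$name: $!";
    close $out or die "close: $!";
}

my @sources = glob_each("$dir/*.c", "$dir/*.h", '');

chdir $orig_dir or die "chdir $orig_dir: $!";
print "$_\n" for @sources;
```

Results come back in pattern order, as with the space-separated form, but a space inside any one pattern stays literal.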

p5pRT commented 10 years ago

From @epa

Smylers <Smylers@stripey.com> writes:

> > I would certainly support that glob(pattern) change to not split at spaces
>
> That would break programs relying on the current behaviour. That seems unnecessarily harsh, swapping an irritation for current developers for users of programs written years ago suddenly suffering breakage.

On the other hand, there are plenty of programs written years ago with unsafe constructs like `glob("$dir/*.txt")`. These are buggy now, but would start to work properly if the semantics of glob() were changed. So weighed against the programs suddenly suffering breakage are many suffering 'mendage'.

Perhaps as a halfway house the glob() semantics could still split if the pattern is a fixed string at compile time, but not if it is pasted together at run time. So `glob('*.c *.h')` would still split but `glob($x)` would not. Unpleasant but perhaps the best way forward.

-- Ed Avis <eda@waniasset.com>

p5pRT commented 10 years ago

From @tux

On Tue, 11 Mar 2014 11:20:02 +0000 (UTC), Ed Avis <eda@waniasset.com> wrote:

> Perhaps we can agree that a warning should be issued when the pattern string given to glob() contains a space?
>
> The 'weak' warning would be to warn on `glob("$dir/foo.*")` if, at runtime, the pattern string contains a space. But not to warn for `glob('*.c *.h')` where the pattern string is known at compile time to contain a space - showing the programmer's deliberate intention to match two patterns.

do you also want a distinction between

    glob ($pattern)

and

    glob ("$pattern")

?

> The alternative is a 'strong' warning which fires whenever the pattern contains spaces.
>
> The intention of the 'weak' warning is that the perl developers intend to keep the current semantics of glob(), but flag cases where it may be used wrongly.

+1

> The intention of the 'strong' warning is to flag buggy code, but also to prepare for a future version of perl where the semantics of glob() change.
>
> Which would folk here prefer?

That people would never use file/folder names with spaces :) (or database table/field names with spaces, slashes, colons, semicolons, minus signs, dollars or whatever some imbecile database allows them to use)

> I also think that glob() should change to accept multiple arguments, so you can say `glob('*.c', '*.h')`.

+10!

> This would not break any compatibility since they are currently disallowed. Most likely, the multi-arg form would never split at spaces. That means you could safely say
>
>     glob("$dir/foo.*", '')

undef would be better than '' here

> Ugly, but workable until such future date (if any) when single-arg glob() changes to not split at spaces.

-- H.Merijn Brand

p5pRT commented 10 years ago

From victor@vsespb.ru

2014-03-11 17:23 GMT+04:00 H.Merijn Brand <h.m.brand@xs4all.nl>:

> That people would never use file/folder names with spaces :) (or database table/field names with spaces, slashes, colons, semicolons, minus signs, dollars or whatever some imbecile database allows them to use)

I think we need to distinguish between programmers and application users here. Applications might need to work with user-supplied paths, and users will always have pathnames with spaces. The database example is a somewhat different matter.

p5pRT commented 10 years ago

From @tux

On Tue, 11 Mar 2014 17:28:51 +0400, Victor Efimov <victor@vsespb.ru> wrote:

> 2014-03-11 17:23 GMT+04:00 H.Merijn Brand <h.m.brand@xs4all.nl>:
>
> > That people would never use file/folder names with spaces :) (or database table/field names with spaces, slashes, colons, semicolons, minus signs, dollars or whatever some imbecile database allows them to use)
>
> I think need to distinct programmers and application users here. Applications might need to work with user supplied paths, and users always will have pathnames with spaces. Example with database is a bit different thing here.

you stripped the context:

> Which would folk here prefer?

seen the smiley? Yes I know that people will make our lives miserable by having default setups doing stuff we don't like. That includes allowing spaces in folder and document names or having case-aware file systems.

But all of that is not what the focus of glob () is.

-- H.Merijn Brand

p5pRT commented 10 years ago

From @epa

Filenames with spaces are a fact of life on the majority of systems. In 1995 Windows started using `C:\Program Files`, and ActivePerl at least started installing Perl under that location. It is shocking that nearly two decades later there are still straightforward bugs in handling such filenames.

That doesn't detract from the need for backwards compatibility and not to break code - but please let's not go off on a side track of "get users to stop putting spaces in filenames". Spaces in filenames exist. Perl has to deal with them. Easy things should be easy.

-- Ed Avis <eda@waniasset.com>

p5pRT commented 10 years ago

From @epa

H.Merijn Brand <h.m.brand@xs4all.nl> writes:

> do you also want a distinction between
>
>     glob ($pattern)
>
> and
>
>     glob ("$pattern")

No I don't think so. What I suggest is two rules:

- At run time, if the pattern contains a space, warn.

- *But* if at compile time the pattern is a fixed string (not a double-quoted string with variables) and that fixed string contains spaces, then suppress the runtime warning.

That gives the 'weak' setup. The 'strong' setup is to warn every time a space in pattern is found, no matter what.

I admit that peculiar cases such as

    glob("$basename.c $basename.h")

will be mishandled - but these are surely few, and we are only talking about a warning.

> >     glob("$dir/foo.*", '')
>
> undef would be better than '' here

If you want to declare that the special glob pattern undef is one that matches no files, then yes. But I think that since there is already a glob pattern that never matches, namely the empty string, then we can use that and keep undef reserved for some future use. (For now, passing undef to glob would give a warning.)

-- Ed Avis <eda@waniasset.com>

p5pRT commented 10 years ago

From @ikegami

On Tue, Mar 11, 2014 at 5:00 AM, Ed Avis <eda@waniasset.com> wrote:

> H.Merijn Brand <h.m.brand@xs4all.nl> writes:
>
> > writing
> >
> >     <foo.* bar.* *.[ao]>
> >
> > reads a whole lot easier than the alternatives
> >
> >     (glob ("foo.*"), glob ("bar.*"), glob ("*.[ao]"))
>
> Actually, `glob('*.c *.h')` will split on spaces and glob as two patterns. I think that is a bug. It causes all sorts of code like
>
>     glob("$dir/*.txt")
>
> to be buggy and break whenever $dir contains spaces - which even the most wizened Unix greybeard must admit are nowadays common. But that's what it currently does.

That would still be buggy even if glob didn't split on spaces. Space isn't the only glob metacharacter.

It never made sense to me that an escaping function isn't provided, especially since glob() requires different (incompatible) escaping rules on different systems.
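To make the missing piece concrete, here is a rough sketch of what such an escaping function might look like on a Unix-like system. It is an illustration only: `escape_glob_unix` is a hypothetical name, the metacharacter list is an assumption, and it relies on `File::Glob::bsd_glob` treating backslash as a quoting character under its default flags. As noted above, the rules differ elsewhere (on Windows the backslash is a path separator), so this is not portable.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Hypothetical helper: backslash-escape characters that bsd_glob treats
# specially, so the string matches only itself. Unix-like systems only.
sub escape_glob_unix {
    my ($text) = @_;
    $text =~ s/([\\\[\]{}*?~])/\\$1/g;
    return $text;
}

my $orig_dir = getcwd();
my $tmp      = tempdir(CLEANUP => 1);
chdir $tmp or die "chdir $tmp: $!";

# Two directories: one whose name contains a literal '*', one decoy.
for my $d ('odd*dir', 'oddXdir') {
    mkdir $d or die "mkdir $d: $!";
    open my $out, '>', "$d/one.txt" or die "open $d/one.txt: $!";
    close $out or die "close: $!";
}

my $dir     = 'odd*dir';
my @plain   = bsd_glob("$dir/*.txt");                         # '*' in $dir acts as a wildcard
my @escaped = bsd_glob(escape_glob_unix($dir) . '/*.txt');    # '*' in $dir is literal

chdir $orig_dir or die "chdir $orig_dir: $!";
print "plain:   @plain\n";
print "escaped: @escaped\n";
```

Only the directory part is escaped; the trailing `/*.txt` is left alone so that it still acts as a pattern.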

p5pRT commented 10 years ago

From @epa

Eric Brine <ikegami@adaelis.com> writes:

> >     glob("$dir/*.txt")
>
> That would still be buggy even if glob didn't split on spaces. Space isn't the only glob metacharacters.

True; on Unix any character except \0 can appear in a filename. If a single character for newline was the best design decision in Unix, then the unrestricted filenames were the worst. However, in practice you can draw a distinction between space, which appears in commonly-used directories on hundreds of millions of systems worldwide, and other exotic characters like `*` and `{`, which are unlikely to be found in any ordinary environment.

So while not splitting on space would not cause `glob("$dir/*")` to become entirely bug-free, it would certainly fix the most common problem with it.

-- Ed Avis <eda@waniasset.com>

p5pRT commented 10 years ago

From @ap

* Ed Avis <eda@waniasset.com> [2014-03-17 09:10]:

> True; on Unix any character except \0 can appear in a filename.

Close. You forgot the slash.

p5pRT commented 10 years ago

From @epa

Aristotle Pagaltzis <pagaltzis@gmx.de> writes:

> > True; on Unix any character except \0 can appear in a filename.
>
> Close. You forgot the slash.

Au contraire. A slash cannot appear in a directory entry, but it can appear in a filename as commonly thought of; "/etc/passwd" is a filename.

I have a CD-ROM at home that has directory entries containing / characters. Linux essentially ignored those files. I don't know whether this is legal according to ISO 9660. The disc was intended for Acorn computers.

-- Ed Avis <eda@waniasset.com>

p5pRT commented 10 years ago

From @craigberry

On Mon, Mar 17, 2014 at 10:26 AM, Ed Avis <eda@waniasset.com> wrote:

> Aristotle Pagaltzis <pagaltzis@gmx.de> writes:
>
> > > True; on Unix any character except \0 can appear in a filename.
> >
> > Close. You forgot the slash.
>
> Au contraire. A slash cannot appear in a directory entry, but it can appear in a filename as commonly thought of; "/etc/passwd" is a filename.

A path yes, a filename no. According to POSIX:

http://pubs.opengroup.org/onlinepubs/7908799/xbd/glossary.html#tag_004_000_114

which says, under the glossary entry for "filename": "The characters composing the name may be selected from the set of all character values excluding the slash character and the null byte."

It should probably say all single-byte characters, and I have doubts about the usefulness of filenames that have ASCII control characters embedded in them, but it sounds like they are legal.
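The glossary rule quoted above is simple enough to state as code. The following is a minimal sketch under that one rule only; `is_posix_filename` is a hypothetical name, and the check deliberately ignores length limits and anything a particular file system may add.

```perl
use strict;
use warnings;

# Hypothetical check: a filename (single pathname component) may contain
# any character except the slash and the null byte, and must be non-empty.
sub is_posix_filename {
    my ($name) = @_;
    return 0 unless defined $name && length $name;
    return $name =~ m{[/\0]} ? 0 : 1;
}

for my $candidate ('passwd', '/etc/passwd', "with space\tand tab", "nul\0byte") {
    (my $shown = $candidate) =~ s/([^[:print:]])/sprintf '\\x%02x', ord $1/ge;
    printf "%-28s %s\n", $shown, is_posix_filename($candidate) ? 'filename' : 'not a filename';
}
```

Under this rule `passwd` is a filename while `/etc/passwd` is a pathname, and embedded spaces or control characters are accepted.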

p5pRT commented 10 years ago

From @epa

Right, I should have said that / cannot appear in a filename, also called a 'pathname component'. And used 'pathname' for the string such as "/etc/passwd" which you pass to open().

But nobody does that, least of all perlfunc:

    open FILEHANDLE
        Opens the file whose filename is given by EXPR...

-- Ed Avis <eda@waniasset.com>

p5pRT commented 10 years ago

From @abigail

On Tue, Mar 11, 2014 at 02:17:16PM +0000, Ed Avis wrote:

> H.Merijn Brand <h.m.brand@xs4all.nl> writes:
>
> > do you also want a distinction between
> >
> >     glob ($pattern)
> >
> > and
> >
> >     glob ("$pattern")
>
> No I don't think so. What I suggest is two rules:
>
> - At run time, if the pattern contains a space, warn.

It's suggestions like this that make me hesitant to suggest that people "use warnings" in their programs.

Because there's no guarantee that a program that was fine and warning-free for a long time won't start spouting warnings.

Why not put a module on CPAN which exports a glob function that warns on anything you think people will get wrong? If it turns out to be hugely successful, one has a much better argument for changing the current behaviour.

Abigail
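A user-level version of the idea might look like the sketch below. It is an illustration, not an existing CPAN module: `warn_glob` is a hypothetical name, and because an ordinary function cannot tell a compile-time constant from a run-time string, it can only implement the 'strong' variant that warns on every pattern containing whitespace.

```perl
use strict;
use warnings;

# Hypothetical wrapper, not a core or CPAN function.
sub warn_glob {
    my ($pattern) = @_;
    warn "glob pattern contains whitespace: '$pattern'\n"
        if $pattern =~ /\s/;
    my @matches = glob($pattern);    # list context: expand fully, no iterator state
    return @matches;
}

# Usage: collect warnings to see which calls trigger one.
my @warnings;
{
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    warn_glob('*.c');               # no whitespace, silent
    warn_glob('*.c *.h');           # deliberate two-pattern call still warns
    warn_glob('my docs/*.txt');     # the accidental case
}
print scalar(@warnings), " warning(s)\n";
```

The deliberate `'*.c *.h'` call warning alongside the accidental one is exactly the noise the 'weak' variant was meant to avoid, and it cannot be avoided at this level.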

p5pRT commented 10 years ago

From @sciurius

"Craig A. Berry" <craig.a.berry@gmail.com> writes:

> which says, under the glossary entry for "filename": "The characters composing the name may be selected from the set of all character values excluding the slash character and the null byte."
>
> It should probably say all single-byte characters,

Why? In the definition of "character" they explicitly mention "A sequence of one or more bytes ...".

(Although it would be interesting to use a filename that contains a valid multi-byte sequence that includes a byte corresponding to the ASCII interpretation of a slash.)

-- Johan

p5pRT commented 10 years ago

From @craigberry

On Mon, Mar 17, 2014 at 11:26 AM, Johan Vromans <jvromans@squirrel.nl> wrote:

> "Craig A. Berry" <craig.a.berry@gmail.com> writes:
>
> > which says, under the glossary entry for "filename": "The characters composing the name may be selected from the set of all character values excluding the slash character and the null byte."
> >
> > It should probably say all single-byte characters,
>
> Why? In the definition of "character" they explicitly mention "A sequence of one or more bytes ...".

Sorry, I guess I should've said single-byte encoding. Wide characters obviously don't meet the spec because of the null bytes.

p5pRT commented 10 years ago

From @abigail

On Mon, Mar 17, 2014 at 05:26:11PM +0100, Johan Vromans wrote:

> "Craig A. Berry" <craig.a.berry@gmail.com> writes:
>
> > which says, under the glossary entry for "filename": "The characters composing the name may be selected from the set of all character values excluding the slash character and the null byte."
> >
> > It should probably say all single-byte characters,
>
> Why? In the definition of "character" they explicitly mention "A sequence of one or more bytes ...".
>
> (Although it would be interesting to use a filename that contains a valid multi-byte sequence that includes a byte corresponding to the ASCII interpretation of a slash.)

Luckily, that isn't possible with UTF-8.

\<irrelevant story> A long time ago, I was in charge of a troublesome SGI machine. Once or twice a month, it would crash hard enough that it needed a file system repair before it would properly boot again. At one moment in time, the file system repair left us with a file named '/', which wouldn't go away on any subsequent file system repair.

I can't remember whether we ever got rid of said file. \</irrelevant story>

Abigail

p5pRT commented 10 years ago

From @ap

* Ed Avis <eda@waniasset.com> [2014-03-17 16:45]:

> Right, I should have said that / cannot appear in a filename, also called a 'pathname component'. And used 'pathname' for the string such as "/etc/passwd" which you pass to open().
>
> But nobody does that, least of all perlfunc:
>
>     open FILEHANDLE
>         Opens the file whose filename is given by EXPR...

Well nobody usually has to. I can’t think of any syscall that expects a filename specifically and will barf when given a path instead, and terminological precision is then unnecessary. But when it comes to the legal characters in each, there *is* a difference, and so that precision is called for.

Regards, -- Aristotle Pagaltzis // <http://plasmasturm.org/>

p5pRT commented 10 years ago

From @b2gills

On Mon, Mar 17, 2014 at 11:42 AM, Abigail <abigail@abigail.be> wrote:

> On Mon, Mar 17, 2014 at 05:26:11PM +0100, Johan Vromans wrote:
>
> > "Craig A. Berry" <craig.a.berry@gmail.com> writes:
> >
> > > which says, under the glossary entry for "filename": "The characters composing the name may be selected from the set of all character values excluding the slash character and the null byte."
> > >
> > > It should probably say all single-byte characters,
> >
> > Why? In the definition of "character" they explicitly mention "A sequence of one or more bytes ...".
> >
> > (Although it would be interesting to use a filename that contains a valid multi-byte sequence that includes a byte corresponding to the ASCII interpretation of a slash.)
>
> Luckily, that isn't possible with UTF-8.

Actually it is possible if you use a malformed UTF-8 sequence.

    $ perl -E'say qq[/];' | perl -CS -E'$_=<>;chomp;say unpack q[H*], $_'
    2f
    $ perl -E'say qq[\057];' | perl -CS -E'$_=<>;chomp;say unpack q[H*], $_'
    2f
    $ perl -E'say qq[\300\257];' | perl -CS -E'$_=<>;chomp;say unpack q[H*], $_'
    2f
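Whether the overlong pair `\300\257` (bytes C0 AF) is read as a slash depends on how strict the decoder is. The sketch below is an illustration, not part of the thread: it uses the Encode module's strict `UTF-8` encoding to show both halves of the argument, with sample characters chosen arbitrarily.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Well-formed UTF-8: every byte of a multi-byte sequence is >= 0x80,
# so the byte 0x2F can only ever be a real slash.
my $bytes          = encode('UTF-8', "\x{E9}\x{20AC}\x{1F42A}");
my $has_slash_byte = index($bytes, '/') >= 0 ? 1 : 0;

# The overlong pair C0 AF is not well-formed; a strict decoder
# refuses it when asked to croak on errors.
my $overlong = "\xC0\xAF";
my $accepted = eval {
    decode('UTF-8', $overlong, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    1;
} ? 1 : 0;

print "slash byte in valid UTF-8: $has_slash_byte\n";
print "overlong sequence accepted: $accepted\n";
```

Both values print as 0: valid UTF-8 never hides a slash byte, and the malformed sequence only yields one if the decoder is lenient about overlong forms.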