Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

perl -n mode eats trailing spaces in filenames #22112

Open hlein opened 1 month ago

hlein commented 1 month ago

Description

If perl -n processes an argument of a filename containing a trailing space, the space is eaten before the file is opened. Ironically, the openat() on the mangled name, which fails with ENOENT, is followed by a newfstatat() on the correct name, which returns 0 (success).

Steps to Reproduce

tmp $ echo 'space middle' >'space middle' ; echo 'spaceend ' >'spaceend ' ; \
  find . -maxdepth 1 -name space\* -print0 | xargs -0 perl -ne 'print'
space middle
Can't open ./spaceend : No such file or directory at -e line 1, <> line 1.

Under strace, we can observe:

[pid 16528] openat(AT_FDCWD, "./space middle", O_RDONLY|O_CLOEXEC) = 3
### middle space preserved fine
[pid 16528] openat(AT_FDCWD, "./spaceend", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 16528] newfstatat(AT_FDCWD, "./spaceend ", {st_mode=S_IFREG|0664, st_size=4, ...}, 0) = 0
### open with the space stripped, followed by stat w/correct name

Attempting to doctor @ARGV in a BEGIN block, say by \-escaping trailing spaces, does not work: open() gets a literal \ but no trailing space, and the following newfstatat() gets both the \ and the space:

tmp $ echo 'space middle' >'space middle' ; echo 'spaceend ' >'spaceend ' ; \
  find . -maxdepth 1 -name space\* -print0 | \
  xargs -0 strace -f perl -ne 'BEGIN { @ARGV = ( map { s/( )$/\\ /g; $_ } @ARGV ) } print'
...
openat(AT_FDCWD, "./spaceend\\", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "./spaceend\\ ", 0x7ffcf93b1010, 0) = -1 ENOENT (No such file or directory)

Expected behavior

Filenames provided as arguments should be preserved; test-case output should be:

space middle
spaceend 

Perl configuration

Site configuration information for perl 5.38.2:

Configured by Gentoo at Sun Mar 24 15:08:42 MDT 2024.

Summary of my perl5 (revision 5 version 38 subversion 2) configuration:

  Platform:
    osname=linux
    osvers=6.6.9-gentoo
    archname=x86_64-linux
    uname='linux localhost 6.6.x'
    [snip]

---
@INC for perl 5.38.2:
    /etc/perl
    /usr/local/lib64/perl5/5.38/x86_64-linux
    /usr/local/lib64/perl5/5.38
    /usr/lib64/perl5/vendor_perl/5.38/x86_64-linux
    /usr/lib64/perl5/vendor_perl/5.38
    [snip]

---
Environment for perl 5.38.2:
    [snip]
    LANG=en_US.utf8
    SHELL=/bin/bash

iabyn commented 1 month ago

On Sat, Mar 30, 2024 at 06:22:11PM -0700, hlein wrote:

> If perl -n processes an argument of a filename containing a trailing space, the space will be eaten before the file is opened.

This is documented behaviour.

-n and -p are documented (in perlrun) to do

while (<>) { ... }

'while (<>)' is documented (in perlop, "I/O Operators") to do

    while ($ARGV = shift) {
        open(ARGV, $ARGV);
        while (<ARGV>) {
            ...     # code for each line
        }
    }

and 2-arg open is documented (in perlfunc, "Whitespace and special characters in the filename argument") to strip leading and trailing whitespace.

It's not ideal behaviour, but it's been documented that way for 30+ years.

I wonder whether we should add a command-line switch to make <> act like <<>> ?

-- I don't want to achieve immortality through my work... I want to achieve it through not dying. -- Woody Allen

hlein commented 1 month ago

> On Sat, Mar 30, 2024 at 06:22:11PM -0700, hlein wrote:

>> If perl -n processes an argument of a filename containing a trailing space, the space will be eaten before the file is opened.

> This is documented behaviour.

>        while ($ARGV = shift) {
>            open(ARGV, $ARGV);

> and 2-arg open is documented (in perlfunc, "Whitespace and special characters in the filename argument") to strip leading and trailing whitespace.

Aha! Yes, you are right. I purged 2-argument open() from my own muscle memory years ago, and wasn't thinking about it being the method -n uses, or about what that implies: implicit whitespace stripping.

> I wonder whether we should add a command-line switch to make <> act like <<>> ?

That, or some kind of pragma, callable from BEGIN, that alters the type of open performed by -n? (I'd worry about unintended consequences elsewhere in scripts, unless you meant only for -n's processing.) Maybe it's possible to hook the 2-arg open performed by -n so that, only when the file is not found, the name ends in a space, and a file with that exact name exists (the subsequent newfstatat finds it, after all), it retries with a 3-argument open? Well, that's ugly.

Hm, it's slightly worse, though. In a directory containing both 'foo' and 'foo ' (with a trailing space), doing find ... -print0 | xargs -0 perl -ne ... will silently process 'foo' twice and 'foo ' not at all?

Filenames that end in spaces are silly. The only reason I encountered this was writing some tools to iterate through arbitrary code trees / repositories and do some calculations... but some projects have test-case files that end in spaces on purpose, which tripped me up.

guest20 commented 1 month ago

It might be nice to have -N and -P switches that do 3-arg opens without the trimming...