Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.85k stars 527 forks source link

File::Find (follow=>1) confused by symlink with trailing slash: _ handle undefined #22089

Open jimav opened 1 month ago

jimav commented 1 month ago

Module: File::Find 1.43 (in Perl v5.37.10)

Description With follow => 1 if a symlink to a directory contains a trailing slash, and that directory contains a relative symlink to a file, then file descriptor '_' is not defined when visiting the linked-to file.

The script below creates the following structure:

/tmp/tdir
├── A
│   └── File.txt
├── B
│   └── File_link -> ../A/File.txt
├── testlink1 -> /tmp/tdir/B
└── testlink2 -> /tmp/tdir/B/

When File::Find::find is called with a starting path of /tmp/tdir/testlink2, then "wanted" is called with $_ set to File_Link (and File::Find::name is /tmp/tdir/testlink2/File_link) but with filehandle '_' undefined.

Everything works as expected when starting at /tmp/tdir/testlink1.

Steps to Reproduce

#!/usr/bin/env perl
use strict; use warnings; use feature qw/say/;
STDOUT->autoflush; STDERR->autoflush;
use Carp;
use File::Find;
use Path::Tiny qw/path/;
say "\$^V = $^V";
say "\$File::Find::VERSION = $File::Find::VERSION";

sub do_find($) {
  my ($find_target) = @_;
  say "-------- $find_target ---------------------------------";
  system "set -x; (ls -l '$find_target') 2>&1";
  File::Find::find(
    {
      follow => 1,       # follow symbolic links
      follow_skip => 2,  # ignore duplicates (i.e. in case of cycles)
      wanted => sub {
        say "Visiting '$_' ($File::Find::name)";
        my @stat = lstat(_); # '_' is "guaranteed" to be valid 
        die "lstat _ failed for '$_' ($!)" unless @stat;
        if    (-l _) { say "   symlink -> ",readlink($_); }
        elsif (-f _) { say "   file size = $stat[7]"; }
        elsif (-d _) { }
        else         { say "   unknown object" }
      }
    },
    $find_target
  );
}

my $tdir = path("/tmp/tdir"); $tdir->remove_tree; $tdir->mkdir;

my $Adir = $tdir->child("A")->mkdir;
$Adir->child("File.txt")->spew("Hi Mom!\n");

my $Bdir = $tdir->child("B")->mkdir;
system("ln -s '../A/File.txt' '$Bdir/File_link'")==0 or die;

system("ln -s '$Bdir'  '$tdir/testlink1'")==0 or die;
system("ln -s '$Bdir/' '$tdir/testlink2'")==0 or die;

system "set -x; (tree --noreport $tdir) 2>&1";

do_find(path("$tdir/testlink1")->canonpath);
do_find(path("$tdir/testlink2")->canonpath);                             

Expected behavior I think the trailing slash should mean nothing (on Linux).

Actual Results

-------- /tmp/tdir/testlink1 ---------------------------------
+ ls -l /tmp/tdir/testlink1
lrwxrwxrwx 1 brew brew 11 Mar 20 19:34 /tmp/tdir/testlink1 -> /tmp/tdir/B
Visiting '.' (/tmp/tdir/testlink1)
Visiting 'File_link' (/tmp/tdir/testlink1/File_link)
   symlink -> ../A/File.txt
-------- /tmp/tdir/testlink2 ---------------------------------
+ ls -l /tmp/tdir/testlink2
lrwxrwxrwx 1 brew brew 12 Mar 20 19:34 /tmp/tdir/testlink2 -> /tmp/tdir/B/
Visiting '.' (/tmp/tdir/testlink2)
Visiting 'File_link' (/tmp/tdir/testlink2/File_link)
stat _ failed for 'File_link' (Bad file descriptor) at /tmp/t1 line 23.

Perl configuration perl_dashV.txt

mauke commented 1 month ago

The following patch seems to fix the problem, but the code kind of scares me (and it really needs tests):

diff --git ext/File-Find/lib/File/Find.pm ext/File-Find/lib/File/Find.pm
index af84fbf116..fb4facf356 100644
--- ext/File-Find/lib/File/Find.pm
+++ ext/File-Find/lib/File/Find.pm
@@ -32,12 +32,12 @@ sub contract_name {

     $cdir = substr($cdir,0,rindex($cdir,'/')+1);

-    $fn =~ s|^\./||;
+    $fn =~ s|^(?:\./+)+||;

     my $abs_name= $cdir . $fn;

     if (substr($fn,0,3) eq '../') {
-       1 while $abs_name =~ s!/[^/]*/\.\./+!/!;
+       1 while $abs_name =~ s!/[^/]+/+\.\./+!/!;
     }

     return $abs_name;