Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.96k stars 555 forks source link

Peculiar debugger 'list' command behavior: "floating point" line numbers #21350

Closed joemcmahon closed 1 year ago

joemcmahon commented 1 year ago

Module: perl5db.pl

Description

Accidentally entering a line range with one or more periods on a list command causes the debugger to produce anomalous line numbers

Steps to Reproduce

  1. perl -d any script.
  2. At the debugger prompt, enter l 1.10.
  3. The debugger will list line 1 with the line number 1.10.
  4. Enter a bare l command.
  5. The debugger will list lines 2 to 11 as 2.10, 3.10,...11.10.
  6. if you enter l 1.2.3, the debugger will list line 1 as line 1.2.3.
  7. A subsequent list command will list lines as 2.2, 2.3,...
  8. Entering l 127.0.0.1 will list line 127 as 127.0.0.1 ....
  9. Subsequent l commands will not show decimal points, so the logic is detecting that floating point numbers with no decimals can be displayed as integers

Expected behavior

The debugger should either reject numbers containing periods on a list command; automatically truncate them to integers (probably the best option); or interpret the first . as a comma, then truncate anything after a second ., and print the resulting range of lines.

Perl configuration

# perl -V output goes here
Summary of my perl5 (revision 5 version 38 subversion 0) configuration:

  Platform:
    osname=darwin
    osvers=22.6.0
    archname=darwin-2level
    uname='darwin tekili-li.local 22.6.0 darwin kernel version 22.6.0: wed jul 5 22:22:05 pdt 2023; root:xnu-8796.141.3~6release_arm64_t6000 arm64 '
    config_args='-de -Dprefix=/Users/joemcmahon/perl5/perlbrew/perls/perl-5.38.0 -Aeval:scriptdir=/Users/joemcmahon/perl5/perlbrew/perls/perl-5.38.0/bin'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=undef
    usemultiplicity=undef
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
  Compiler:
    cc='cc'
    ccflags ='-fno-common -DPERL_DARWIN -mmacosx-version-min=13.5 -DNO_POSIX_2008_LOCALE -fno-strict-aliasing -pipe -fstack-protector-strong'
    optimize='-O3'
    cppflags='-fno-common -DPERL_DARWIN -mmacosx-version-min=13.5 -DNO_POSIX_2008_LOCALE -fno-strict-aliasing -pipe -fstack-protector-strong'
    ccversion=''
    gccversion='Apple LLVM 14.0.3 (clang-1403.0.22.14.1)'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=8
    longdblkind=0
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='cc'
    ldflags =' -mmacosx-version-min=13.5 -fstack-protector-strong'
    libpth=/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/14.0.3/lib /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/lib /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib /usr/lib
    libs= 
    perllibs=
    libc=
    so=dylib
    useshrplib=false
    libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=bundle
    d_dlsymun=undef
    ccdlflags=' '
    cccdlflags=' '
    lddlflags=' -mmacosx-version-min=13.5 -bundle -undefined dynamic_lookup -fstack-protector-strong'

Characteristics of this binary (from libperl): 
  Compile-time options:
    HAS_LONG_DOUBLE
    HAS_STRTOLD
    HAS_TIMES
    PERLIO_LAYERS
    PERL_COPY_ON_WRITE
    PERL_DONT_CREATE_GVSV
    PERL_HASH_FUNC_SIPHASH13
    PERL_HASH_USE_SBOX32
    PERL_MALLOC_WRAP
    PERL_OP_PARENT
    PERL_PRESERVE_IVUV
    PERL_USE_SAFE_PUTENV
    USE_64_BIT_ALL
    USE_64_BIT_INT
    USE_LARGE_FILES
    USE_LOCALE
    USE_LOCALE_COLLATE
    USE_LOCALE_CTYPE
    USE_LOCALE_NUMERIC
    USE_LOCALE_TIME
    USE_PERLIO
    USE_PERL_ATOF
  Built under darwin
  Compiled at Aug  1 2023 20:18:24
  %ENV:
    PERLBREW_HOME="/Users/joemcmahon/.perlbrew"
    PERLBREW_MANPATH="/Users/joemcmahon/perl5/perlbrew/perls/perl-5.38.0/man"
    PERLBREW_PATH="/Users/joemcmahon/perl5/perlbrew/bin:/Users/joemcmahon/perl5/perlbrew/perls/perl-5.38.0/bin"
    PERLBREW_PERL="perl-5.38.0"
    PERLBREW_ROOT="/Users/joemcmahon/perl5/perlbrew"
    PERLBREW_SHELLRC_VERSION="0.97"
    PERLBREW_VERSION="0.97"
  @INC:
    /Users/joemcmahon/perl5/perlbrew/perls/perl-5.38.0/lib/site_perl/5.38.0/darwin-2level
    /Users/joemcmahon/perl5/perlbrew/perls/perl-5.38.0/lib/site_perl/5.38.0
    /Users/joemcmahon/perl5/perlbrew/perls/perl-5.38.0/lib/5.38.0/darwin-2level
    /Users/joemcmahon/perl5/perlbrew/perls/perl-5.38.0/lib/5.38.0
jkeenan commented 1 year ago

I can confirm the weird behavior of the l command in the debugger reported by the OP. However, I'm not familiar enough with lib/perl5db.pl to locate the operative code. Can anyone else comment? @shlomif, perhaps? @tonycoz? E. Choroba?

tonycoz commented 1 year ago

I spent some time looking at this last week, the regexp for matching a range is overly permissive, and there's some strange processing of the matched values I don't understand.

This week I've been sick and haven't been able to follow-up on it.

The permissive regexp and the strange processing have been there long before @shlomif 's re-org of the debugger.

joemcmahon commented 1 year ago

I will take a look this week if Tony can't get to it. I think I remember where it is from back when I wrote the comments.

On Thu, Aug 17, 2023 at 4:57 PM Tony Cook @.***> wrote:

I spent some time looking at this last week, the regexp for matching a range is overly permissive, and there's some strange processing of the matched values I don't understand.

This week I've been sick and haven't been able to follow-up on it.

The permissive regexp and the strange processing have been there long before @shlomif https://github.com/shlomif 's re-org of the debugger.

— Reply to this email directly, view it on GitHub https://github.com/Perl/perl5/issues/21350#issuecomment-1683130766, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAO6GY44MQR6AN632GMIYTXV2VWRANCNFSM6AAAAAA3JAZEZY . You are receiving this because you authored the thread.Message ID: @.***>

shlomif commented 1 year ago

hi all.

No promises, but I also would like to investigate this issue

joemcmahon commented 1 year ago

We should be able to get it between us!

On Thu, Aug 17, 2023 at 11:27 PM Shlomi Fish @.***> wrote:

hi all.

No promises, but I also would like to investigate this issue

— Reply to this email directly, view it on GitHub https://github.com/Perl/perl5/issues/21350#issuecomment-1683416152, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAO6G7K4PLF3X3VSPREQHDXV4DOLANCNFSM6AAAAAA3JAZEZY . You are receiving this because you authored the thread.Message ID: @.***>

tonycoz commented 1 year ago

Have fun, I'll get pinged on updates to this ticket.

shlomif commented 1 year ago

In order to write a test assertion, I/we need to know what is the desirable behaviour. I support rejecting the inputs with an error. "explicit is better than implicit"

tonycoz commented 1 year ago

That seems reasonable.

joemcmahon commented 1 year ago

Agreed.

On Sun, Aug 20, 2023 at 6:43 PM Tony Cook @.***> wrote:

That seems reasonable.

— Reply to this email directly, view it on GitHub https://github.com/Perl/perl5/issues/21350#issuecomment-1685494742, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAO6G6QLXRCL7LB6D2GNWLXWK4NNANCNFSM6AAAAAA3JAZEZY . You are receiving this because you authored the thread.Message ID: @.***>

joemcmahon commented 1 year ago

Okay, finally working on this, and the bug is caused by the regex that tries to extract the list range, and is being too clever; it's trying to parse a couple different things at once, and messing up because of that.

  2804     # l start-stop or l start,stop
  2805     elsif (my ($s, $e) = $spec =~ /^(?:(-?[\d\$\.]+)(?:[-,]([\d\$\.]+))?)?/ ) {
  2806         return _cmd_l_range($spec, $line, $s, $e);
  2807     }

This is trying to handle l $ref and the ranges for sure; it's also what's accepting the dots in the line numbers. It's also accepting lots of other things that are undocumented and don't work (like -12, and $ref-2 and 10 -2 and 10 -12 and ..............2). It's also what's accepting l 15.7 and l 127.0.0.1. Update: this also rolls in l -.

Last, the l $ref command doesn't really work at all.

I am going to break this match up into several different matches and match the pure line-number behavior documented in perldoc perldebug first; we can then figure out the bug that keeps l $ref from working.

jkeenan commented 1 year ago

Okay, finally working on this, and the bug is caused by the regex that tries to extract the list range, and is being too clever; it's trying to parse a couple different things at once, and messing up because of that.

Joe, thanks for plugging away on this. Could you identify the relevant line numbers in the codebase so that others can look at them as well?

Thanks.

joemcmahon commented 1 year ago

Sure, I'll edit the previous comment with that.

joemcmahon commented 1 year ago

Scanning up from the regex in question at line 2805, several of these cases are already handled and shouldn't be matched here at all:

  2780 sub _cmd_l_main {
  2781     my $spec = shift;
  2782
  2783     # If this is '-something', delete any spaces after the dash.
  2784     $spec =~ s/\A-\s*\z/-/;
  2785
  2786     # If the line is '$something', assume this is a scalar containing a
  2787     # line number.
  2788     # Set up for DB::eval() - evaluate in *user* context.
  2789     if ( my ($var_name) = $spec =~ /\A(\$.*)/s ) {
  2790         return _cmd_l_handle_var_name($var_name);
  2791     }
  2792     # l name. Try to find a sub by that name.
  2793     elsif ( ($subname) = $spec =~ /\A([\':A-Za-z_][\':\w]*(?:\[.*\])?)/s ) {
  2794         return _cmd_l_handle_subname();
  2795     }
  2796     # Bare 'l' command.
  2797     elsif ( $spec !~ /\S/ ) {
  2798         return _cmd_l_empty();
  2799     }
  2800     # l [start]+number_of_lines
  2801     elsif ( my ($new_start, $new_incr) = $spec =~ /\A(\d*)\+(\d*)\z/ ) {
  2802         return _cmd_l_plus($new_start, $new_incr);
  2803     }
  2804     # l start-stop or l start,stop
  2805     elsif (my ($s, $e) = $spec =~ /^(?:(-?[\d\$\.]+)(?:[-,]([\d\$\.]+))?)?/ ) {
  2806         return _cmd_l_range($spec, $line, $s, $e);
  2807     }
  2808
  2809     return;
  2810 } ## end sub _cmd_l_main

But the line 2805 regex needs to be simplified to (probably) \A(\d+)(?:[-,](\d+))?)?\z, which is l nnnn, l nnnn-mmmm, and l nnnn,mmmm. I'll test that, obviously, but the idea is that there shouldn't be any other special case in there at all: just simple line numbers, properly parsed.

joemcmahon commented 1 year ago

I dug back through the commit history, and at dad0832b48 (Feb 2002), lines 905 to 912 show us that a line number of . (just one dot) is supposed to be "the current line".

54d04a52ebe lib/perl5db.pl (Ilya Zakharevich     1996-02-06 15:32:09 -0500  905)                    $cmd =~ /^\.$/ && do {
1d06cb2d31f lib/perl5db.pl (Ilya Zakharevich     1996-12-26 14:54:34 -0500  906)                        $incr = -1;             # for backward motion.
54d04a52ebe lib/perl5db.pl (Ilya Zakharevich     1996-02-06 15:32:09 -0500  907)                        $start = $line;
54d04a52ebe lib/perl5db.pl (Ilya Zakharevich     1996-02-06 15:32:09 -0500  908)                        $filename = $filename_ini;
8ebc5c0145d lib/perl5db.pl (Perl 5 Porters       1997-01-04 17:44:00 +1200  909)                        *dbline = $main::{'_<' . $filename};
54d04a52ebe lib/perl5db.pl (Ilya Zakharevich     1996-02-06 15:32:09 -0500  910)                        $max = $#dbline;
f1583d8f9d0 lib/perl5db.pl (Ilya Zakharevich     2001-05-18 23:49:09 -0400  911)                        print_lineinfo($position);
54d04a52ebe lib/perl5db.pl (Ilya Zakharevich     1996-02-06 15:32:09 -0500  912)                        next CMD };
d338d6fe1df lib/perl5db.pl (Perl 5 Porters       1996-01-11 22:23:31 +0000  913)                    $cmd =~ /^w\b\s*(\d*)$/ && do {

So these should be acceptable range specs:

And so on. So that regex should have a case for matching . and nothing else. Continuing to dig back into the commit history.

joemcmahon commented 1 year ago

At that same commit, I can see that the same regex we've been using is still there. No special casing for anything other than ..

54d04a52ebe lib/perl5db.pl (Ilya Zakharevich     1996-02-06 15:32:09 -0500  932)                    $cmd =~ /^l\b\s*((-?[\d\$\.]+)([-,]([\d\$\.]+))?)?/ && do {
54d04a52ebe lib/perl5db.pl (Ilya Zakharevich     1996-02-06 15:32:09 -0500  933)                        $end = (!defined $2) ? $max : ($4 ? $4 : $2);
d338d6fe1df lib/perl5db.pl (Perl 5 Porters       1996-01-11 22:23:31 +0000  934)                        $end = $max if $end > $max;
d338d6fe1df lib/perl5db.pl (Perl 5 Porters       1996-01-11 22:23:31 +0000  935)                        $i = $2;
d338d6fe1df lib/perl5db.pl (Perl 5 Porters       1996-01-11 22:23:31 +0000  936)                        $i = $line if $i eq '.';
d338d6fe1df lib/perl5db.pl (Perl 5 Porters       1996-01-11 22:23:31 +0000  937)                        $i = 1 if $i < 1;
1d06cb2d31f lib/perl5db.pl (Ilya Zakharevich     1996-12-26 14:54:34 -0500  938)                        $incr = $end - $i;
055fd3a96a4 lib/perl5db.pl (Gurusamy Sarathy     2000-03-14 05:49:08 +0000  939)                        if ($slave_editor) {
d338d6fe1df lib/perl5db.pl (Perl 5 Porters       1996-01-11 22:23:31 +0000  940)                            print $OUT "\032\032$filename:$i:0\n";
d338d6fe1df lib/perl5db.pl (Perl 5 Porters       1996-01-11 22:23:31 +0000  941)                            $i = $end;
d338d6fe1df lib/perl5db.pl (Perl 5 Porters       1996-01-11 22:23:31 +0000  942)                        } else {
d338d6fe1df lib/perl5db.pl (Perl 5 Porters       1996-01-11 22:23:31 +0000  943)                            for (; $i <= $end; $i++) {
1b4321cf375 lib/perl5db.pl (Ilya Zakharevich     2001-07-02 02:27:22 -0400  944)                                my ($stop,$action);
04e43a21088 lib/perl5db.pl (Daniel S. Lewart     2001-05-22 21:18:03 -0500  945)                                ($stop,$action) = split(/\0/, $dbline{$i}) if
04e43a21088 lib/perl5db.pl (Daniel S. Lewart     2001-05-22 21:18:03 -0500  946)                                    $dbline{$i};
54d04a52ebe lib/perl5db.pl (Ilya Zakharevich     1996-02-06 15:32:09 -0500  947)                                $arrow = ($i==$line
54d04a52ebe lib/perl5db.pl (Ilya Zakharevich     1996-02-06 15:32:09 -0500  948)                                          and $filename eq $filename_ini)
54d04a52ebe lib/perl5db.pl (Ilya Zakharevich     1996-02-06 15:32:09 -0500  949)                                  ?  '==>'
36477c247f3 lib/perl5db.pl (Perl 5 Porters       1996-12-06 18:56:00 +1200  950)                                    : ($dbline[$i]+0 ? ':' : ' ') ;
54d04a52ebe lib/perl5db.pl (Ilya Zakharevich     1996-02-06 15:32:09 -0500  951)                                $arrow .= 'b' if $stop;
54d04a52ebe lib/perl5db.pl (Ilya Zakharevich     1996-02-06 15:32:09 -0500  952)                                $arrow .= 'a' if $action;
54d04a52ebe lib/perl5db.pl (Ilya Zakharevich     1996-02-06 15:32:09 -0500  953)                                print $OUT "$i$arrow\t", $dbline[$i];
65c9c81d8dc lib/perl5db.pl (Ilya Zakharevich     1998-10-27 20:23:27 -0500  954)                                $i++, last if $signal;
d338d6fe1df lib/perl5db.pl (Perl 5 Porters       1996-01-11 22:23:31 +0000  955)                            }
65c9c81d8dc lib/perl5db.pl (Ilya Zakharevich     1998-10-27 20:23:27 -0500  956)                            print $OUT "\n" unless $dbline[$i-1] =~ /\n$/;
d338d6fe1df lib/perl5db.pl (Perl 5 Porters       1996-01-11 22:23:31 +0000  957)                        }
d338d6fe1df lib/perl5db.pl (Perl 5 Porters       1996-01-11 22:23:31 +0000  958)                        $start = $i; # remember in case they want more
d338d6fe1df lib/perl5db.pl (Perl 5 Porters       1996-01-11 22:23:31 +0000  959)                        $start = $max if $start > $max;
d338d6fe1df lib/perl5db.pl (Perl 5 Porters       1996-01-11 22:23:31 +0000  960)                        next CMD; };

I'm going to jump back to before Ilya's commit that added this line and see if there was more complex code for parsing the regex results earlier.

joemcmahon commented 1 year ago

Larry's March 1991 commit contains the same regex without the leading -, but again, no special handling other than .. I begin to think that the regex was coded to catch all the cases that they wanted to eventually handle, but there was a dropped stitch and the special-er cases ended up being parsed elsewhere.

I think the only special case that is actually supposed to be there is ., and everything else was just cargo-culted forward.

fe14fcc35f7 lib/perldb.pl  (Larry Wall     1991-03-21 00:00:00 +0000 260)               $cmd =~ /^l\b\s*(([\d\$\.]+)([-,]([\d\$\.]+))?)?/ && do {
fe14fcc35f7 lib/perldb.pl  (Larry Wall     1991-03-21 00:00:00 +0000 261)                   $end = (!$2) ? $max : ($4 ? $4 : $2);
fe14fcc35f7 lib/perldb.pl  (Larry Wall     1991-03-21 00:00:00 +0000 262)                   $end = $max if $end > $max;
fe14fcc35f7 lib/perldb.pl  (Larry Wall     1991-03-21 00:00:00 +0000 263)                   $i = $2;
fe14fcc35f7 lib/perldb.pl  (Larry Wall     1991-03-21 00:00:00 +0000 264)                   $i = $line if $i eq '.';
fe14fcc35f7 lib/perldb.pl  (Larry Wall     1991-03-21 00:00:00 +0000 265)                   $i = 1 if $i < 1;
f0fcb552910 lib/perldb.pl  (Larry Wall     1991-11-05 06:28:06 +0000 266)                   if ($emacs) {
f0fcb552910 lib/perldb.pl  (Larry Wall     1991-11-05 06:28:06 +0000 267)                       print OUT "\032\032$filename:$i:0\n";
f0fcb552910 lib/perldb.pl  (Larry Wall     1991-11-05 06:28:06 +0000 268)                       $i = $end;
f0fcb552910 lib/perldb.pl  (Larry Wall     1991-11-05 06:28:06 +0000 269)                   } else {
f0fcb552910 lib/perldb.pl  (Larry Wall     1991-11-05 06:28:06 +0000 270)                       for (; $i <= $end; $i++) {
f0fcb552910 lib/perldb.pl  (Larry Wall     1991-11-05 06:28:06 +0000 271)                           print OUT "$i:\t", $dbline[$i];
f0fcb552910 lib/perldb.pl  (Larry Wall     1991-11-05 06:28:06 +0000 272)                           last if $signal;
f0fcb552910 lib/perldb.pl  (Larry Wall     1991-11-05 06:28:06 +0000 273)                       }
a687059cbaf lib/perldb.pl  (Larry Wall     1989-10-18 00:00:00 +0000 274)                   }
fe14fcc35f7 lib/perldb.pl  (Larry Wall     1991-03-21 00:00:00 +0000 275)                   $start = $i;        # remember in case they want more
fe14fcc35f7 lib/perldb.pl  (Larry Wall     1991-03-21 00:00:00 +0000 276)                   $start = $max if $start > $max;
fe14fcc35f7 lib/perldb.pl  (Larry Wall     1991-03-21 00:00:00 +0000 277)                   next CMD; };

This is the earliest commit where the debugger actually exists, so I think we've dug back as far as we can go.

joemcmahon commented 1 year ago

Pattern cleaned up and passing current tests; $ was also removed.

Going to add a few more tests to verify that the formerly-accepted but invalid strings are now rejected -- there's a catchall that print "Invalid line range specification" at the end. Should have a PR tomorrow.

joemcmahon commented 1 year ago

https://github.com/Perl/perl5/pull/21488 fixes the egregious behavior. We still have some issues that are outside the scope of this bug, and I'll open new issues for those. They're related to UTF-8 and are a lot more challenging.

tonycoz commented 1 year ago

Fixed by #21488