goto &SUB segfaults with builtin::refaddr, reftype, stringify, etc

mauke commented 2 months ago

Description Some of the built-in functions in builtin:: crash when called via goto &builtin::X.

This is (one of) the underlying issue(s) from #22528. I'm reporting it separately so it doesn't get lost even if Tie::RefHash::Weak gets fixed (or rather, a workaround is deployed in Tie::RefHash). As noted in https://github.com/Perl/perl5/issues/22528#issuecomment-2302576538 by @leonerd:

Huh. Exciting times:

$ bleadperl -E 'sub { goto &builtin::refaddr }->(0)'
panic: pad_sv po at -e line 1.

Steps to Reproduce

$ ./perl -e 'sub { goto &builtin::refaddr }->([])'
Segmentation fault
$ ./perl -e 'sub { goto &builtin::reftype }->([])'
Segmentation fault
$ ./perl -e 'sub { goto &builtin::stringify }->([])'
Segmentation fault
$ ./perl -e 'sub { goto &builtin::trim }->([])'
Segmentation fault

Expected behavior No crash.

Perl configuration

Summary of my perl5 (revision 5 version 41 subversion 3) configuration:
  Derived from: 34d9693e57b11b0ee7c74c4981facac0091db166
  Platform:
    osname=linux
    osvers=6.5.0-10043-tuxedo
    archname=x86_64-linux
    uname='linux luum 6.5.0-10043-tuxedo #47 smp preempt_dynamic tue jun 4 14:29:55 utc 2024 x86_64 x86_64 x86_64 gnulinux '
    config_args='-Dcc=c-gcc -Dusedevel -des'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=undef
    usemultiplicity=undef
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
  Compiler:
    cc='c-gcc'
    ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    optimize='-O2'
    cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
    ccversion=''
    gccversion='11.4.0'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='c-gcc'
    ldflags =' -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/x86_64-linux-gnu /usr/lib /usr/lib64
    libs=-lpthread -ldb -ldl -lm -lcrypt -lutil -lc
    perllibs=-lpthread -ldl -lm -lcrypt -lutil -lc
    libc=/lib/x86_64-linux-gnu/libc.so.6
    so=so
    useshrplib=false
    libperl=libperl.a
    gnulibc_version='2.35'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E'
    cccdlflags='-fPIC'
    lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong'

Characteristics of this binary (from libperl): 
  Compile-time options:
    HAS_LONG_DOUBLE
    HAS_STRTOLD
    HAS_TIMES
    PERLIO_LAYERS
    PERL_COPY_ON_WRITE
    PERL_DONT_CREATE_GVSV
    PERL_HASH_FUNC_SIPHASH13
    PERL_HASH_USE_SBOX32
    PERL_MALLOC_WRAP
    PERL_OP_PARENT
    PERL_PRESERVE_IVUV
    PERL_USE_DEVEL
    PERL_USE_SAFE_PUTENV
    USE_64_BIT_ALL
    USE_64_BIT_INT
    USE_LARGE_FILES
    USE_LOCALE
    USE_LOCALE_COLLATE
    USE_LOCALE_CTYPE
    USE_LOCALE_NUMERIC
    USE_LOCALE_TIME
    USE_PERLIO
    USE_PERL_ATOF
  Locally applied patches:
    uncommitted-changes
  Built under linux
  Compiled at Aug 26 2024 04:28:06
  %ENV:
    PERLBREW_BASHRC_VERSION="0.74"
    PERLBREW_HOME="/home/mauke/.perlbrew"
    PERLBREW_MANPATH="/home/mauke/perl5/perlbrew/perls/perl-5.40.0/man"
    PERLBREW_PATH="/home/mauke/perl5/perlbrew/bin:/home/mauke/perl5/perlbrew/perls/perl-5.40.0/bin"
    PERLBREW_PERL="perl-5.40.0"
    PERLBREW_ROOT="/home/mauke/perl5/perlbrew"
    PERLBREW_VERSION="0.94"
    PERLDOC="-oman"
    PERL_UNICODE="SAL"
  @INC:
    lib
    /usr/local/lib/perl5/site_perl/5.41.3/x86_64-linux
    /usr/local/lib/perl5/site_perl/5.41.3
    /usr/local/lib/perl5/5.41.3/x86_64-linux
    /usr/local/lib/perl5/5.41.3

tonycoz commented 2 months ago

The various builtin:: pp_ funcs assume TARG is available, and when called from a normal entersub OP, it is, since one was added to every* normal entersub in d30110745a to speed up any XS that uses dXSTARG (originally dXS_TARGET).

But goto OPs don't have a TARG, so we see the panic we get here.

We have a similar problem when calling with call_sv():

$ ./perl -Ilib -MXS::APItest=call_sv -e 'call_sv(\&builtin::refaddr, 0, [])'
panic: pad_sv po at -e line 1.

since call_sv() doesn't add a targ to the entersub OP it mocks up.

I think the fix is to have the builtin:: pp_ funcs use dXSTARG and have ck_builtin_func1 set OPpENTERSUB_HASTARG.

leonerd commented 2 months ago

@tonycoz Hrm; a likely-sounding idea, but at first attempt it falls over:

pp.c: In function ‘Perl_pp_reftype’:
pp.c:7929:5: error: ‘dXSTARG’ undeclared (first use in this function); did you mean ‘dTAR
’?
 7929 |     dXSTARG;
      |     ^~~~~~~
      |     dTARG

This because dXSTARG comes from XSUB.h which we don't pull into pp.c.

Such a fix works fine for the XS_builtin_trim that appears in builtin.c though.

tonycoz commented 1 month ago

This because dXSTARG comes from XSUB.h which we don't pull into pp.c.

If we want these functions to be used both as XSUBs and pp funcs that will need to change, or we need to move the pp funcs into builtin.c.

Or not use TARG in them, which loses performance.

Another option would be to use the coresub mechanism (handles &CORE::opname) to handle it, but nightmares lurk there (it would make the XSUBs slower too).

mauke commented 1 month ago

My assumptions: Many builtin::* functions call Perl_pp_* directly. The Perl_pp_* functions assume they can place their result in a pre-allocated SV slot (the "target"), which they access using PL_curpad[PL_op->op_targ]. In normal code, these slots are created by the compiler depending on what operations appear in a given scope.

If my understanding of the situation is correct, then I see two ways forward:

Have the builtin::* functions allocate dummy PL_curpad and PL_op structures, and temporarily switch to them for the call to Perl_pp_*.
Factor out the Perl_pp_* logic into separate functions that take (SV *targ, SV *arg) arguments. The Perl_pp_* functions become wrappers that do their usual dTARGET; ... rpp_replace_1_1_NN(TARG); dance, and the builtin::* functions can call the underlying implementation directly without worrying about targets.

iabyn commented 1 month ago

On Sat, Sep 07, 2024 at 11:08:28PM -0700, mauke wrote:

If my understanding of the situation is correct, then I see two ways forward:

Or 3) reuse whatever mechanism the CORE namespace uses. Every perl builtin op is accessible as a function, .e.g.:

my $r = [];
# prints "ARRAY ARRAY"
printf "%s %s\n",
    ref($r),
    &CORE::ref($r);

I can't recall off the top of my head how it's implemented.

-- Modern art: "That's easy, I could have done that!" "Ah, but you didn't!"

Perl / perl5

goto &SUB segfaults with builtin::refaddr, reftype, stringify, etc #22542