Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.85k stars 527 forks source link

build fails when building with PERL_NO_COW #19987

Open bram-perl opened 1 year ago

bram-perl commented 1 year ago

Noticed when attempting to build with -Accflags='-DPERL_NO_COW'

    $ ./Configure -des -Dusedevel -Accflags='-DPERL_NO_COW'
    $ make -j8 miniperl
    ...
    ./miniperl -Ilib -f write_buildcustomize.pl
    makefile:363: recipe for target 'lib/buildcustomize.pl' failed
    make: *** [lib/buildcustomize.pl] Segmentation fault (core dumped)

Rebuilding with -DDEBUGGING shows an assertion failure:

    $ ./Configure -des -Dusedevel -Accflags='-DPERL_NO_COW' -DDEBUGGING
    $ make -j miniperl
    ...
    ./miniperl -Ilib -f write_buildcustomize.pl
    miniperl: hv.c:2972: S_unshare_hek_or_pvn: Assertion `he->shared_he_he.hent_hek == hek' failed.
    makefile:363: recipe for target 'lib/buildcustomize.pl' failed

Reducing to a minimal test case:

    $ ./miniperl -wle 'my $x = (0 == 1);'
    miniperl: hv.c:2972: S_unshare_hek_or_pvn: Assertion `he->shared_he_he.hent_hek == hek' failed.
    Aborted (core dumped)

Bisecting points to commit 914bb57489325d34ddbb7c0557c53df7baa84d86: (@leonerd)

    commit 914bb57489325d34ddbb7c0557c53df7baa84d86
    Author: Paul "LeoNerd" Evans <leonerd@leonerd.org.uk>
    Date:   Sat Aug 7 14:46:48 2021 +0100

        Define a third kind of COW state; STATIC

        Previously, when IsCOW flag was set there were two cases:
          SvLEN()==0:
              PV is really a shared HEK

          SvLEN()!=0:
              PV is a COW structure with 1..256 refcount stored in its extra final byte

        This change adds a third state:

          SvLEN()==0 && SvFLAGS() & SVppv_STATIC:
              PV is a shared static const pointer and must not be modified

        sv_setsv_flags() and sv_setsv_cow() will preserve this state

        sv_uncow() will copy it out to a regular string buffer

        sv_dup() will preserve the static pointer into cloned threads

(Note: I have no use case for -DPERL_NO_COW, I only tried it while looking at another issue)

Steps to reproduce

  1. ./Configure -des -Dusedevel -Accflags='-DPERL_NO_COW' -DDEBUGGING
  2. make -j8 miniperl
  3. ./miniperl -wle 'my $x = (0 == 1);'

Actual result

Segmentation fault/Assertion failure:

S_unshare_hek_or_pvn: Assertion `he->shared_he_he.hent_hek == hek' failed.

Expected result

No crash.

jkeenan commented 1 year ago

Noticed when attempting to build with -Accflags='-DPERL_NO_COW'

    $ ./Configure -des -Dusedevel -Accflags='-DPERL_NO_COW'
    $ make -j8 miniperl
    ...
    ./miniperl -Ilib -f write_buildcustomize.pl
    makefile:363: recipe for target 'lib/buildcustomize.pl' failed
    make: *** [lib/buildcustomize.pl] Segmentation fault (core dumped)

Observed on FreeBSD as well.

iabyn commented 1 year ago

On Sat, Jul 23, 2022 at 01:16:13PM -0700, Bram wrote:

Noticed when attempting to build with -Accflags='-DPERL_NO_COW'

    $ ./Configure -des -Dusedevel -Accflags='-DPERL_NO_COW'
    $ make -j8 miniperl
    ...
    ./miniperl -Ilib -f write_buildcustomize.pl
    makefile:363: recipe for target 'lib/buildcustomize.pl' failed
    make: *** [lib/buildcustomize.pl] Segmentation fault (core dumped)

Given that it's been broken for a year and you're the first to notice, I wonder whether the time has come to remove the facility to build a non-COW perl? It's been on by default for 8 years now.

-- All wight. I will give you one more chance. This time, I want to hear no Wubens. No Weginalds. No Wudolf the wed-nosed weindeers. -- Life of Brian

bram-perl commented 1 year ago

It's broken in a dev release since v5.35.4 (2021-Sep-20) In a non-dev release it's been broken since v5.36.0 (2022-05-27)

Anyway, briefly discussing this with @leonerd :

The question is if someone still has a use-case for building without Copy on Write.. (I do not have one) If someone does have then it should show up when they attempt to build perl v5.36.. If nothing shows up then "disabling CoW" could be ripped out..

nielsl commented 10 months ago

I have delayed upgrading to newer perl because they fail when PERL_NO_COW is on. But that too is now failing, because older perls have other compilation errors on the latest Linux systems. COW does not seem to extend to C functions: when passing a string reference to a C function that updates it (as a byte array) then the cloning does not happen it seems. If this feature disappears I will have to do explicit copying in many places. I think this compilation problem should be fixed (same as above, except on Linux Mint 21.1) rather than shipping for a long time something that doesn't work. Whether the feature is useful or not, thats a separate discussion. I think it is. So my big software package is stuck between one compilation error or another. I will even pay reasonably to have it fixed. (Btw, the switch to Github instead of email means shifting convenience from users to developers, maybe not a good idea, at least I always try to do the opposite with my code).

demerphq commented 10 months ago

@nielsl I suspect you just need to change your code a bit so that SvPV() are changed to be one of the following:

            char*  SvPV_force              (SV* sv, STRLEN len)
            char*  SvPV_force_nolen        (SV* sv)
            char*  SvPVx_force             (SV* sv, STRLEN len)
            char*  SvPV_force_nomg         (SV* sv, STRLEN len)
            char*  SvPV_force_nomg_nolen   (SV * sv)
            char*  SvPV_force_mutable      (SV * sv, STRLEN len)
            char*  SvPV_force_flags        (SV * sv, STRLEN len, U32 flags)
            char*  SvPV_force_flags_nolen  (SV * sv, U32 flags)
            char*  SvPV_force_flags_mutable(SV * sv, STRLEN len, U32 flags)
            char*  SvPVbyte_force          (SV* sv, STRLEN len)
            char*  SvPVbytex_force         (SV* sv, STRLEN len)
            char*  SvPVutf8_force          (SV* sv, STRLEN len)
            char*  SvPVutf8x_force         (SV* sv, STRLEN len)

You can also use the function sv_force_normal() as well. See perldoc perlapi for details.

COW does not refer to the Unix COW facility, it is purely an internal feature.

I would be happy to discuss more privately, you can find my email on the list.

Having said that, I do think that we should make PERL_NO_COW work. Nothing we do should depend on COW. That would be/is a very sad state of affairs, as it would mean we have a hard dependency on some particular implementation of COW, and IMO we should be able to change how COW works without causing havok. The whole scheme of putting the COW bytes at the end of the PV buffer, and various other aspects of its implementation definitely could be improved, as it leaves cases that can't be COW'ed and causes other problems (there is some kind of hairy interaction with file operations and regexes that we never really fixed afair), and if we have somehow hard coded a dependency on COW then IMO we should fix it.

demerphq commented 10 months ago

@bram-perl said:

Anyway, briefly discussing this with @leonerd :

  • Boolean need copy on write
  • firstclass bool values are cow-static strings

That doesn't make sense to me, it feels like an unforced issue that we can resolve if we choose to. I will add this to my todo list.

leonerd commented 10 months ago

That doesn't make sense to me, it feels like an unforced issue that we can resolve if we choose to. I will add this to my todo list.

Yah; firstclass-bools don't actually rely on COW, they just happen to exploit the fact that if they set the COW flag, then the PV buffer is shared by plain pointer address, and so we can use that pointer to distinguish if it is the special boolean true/false value. You could arrange for that pointer to be statically-shared by some other mechanism if COW was not enabled.

nielsl commented 10 months ago

@demerphq Thank you for giving me a way out at the C level. I am not very good with Perl/C, but managed to create "pidgin-C" functions in all the places where there is a memory or speed bottleneck, and where the code doesn't change.

nielsl commented 10 months ago

@bram-perl said:

Anyway, briefly discussing this with @leonerd :

  • Boolean need copy on write
  • firstclass bool values are cow-static strings

That doesn't make sense to me, it feels like an unforced issue that we can resolve if we choose to. I will add this to my todo list.

Oh that would be wonderful, thanks. I can then update to the latest Perl and to all the non-standard Perl and GNU packages that I depend on. Then I can install my system on the latest Linux and Unix systems again. And if the fix is not immediate, I can now make temporary workarounds knowing that there will be a fix. Bravo.

nielsl commented 2 weeks ago

@bram-perl said:

Anyway, briefly discussing this with @leonerd :

  • Boolean need copy on write
  • firstclass bool values are cow-static strings

That doesn't make sense to me, it feels like an unforced issue that we can resolve if we choose to. I will add this to my todo list.

Is this done by now?

nielsl commented 2 weeks ago

@bram-perl said:

Anyway, briefly discussing this with @leonerd :

  • Boolean need copy on write
  • firstclass bool values are cow-static strings

That doesn't make sense to me, it feels like an unforced issue that we can resolve if we choose to. I will add this to my todo list.

I can answer my own question: in 5.38.2, compiling with -DPERL_NO_COW still crashes on the most commonly used Linux systems. Attached are the configure and make outputs from 5.38.2 and 5.26.1, and the hardware configuration. Below the gcc version and system software.

I am using 5.26.1 with a large amount of perl modules that I wrote for a DNA analysis package. Customers use it and I cannot build it. I could upgrade to the latest stable, and I will, but it means updating CPAN modules and testing them in combination for breakage (no, I cannot just always take the latest, something will break). As was suggested above, I could change C-function calls throughout, but doing and testing that would take me 1-2 weeks of studying and repair probably (well I have no idea really). I hoped it would be fixed by now. I don't mind paying in advance if this will be fixed quickly, since customers will have to postpone otherwise. Do you see it as an easy fix, or is it difficult? can the fix be back-ported to 5.26.1? is there a fix already in development versions? Contact info below.

5.38.2-error.txt 5.26.1-error.txt hw-short.txt

gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 Copyright (C) 2021 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 Copyright (C) 2021 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

DISTRIB_ID=LinuxMint DISTRIB_RELEASE=21.3 DISTRIB_CODENAME=virginia DISTRIB_DESCRIPTION="Linux Mint 21.3 Virginia"

Niels Larsen, PhD, CEO Danish Genome Institute

E-mail: niels@genomics.dk Skype: niels_larsen_denmark Mobile: +45-3091-5426 (GMT+1)

jkeenan commented 2 weeks ago

@bram-perl said:

Anyway, briefly discussing this with @leonerd :

  • Boolean need copy on write
  • firstclass bool values are cow-static strings

That doesn't make sense to me, it feels like an unforced issue that we can resolve if we choose to. I will add this to my todo list.

Is this done by now?

Apparently not. Trying the reduced case provided by @bram-perl in July 2022 (on Ubuntu Linux 22.04 LTS with gcc-11 as the C-compiler), I get:

$ git describe
v5.39.10-4-g9d71dbb721

$./Configure -des -Dusedevel -Accflags='-DPERL_NO_COW' -DDEBUGGING && \
   make miniperl
cc -fstack-protector-strong -L/usr/local/lib -o miniperl \
    opmini.o perlmini.o universalmini.o av.o builtin.o caretx.o class.o deb.o doio.o doop.o dquote.o dump.o globals.o gv.o hv.o keywords.o locale.o mathoms.o mg.o mro_core.o numeric.o pad.o peep.o perlio.o perly.o pp.o pp_ctl.o pp_hot.o pp_pack.o pp_sort.o pp_sys.o reentr.o regcomp.o regcomp_debug.o regcomp_invlist.o regcomp_study.o regcomp_trie.o regexec.o run.o scope.o sv.o taint.o time64.o toke.o utf8.o util.o   miniperlmain.o  -lpthread -ldl -lm -lcrypt -lutil -lc 
./miniperl -w -Ilib -Idist/Exporter/lib -MExporter -e '<?>' || sh -c 'echo >&2 Failed to build miniperl.  Please run make minitest; exit 1'
./miniperl -Ilib -f write_buildcustomize.pl
miniperl: hv.c:3259: S_unshare_hek_or_pvn: Assertion `he->shared_he_he.hent_hek == hek' failed.
make: *** [makefile:392: lib/buildcustomize.pl] Aborted (core dumped)

Note that perl-5.26 was originally released on May 30 2017. So it's long out-of-maintenance.

nielsl commented 2 weeks ago

@nielsl I suspect you just need to change your code a bit so that SvPV() are changed to be one of the following:

            char*  SvPV_force              (SV* sv, STRLEN len)
            char*  SvPV_force_nolen        (SV* sv)
            char*  SvPVx_force             (SV* sv, STRLEN len)
            char*  SvPV_force_nomg         (SV* sv, STRLEN len)
            char*  SvPV_force_nomg_nolen   (SV * sv)
            char*  SvPV_force_mutable      (SV * sv, STRLEN len)
            char*  SvPV_force_flags        (SV * sv, STRLEN len, U32 flags)
            char*  SvPV_force_flags_nolen  (SV * sv, U32 flags)
            char*  SvPV_force_flags_mutable(SV * sv, STRLEN len, U32 flags)
            char*  SvPVbyte_force          (SV* sv, STRLEN len)
            char*  SvPVbytex_force         (SV* sv, STRLEN len)
            char*  SvPVutf8_force          (SV* sv, STRLEN len)
            char*  SvPVutf8x_force         (SV* sv, STRLEN len)

You can also use the function sv_force_normal() as well. See perldoc perlapi for details.

COW does not refer to the Unix COW facility, it is purely an internal feature.

I would be happy to discuss more privately, you can find my email on the list.

Having said that, I do think that we should make PERL_NO_COW work. Nothing we do should depend on COW. That would be/is a very sad state of affairs, as it would mean we have a hard dependency on some particular implementation of COW, and IMO we should be able to change how COW works without causing havok. The whole scheme of putting the COW bytes at the end of the PV buffer, and various other aspects of its implementation definitely could be improved, as it leaves cases that can't be COW'ed and causes other problems (there is some kind of hairy interaction with file operations and regexes that we never really fixed afair), and if we have somehow hard coded a dependency on COW then IMO we should fix it.

@bram-perl said:

Anyway, briefly discussing this with @leonerd :

  • Boolean need copy on write
  • firstclass bool values are cow-static strings

That doesn't make sense to me, it feels like an unforced issue that we can resolve if we choose to. I will add this to my todo liperl-5.26.1 does compile

@bram-perl said:

Anyway, briefly discussing this with @leonerd :

  • Boolean need copy on write
  • firstclass bool values are cow-static strings

That doesn't make sense to me, it feels like an unforced issue that we can resolve if we choose to. I will add this to my todo list.

Perl-5.26.1 does compile with PERL_NO_COW on Ubuntu 20 (gcc-9), with a lot of warnings, but it fails on Ubuntu 22 (gcc-11). Perl-5.34.3 does compile with PERL_NO_COW on Ubuntu 22, so that takes the immediate customer pressure off me. I tried your suggestions, but get either symbol lookup error or clearly wrong results. I am not fluent in C though. As far as I can tell, this affects anyone who calls a C-function from Perl that modifies a string/byte array, so isn't it a bit odd no-one has complained ..

nielsl commented 2 weeks ago

It's broken in a dev release since v5.35.4 (2021-Sep-20) In a non-dev release it's been broken since v5.36.0 (2022-05-27)

Anyway, briefly discussing this with @leonerd :

* Boolean need copy on write

* firstclass bool values are cow-static strings

The question is if someone still has a use-case for building without Copy on Write.. (I do not have one) If someone does have then it should show up when they attempt to build perl v5.36.. If nothing shows up then "disabling CoW" could be ripped out..

Thanks for mentioning this, after I saw your comment I did have luck with PERL_NO_COW in Perl-5.34.3 on Ubuntu 22 (gcc-11). As far as I can tell, this affects anyone who calls a C-function from Perl that modifies a string/byte array, so isn't it a bit odd no-one has complained ..

Leont commented 2 weeks ago

As far as I can tell, this affects anyone who calls a C-function from Perl that modifies a string/byte array, so isn't it a bit odd no-one has complained ..

We're calling SV_CHECK_THINKFIRST quite aggressively (probably even too aggressively) so usually that's not a problem.

nielsl commented 2 weeks ago

You mean I should I call

void SV_CHECK_THINKFIRST(SV * sv )

on every scalar passed from Perl that C operates on

in every C-function in every program until PERL_NO_COW is fixed?

I could, if that would make it work for sure. For now,

I am ok with 5.34.3, even if it takes some weeks to fix.

Sent with Proton Mail secure email.

On Tuesday, April 30th, 2024 at 11:44 AM, Leon Timmermans @.***> wrote:

As far as I can tell, this affects anyone who calls a C-function from Perl that modifies a string/byte array, so isn't it a bit odd no-one has complained ..

We're calling SV_CHECK_THINKFIRST quite aggressively (probably even too aggressively) so usually that's not a problem.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

mauke commented 2 weeks ago

You mean I should I call

void  SV_CHECK_THINKFIRST(SV * sv )

on every scalar passed from Perl that C operates on in every C-function in every program until PERL_NO_COW is fixed?

Well, according to https://perldoc.perl.org/perlapi#SV_CHECK_THINKFIRST you should call SV_CHECK_THINKFIRST (or SV_CHECK_THINKFIRST_COW_DROP) on any scalar whose char * buffer you want to modify directly in C. That should make your code compatible with both COW and PERL_NO_COW (if and when that is fixed).

Leont commented 2 weeks ago

Do you have some code you can share? That would make it a lot easier for us to help you.

nielsl commented 2 weeks ago

Thank you, yes certainly. I did send the piece below to you, but must not have gotten the right e-mail address.

Nearly all my byte allocations are done this way

static void get_ptr( SV obj ) { return SvPVX( obj ); }

and calls look like

unsigned int interleave_sims_C( float maxdif, SV old_ndcs_sv, SV old_sims_sv, unsigned int old_len, SV add_ndcs_sv, SV add_sims_sv, unsigned int add_len, SV new_ndcs_sv, SV new_sims_sv ) { unsigned int old_ndcs = get_ptr( old_ndcs_sv ); unsigned int add_ndcs = get_ptr( add_ndcs_sv ); unsigned int* new_ndcs = get_ptr( new_ndcs_sv );

float old_sims = get_ptr( old_sims_sv ); float add_sims = get_ptr( add_sims_sv ); float* new_sims = get_ptr( new_sims_sv );

.... }

The get_ptr function is identical across many modules so it would be quite easy to fix it turns out, until the PERL_NO_COW compilation option is fixed (as you suggested it should be).

If I change SvPVX to SvPVX_force as suggested, then I get

undefined symbol: SvPVX_force

But I don't really know what I am doing in C, so maybe that is why. My e-mail address is niels at genomics dot dk if you want to ask me something off the list.

With thanks,

Niels L

Leont commented 2 weeks ago

What happens if you use SvPV_force_nolen instead of SvPVX for any mutable buffer, and I suppose SvPV_nolen for any buffer you don't write to? You should never use SvPVX unless you really, really know what you're doing.

nielsl commented 2 weeks ago

What happens if you use SvPV_force_nolen instead of SvPVX for any mutable buffer, and I suppose SvPV_nolen for any buffer you don't write to? You should never use SvPVX unless you really, really know what you're doing.

With 5.38.2 compiled without PERL_NO_COW, SvPV_force_nolen (for both readonly and read/write buffers) runs but produces very wrong results. I would have to track where that happens and why, but that is not a quick thing to do. I am guessing it will make no difference to use SvPV_nolen for readonly and SvPV_force_nolen for read/write, but I will try that if you want.

Well I know what I put into all the linear buffers (Perl strings) and their boundaries, but I am weak at C and largely unfamiliar with Perl guts. I have ways to check these buffers and overall results and SvPVX has not produced errors since 2012 and it has been battle-tested "in production". But I will follow you Perl gurus advice of course. I am able to compile 5.34.3 with PERL_NO_COW and have things working, which takes pressure off me for weeks, maybe months. Longer term I sure hope PERL_NO_COW will be back, as Yves Orton argued above it should.

demerphq commented 2 weeks ago

With 5.38.2 compiled without PERL_NO_COW, SvPV_force_nolen (for both readonly and read/write buffers) runs but produces very wrong results.

I think you want SvPV_mutable() or SvPVX_mutable(), it requires a len argument, but you can provide it and ignore it.

It would be nice if you could flesh out what you mean by "very wrong" would help us understand what the problem is a bit better. The function @Leont suggested, SvPV_force_nolen() is similar to SvPV_mutable() for many purposes, but it can also break things. See the docs:

        These are like "SvPV", returning the string in the SV, but will
        force the SV into containing a string ("SvPOK"), and only a string
        ("SvPOK_only"), by hook or by crook. You need to use one of these
        "force" routines if you are going to update the "SvPVX" directly.

        Note that coercing an arbitrary scalar into a plain PV will
        potentially strip useful data from it. For example if the SV was
        "SvROK", then the referent will have its reference count
        decremented, and the SV itself may be converted to an "SvPOK" scalar
        with a string buffer containing a value such as "ARRAY(0x1234)".

So i can easily imagine that excessive use of SvPV_force_nolen() would break things. I didnt notice anything in this thread mentioning SvPV_mutable() or SvPVX_mutable() so could you try those?

Some kind of script which replicates the issues you are having would be super nice. Some kind of idea of what "very wrong" means would as well.

Really you shouldn't /need/ a PERL_NO_COW, so lets figure out why you think you do.

Leont commented 2 weeks ago

I think you want SvPV_mutable() or SvPVX_mutable(), it requires a len argument, but you can provide it and ignore it.

Yeah you're right.

nielsl commented 2 weeks ago

With 5.38.2 compiled without PERL_NO_COW, SvPV_force_nolen (for both readonly and read/write buffers) runs but produces very wrong results.

I think you want SvPV_mutable() or SvPVX_mutable(), it requires a len argument, but you can provide it and ignore it.

It would be nice if you could flesh out what you mean by "very wrong" would help us understand what the problem is a bit better. The function @Leont suggested, SvPV_force_nolen() is similar to SvPV_mutable() for many purposes, but it can also break things. See the docs:

        These are like "SvPV", returning the string in the SV, but will
        force the SV into containing a string ("SvPOK"), and only a string
        ("SvPOK_only"), by hook or by crook. You need to use one of these
        "force" routines if you are going to update the "SvPVX" directly.

        Note that coercing an arbitrary scalar into a plain PV will
        potentially strip useful data from it. For example if the SV was
        "SvROK", then the referent will have its reference count
        decremented, and the SV itself may be converted to an "SvPOK" scalar
        with a string buffer containing a value such as "ARRAY(0x1234)".

So i can easily imagine that excessive use of SvPV_force_nolen() would break things. I didnt notice anything in this thread mentioning SvPV_mutable() or SvPVX_mutable() so could you try those?

Some kind of script which replicates the issues you are having would be super nice. Some kind of idea of what "very wrong" means would as well.

Really you shouldn't /need/ a PERL_NO_COW, so lets figure out why you think you do.

With SvPVX_mutable (which takes no second argument), it runs and makes same wrong results as SvPV_force_nolen. With SvPV_mutable (which takes a length argument) I got compile errors that I don't yet understand. I will make a test case, but I am struggling with deadlines and it make take two weeks before it appears.

I see two separate questions with PERL_NO_COW, 1) that it crashes compilation in latest stable on all platforms probably, and 2) whether the feature is useful or not, and should be phased out or not. One reason no-one have complained could be that very few other than module writers use C functions from Perl, but that is pure guessing.

bram-perl commented 2 weeks ago

To me the recent comments look a bit out of scope for this issue and look more like a "support/help/... request" and should be discussed elsewhere (mailing list?) and not in this particular issue. If it does end up with a "valid"[^1] use case or with guidelines on what can be used instead then it would ideally be summarized in here but going through that entire process one step at a time just feels like extra noise.

[^1]: I'm not saying your current use case isn't valid. I know far too little about your code so I can't comment on that. But others have indicated that you should not need the PERL_NO_COW and given that there have been no other comment about it suggests that it is possible in most (or all?) cases.

nielsl commented 2 weeks ago

To me the recent comments look a bit out of scope for this issue and look more like a "support/help/... request" and should be discussed elsewhere (mailing list?) and not in this particular issue. If it does end up with a "valid"1 use case or with guidelines on what can be used instead then it would ideally be summarized in here but going through that entire process one step at a time just feels like extra noise.

Footnotes

1. I'm not saying your current use case isn't valid. I know far too little about your code so I can't comment on that. But others have indicated that you should not need the `PERL_NO_COW` and given that there have been no other comment about it suggests that it is possible in most (or all?) cases. [↩](#user-content-fnref-1-877771d6162ab0153885ced2e613a22d)

Helping me with a work-around I agree could be taken off this forum, and I agree it is a bit noisy and off-topic. But having stable Perl crash since 5.36.0 on probably all platforms when compiled with PERL_NO_COW is certainly on-topic. I brought it up last summer, and with a timely fix there would have been no need for work-arounds.

demerphq commented 2 weeks ago

it runs and makes same wrong results as SvPV_force_nolen

Ok, well until you have time to flesh that out I am not sure what else we can do. Those functions should ensure that the SV owns the ptr, and that the buffer it points to is safe to modify in place. So if you are experiencing breakage I don't see why PERL_NO_COW would fix things.

I see two separate questions with PERL_NO_COW, 1) that it crashes compilation in latest stable on all platforms probably, and 2) whether the feature is useful or not,

Regarding 1, we have a general problem that we don't test esoteric build config very well as a general rule of thumb. Building with all the different flags would be very resource intensive so we have shied away from testing them.

Regarding 2, my personal view is that COW is an optimization, and that as a matter of general principal we should able to disable an optimizations and still function correctly. Having said that I am pretty sure that PERL_NO_COW is implemented in such a way that the code is divergent from the COW case such that we have to maintain two separate implementations. Ideally PERL_NO_COW would be "copy buffer always" instead of the "copy buffer when refcount=254" (i am pretty sure 255 is used for something special), but I suspect it is more like "don't do COW stuff at all". I think it probably makes sense to rework PERL_NO_COW so the code doesn't diverge but that we still have a "never COW" mode of operation.

I will reach out to you privately to see if I can help you sort this out.

leonerd commented 2 weeks ago

There's an underlying assumption behind this discussion that PERL_NO_COW is even a thing we will want to continue making work. I'm not 100% sure I'm convinced. COW strings are "just a thing" that Perl has - much like overload, tie, or any other thing. It was optional while it was new, but it has been around a long time and seems stable enough. We should be able to rely on it for other things.

It gets very awkward if we were to remain down the path of making all features optional for all of time - as the number of optional features increases over time, the number of possible combinations increases exponentially and it becomes impossible to test all of them. It could now be the case that we declare COW strings a successful experiment and simply say they're always present now, thus removing the configuration option.

demerphq commented 1 week ago

@leonerd I dont think you can put an optimization, especially a kinda hacky optimization like COW in the same categories as other features like tie or overload, those are language level features that are visible to a programmer, COW is not, its presence or absence should be completely invisible to a user except for the performance impact of having it. IMO optimizations should be disablable, if only for testing and bug detection. As I said in my previous comment, I think it is probably quite reasonable to rework things so we don't have divergant code-paths, but we should also have a mode where we set the refcount at which a buffer is copied to be low enough that the buffer is copied always.

COW isnt a /feature/ of perl, it is an implementation detail of how we manage certain practical problems. There are other options and other approaches (especially at the implementation level), and we should be free to change the code to use a different strategy without breaking everything. If anything is expecting COW for anything other than performance reasons then that is a barrier to improving our current solutions to these problems.

The current implementation arguably has issues. It does not work consistently, it does not play nicely with certain forms of XS string initialization, it puts the refcount at the end of the buffer where it causes cache line spoilation for nothing, and it has a practical limit of 254 copies per buffer. So arguably it should at some point be improved or changed. Ensuring we can build with it disabled is about the best way to ensure we dont end up on dependencies on the implementation.

I agree with you that not every /feature/ should have a way to disable it. But i do think that pretty much every /optimization/ should have a way to disable it and still get correct behavior.

Leont commented 1 week ago

I agree with you that not every /feature/ should have a way to disable it. But i do think that pretty much every /optimization/ should have a way to disable it and still get correct behavior.

Agreed. That's the only way to be sure it's good or not.

bulk88 commented 1 week ago

Regarding 2, my personal view is that COW is an optimization, and that as a matter of general principal we should able to disable an optimizations and still function correctly. Having said that I am pretty sure that PERL_NO_COW is implemented in such a way that the code is divergent from the COW case such that we have to maintain two separate implementations. Ideally PERL_NO_COW would be "copy buffer always" instead of the "copy buffer when refcount=254" (i am pretty sure 255 is used for something special), but I suspect it is more like "don't do COW stuff at all". I think it probably makes sense to rework PERL_NO_COW so the code doesn't diverge but that we still have a "never COW" mode of operation.

@nielsl s problem sounds like his C funcs are doing writing buffer overruns on SvPVX and overwriting with "garbage" the CowREFCNT(sv) which is the last malloc allocated byte in the buffer, which CowREFCNT is directly after invisible C str terminating NUL, or CowREFCNT is some uninit/unused bytes further down the buffer, after the terminating NULL byte.

Also remember about SINCE FOREVER (20 yrs) there was a tiny bit of faux-COW, in the sorta rareish on PP level SVPVs with share_hek_hek/SvIsCOW_shared_hash/unshare_hek PV buffers which are per interp/thread "globals".

static void* get_ptr( SV* obj ) { return SvPVX( obj ); }

unsigned int interleave_sims_C( float maxdif,
SV* old_ndcs_sv, SV* old_sims_sv, unsigned int old_len,
SV* add_ndcs_sv, SV* add_sims_sv, unsigned int add_len,
SV* new_ndcs_sv, SV* new_sims_sv )
{
unsigned int* old_ndcs = get_ptr( old_ndcs_sv )
static void* get_ptr( SV* obj ) { return SvPVX( obj ); }

should really be

static void* get_ptr( SV* obj ) {
STRLEN len;
void * ptr =(void*) SvPV_force(obj, len);
//var len is now SvCUR()'s val
if(len != 4 /*8 16 whatever*/*) {
  Perl_croak_nocontext("my lib err: wrong input len");
}
return ptr;
}

or atleast not memory corruption to Perl, but maybe corrupt/uninit input to your 3rd party C lib/funcs.

static void* get_ptr( SV* obj ) {
return (void*)SvGROW(obj, 4 /*8 16 whatever*/);
}

I can't tell from "SV* old_sims_sv, unsigned int old_len, SV* add_ndcs_sv, SV* add_sims_sv, unsigned int add_len," what args are INPUT, I/O, or OUTPUT from your C lib to perl or other way around. Your just not bounds checking the SVs for either your input or output from your C libs.

BTW, keeping "unsigned int* old_ndcs = get_ptr( old_ndcs_sv );" PP level "packed" in a 4/8 byte SVPV string is a strange/rare design choice vs using SvIVs/SvUV or SvNVs. But sounds like your "float* old_sims" pretty much mandates a 4 byte SvPV string.since that SvPV is keeping what originally was a 4 byte float, and casting/stooring that 4byte float to a SvNV/double/8 byte FP num, will cause endless C compiler rounding/truncate/-O0_1_2 flag bugs at runtime.

Another thing, if your doing in PP before calling the XSUB.

sub {
    my $packedvec = "\x00\x00\x00\x00\x00\x00\x00\x00";
    some_package::xs_sub($packedvec);
}

smells risky but I think in prior years this always worked, but with COW strings (im not sure I havent C stepped newer perls), $packedvec could be a shared some-type-COW PV buffer, and DEF not a "malloc" buffer for your purpose.

On 2nd thought, your XS code problems start with (void *)SvPVX(sv) and specifically the (void*), since your totally bypassing C's and XS's type safety system thru that cast. "float" is almost always 4 bytes, double 8 bytes, and int is very platform specific/random. your "float " MUST be pointing to a VALID, not uninit data, exactly 4 bytes/chars "Perl SV string", "int " probably to exactly a 8 byte string. void * get_ptr() needs to be REMOVED completly and replaced with int * get_sv_int_buf(SV* obj), float * get_sv_flt_buf(SV* obj), double * get_sv_dbl_buf(SV* obj) etc. Also figure out API design wise if you should be passing SvRV/ref to scalar, from pure perl, to your XSUB vs passing your current passing SvPVs if your doing output through @_ rn, instead of through more perl-ish multiple SVs in list context return/perl stack return, from your XSUB back to pure perl