Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.85k stars 527 forks source link

Bleadperl v5.17.3-255-g6502e08 breaks SARTAK/Path-Dispatcher-1.04.tar.gz #12395

Closed p5pRT closed 11 years ago

p5pRT commented 11 years ago

Migrated from rt.perl.org#114820 (status was 'resolved')

Searchable as RT114820$

p5pRT commented 11 years ago

From @andk

git bisect


commit 6502e08109cd003b2cdf39bc94ef35e52203240b Author​: David Mitchell \davem@​iabyn\.com Date​: Thu Jul 26 16​:04​:09 2012 +0100

  Don't copy all of the match string buffer

diagnostics


Note​: Any​::Moose was picking Mouse 1.02 here because current Moose 2.0603 is broken with current bleadperl (see #114628)

[...] Can't call method "parent" on an undefined value at t/031-structured-match.t line 26. t/031-structured-match.t ....... Dubious\, test returned 255 (wstat 65280\, 0xff00) No subtests run [...] t/031-structured-match.t (Wstat​: 65280 Tests​: 0 Failed​: 0)   Non-zero exit status​: 255   Parse errors​: No plan found in TAP output [...]

perl -V


-- andreas

p5pRT commented 11 years ago

From @iabyn

On Sun\, Sep 09\, 2012 at 07​:31​:26PM -0700\, Andreas J. Koenig via RT wrote​:

---------- commit 6502e08109cd003b2cdf39bc94ef35e52203240b Author​: David Mitchell \davem@​iabyn\.com Date​: Thu Jul 26 16​:04​:09 2012 +0100

Don't copy all of the match string buffer

[snip]

Can't call method "parent" on an undefined value at t/031-structured-match.t line 26. t/031-structured-match.t ....... Dubious\, test returned 255 (wstat 65280\, 0xff00)

This is due to this line in Path/Dispatcher/Rule/Regex.pm​:

  $extra{leftover} = eval q{$'};

which expects to be able to execute a regex where $' hasn't been seen yet\, then afterwards to be able to retrieve its value. This used to happen to work\, but can't be relied upon (and no longer works).

-- Art is anything that has a label (especially if the label is "untitled 1")

p5pRT commented 11 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 11 years ago

From @cpansprout

On Mon Sep 10 08​:29​:17 2012\, davem wrote​:

On Sun\, Sep 09\, 2012 at 07​:31​:26PM -0700\, Andreas J. Koenig via RT wrote​:

---------- commit 6502e08109cd003b2cdf39bc94ef35e52203240b Author​: David Mitchell \davem@​iabyn\.com Date​: Thu Jul 26 16​:04​:09 2012 +0100

Don't copy all of the match string buffer

[snip]

Can't call method "parent" on an undefined value at t/031- structured-match.t line 26. t/031-structured-match.t ....... Dubious\, test returned 255 (wstat 65280\, 0xff00)

This is due to this line in Path/Dispatcher/Rule/Regex.pm​:

$extra\{leftover\} = eval q\{$'\};

which expects to be able to execute a regex where $' hasn't been seen yet\, then afterwards to be able to retrieve its value. This used to happen to work\, but can't be relied upon (and no longer works).

I have long considered it a bug that eval '$&' doesn’t work if the three forbidden match vars have not been seen.

That you have extended the bug further worries me.

I know you spent a lot of time on this\, but it sort of dashes any hopes of getting that fixed. :-)

An idea I had\, which I started to explore a year ago\, but never quite finished\, was to force a COW ‘copy’ and\, following the example of PERL_OLD_COPY_ON_WRITE\, store an SV in the regexp struct. (I never got it to work completely\, and\, code freeze fast approaching\, I decided to shelve it for a short while\, which turned into a long while.)

That in itself might speed up the cases you sped up differently\, if we assume that strings are rarely modified right after a match.

See \https://rt-archive.perl.org/perl5/Ticket/Display.html?id=37038#txn-1060310.

--

Father Chrysostomos

p5pRT commented 11 years ago

From p5p@sartak.org

Dave Mitchell wrote​:

On Sun\, Sep 09\, 2012 at 07​:31​:26PM -0700\, Andreas J. Koenig via RT wrote​:

---------- commit 6502e08109cd003b2cdf39bc94ef35e52203240b Author​: David Mitchell\davem@​iabyn\.com Date​: Thu Jul 26 16​:04​:09 2012 +0100

 Don't copy all of the match string buffer

[snip]

Can't call method "parent" on an undefined value at t/031-structured-match.t line 26. t/031-structured-match.t ....... Dubious\, test returned 255 (wstat 65280\, 0xff00)

This is due to this line in Path/Dispatcher/Rule/Regex.pm​:

 $extra\{leftover\} = eval q\{$'\};

which expects to be able to execute a regex where $' hasn't been seen yet\, then afterwards to be able to retrieve its value. This used to happen to work\, but can't be relied upon (and no longer works).

Hi\,

Author of the module in question here.

This line of code is\, of course\, simply an optimization. I have no qualms switching that to this less sneaky implementation​:

  $extra{leftover} = $';

Unfortunately grep.cpan.me seems to be down\, so I can't investigate how common a pattern eval q{$'} etc are.

Shawn

p5pRT commented 11 years ago

From @iabyn

On Mon\, Sep 10\, 2012 at 02​:06​:53PM -0400\, Shawn M Moore wrote​:

Author of the module in question here.

This line of code is\, of course\, simply an optimization. I have no qualms switching that to this less sneaky implementation​:

$extra\{leftover\} = $';

Rather than polluting the whole program with $'\, wouldn't it be better to just modify specific regexps to have '(.*)$' at the end?

Unfortunately grep.cpan.me seems to be down\, so I can't investigate how common a pattern eval q{$'} etc are.

Hopefully not too many :-(

-- I've often wanted to drown my troubles\, but I can't get my wife to go swimming.

p5pRT commented 11 years ago

From @doy

On Mon\, Sep 10\, 2012 at 08​:52​:14PM +0100\, Dave Mitchell wrote​:

On Mon\, Sep 10\, 2012 at 02​:06​:53PM -0400\, Shawn M Moore wrote​:

Author of the module in question here.

This line of code is\, of course\, simply an optimization. I have no qualms switching that to this less sneaky implementation​:

$extra\{leftover\} = $';

Rather than polluting the whole program with $'\, wouldn't it be better to just modify specific regexps to have '(.*)$' at the end?

It's matching against user-specified regexes\, so no. On the other hand\, using $' directly for 5.8 and switching to use /p and ${^POSTMATCH} for 5.10+ would be doable I think?

-doy

p5pRT commented 11 years ago

From @iabyn

On Mon\, Sep 10\, 2012 at 09​:15​:06AM -0700\, Father Chrysostomos via RT wrote​:

An idea I had\, which I started to explore a year ago\, but never quite finished\, was to force a COW ‘copy’ and\, following the example of PERL_OLD_COPY_ON_WRITE\, store an SV in the regexp struct. (I never got it to work completely\, and\, code freeze fast approaching\, I decided to shelve it for a short while\, which turned into a long while.)

But I thought COW had too many edge cases to be viable?

-- You're only as old as you look.

p5pRT commented 11 years ago

From dennis.kaarsemaker@booking.com

On ma\, 2012-09-10 at 14​:06 -0400\, Shawn M Moore wrote​:

Unfortunately grep.cpan.me seems to be down\, so I can't investigate how common a pattern eval q{$'} etc are.

Unique\, it seems​:

[dkaarsemaker@​llama ~]$ csearch "q{\\s*\\\$'\\s*}" 2>/dev/null /scratch/cpan/Benchmark-Perl-Formance-Cargo/share/PerlCritic/Critic/Policy/Variables/ProhibitPunctuationVars.pm​: ( q{$'}\, q{$$}\, q{$#}\, q{$​:}\, ); ## no critic ( RequireInterpolationOfMetachars\, ProhibitQuotedWordLists ) /scratch/cpan/Path-Dispatcher/lib/Path/Dispatcher/Rule/Regex.pm​: $extra{leftover} = eval q{$'}; /scratch/cpan/Perl-Critic/lib/Perl/Critic/Policy/Variables/ProhibitPunctuationVars.pm​: ( q{$'}\, q{$$}\, q{$#}\, q{$​:}\, ); ## no critic ( RequireInterpolationOfMetachars\, ProhibitQuotedWordLists )

-- Dennis Kaarsemaker\, Systems Architect Booking.com Herengracht 597\, 1017 CE Amsterdam Tel external +31 (0) 20 715 3409 Tel internal (7207) 3409

p5pRT commented 11 years ago

From @iabyn

On Mon\, Sep 10\, 2012 at 02​:56​:44PM -0500\, Jesse Luehrs wrote​:

On Mon\, Sep 10\, 2012 at 08​:52​:14PM +0100\, Dave Mitchell wrote​:

On Mon\, Sep 10\, 2012 at 02​:06​:53PM -0400\, Shawn M Moore wrote​:

Author of the module in question here.

This line of code is\, of course\, simply an optimization. I have no qualms switching that to this less sneaky implementation​:

$extra\{leftover\} = $';

Rather than polluting the whole program with $'\, wouldn't it be better to just modify specific regexps to have '(.*)$' at the end?

It's matching against user-specified regexes\, so no.

Why not $user_regex = qr/$user_regex(.*)$/;

On the other hand\, using $' directly for 5.8 and switching to use /p and ${^POSTMATCH} for 5.10+ would be doable I think?

That would be better than nothing.

-- All wight. I will give you one more chance. This time\, I want to hear no Wubens. No Weginalds. No Wudolf the wed-nosed weindeers.   -- Life of Brian

p5pRT commented 11 years ago

From @cpansprout

On Mon Sep 10 13​:02​:21 2012\, davem wrote​:

On Mon\, Sep 10\, 2012 at 09​:15​:06AM -0700\, Father Chrysostomos via RT wrote​:

An idea I had\, which I started to explore a year ago\, but never quite finished\, was to force a COW ‘copy’ and\, following the example of PERL_OLD_COPY_ON_WRITE\, store an SV in the regexp struct. (I never got it to work completely\, and\, code freeze fast approaching\, I decided to shelve it for a short while\, which turned into a long while.)

But I thought COW had too many edge cases to be viable?

I don’t know what you are referring to exactly.

I think it is perfectly viable. If we revive the PERL_OLD_COPY_ON_WRITE scheme\, we can​:

• Upgrade any scalar to PVIV\, or even discard a cached integer\, if the string is long enough that benchmarks show the copying to be slower than the bookkeeping overhead (or use SvLEN instead for storing the pointer); • Prefer swipe over cow whenever possible\, since we know it’s faster.

On Aug 3\, 2012\, at 8​:14 AM\, Nicholas Clark wrote​:

So I gave up on the whole thing\, because even though it seemed about the most lightweight approach possible\, I still couldn't get it to the point where it was measurably faster than not doing it

I think the biggest gain would be with large strings. If we do simple benchmarks to find the threshold where the copy outweighs the bookkeeping overhead\, we can make this work well.

--

Father Chrysostomos

p5pRT commented 11 years ago

From @cpansprout

On Mon Sep 10 15​:52​:14 2012\, sprout wrote​:

On Aug 3\, 2012\, at 8​:14 AM\, Nicholas Clark wrote​:

So I gave up on the whole thing\, because even though it seemed about the most lightweight approach possible\, I still couldn't get it to the point where it was measurably faster than not doing it

I think the biggest gain would be with large strings. If we do simple benchmarks to find the threshold where the copy outweighs the bookkeeping overhead\, we can make this work well.

And I was going to say​: This may not speed up most code in general\, but where it really matters (large strings) it will.

--

Father Chrysostomos

p5pRT commented 11 years ago

From @iabyn

On Mon\, Sep 10\, 2012 at 03​:52​:15PM -0700\, Father Chrysostomos via RT wrote​:

On Mon Sep 10 13​:02​:21 2012\, davem wrote​:

On Mon\, Sep 10\, 2012 at 09​:15​:06AM -0700\, Father Chrysostomos via RT wrote​:

An idea I had\, which I started to explore a year ago\, but never quite finished\, was to force a COW ‘copy’ and\, following the example of PERL_OLD_COPY_ON_WRITE\, store an SV in the regexp struct. (I never got it to work completely\, and\, code freeze fast approaching\, I decided to shelve it for a short while\, which turned into a long while.)

But I thought COW had too many edge cases to be viable?

I don’t know what you are referring to exactly.

From Nicholas's recent description of his COW work​:

Message-ID​: \20120803151426\.GL9583@​plum\.flirble\.org Plus also one test still failed in Compress​::Zlib\, which I couldn't work out why\, and so I feared that there was some subtle assumption that I'd missed that might cause other CPAN code to break. (Possibly at runtime\, possibly in interestingly subtle ways). At some point Artur Bergman made the point that CPAN code would reasonably expect that if it copied something read only\, the copy would not be. But (obviously) a COW copy has to act read only. Which troubled me. (The idea only came later of using #ifdef PERL_CORE in the headers to avoid doing COW copies in XS code\, after I'd concluded that I couldn't get (general) COW to be fast enough.)

So I gave up on the whole thing\, because even though it seemed about the most lightweight approach possible\, I still couldn't get it to the point where it was measurably faster than not doing it\, and it added complexity to the (runtime) code\, and it looked like it had a risk of subtly breaking CPAN code. This was all quite frustrating.

Which I read as indicating there were subtle and risky issues.

-- The Enterprise successfully ferries an alien VIP from one place to another without serious incident.   -- Things That Never Happen in "Star Trek" #7

p5pRT commented 11 years ago

From @cpansprout

On Tue Sep 11 06​:04​:53 2012\, davem wrote​:

On Mon\, Sep 10\, 2012 at 03​:52​:15PM -0700\, Father Chrysostomos via RT wrote​:

On Mon Sep 10 13​:02​:21 2012\, davem wrote​:

On Mon\, Sep 10\, 2012 at 09​:15​:06AM -0700\, Father Chrysostomos via RT wrote​:

An idea I had\, which I started to explore a year ago\, but never quite finished\, was to force a COW ‘copy’ and\, following the example of PERL_OLD_COPY_ON_WRITE\, store an SV in the regexp struct. (I never got it to work completely\, and\, code freeze fast approaching\, I decided to shelve it for a short while\, which turned into a long while.)

But I thought COW had too many edge cases to be viable?

I don’t know what you are referring to exactly.

From Nicholas's recent description of his COW work​:

Message-ID​: \20120803151426\.GL9583@​plum\.flirble\.org Plus also one test still failed in Compress​::Zlib\, which I couldn't work out why\, and so I feared that there was some subtle assumption that I'd missed that might cause other CPAN code to break. (Possibly at runtime\, possibly in interestingly subtle ways). At some point Artur Bergman made the point that CPAN code would reasonably expect that if it copied something read only\, the copy would not be. But (obviously) a COW copy has to act read only. Which troubled me. (The idea only came later of using #ifdef PERL_CORE in the headers to avoid doing COW copies in XS code\, after I'd concluded that I couldn't get (general) COW to be fast enough.)

So I gave up on the whole thing\, because even though it seemed about the most lightweight approach possible\, I still couldn't get it to the point where it was measurably faster than not doing it\, and it added complexity to the (runtime) code\, and it looked like it had a risk of subtly breaking CPAN code. This was all quite frustrating.

Which I read as indicating there were subtle and risky issues.

I think this part reasonably solves that​:

(The idea only came later of using #ifdef PERL_CORE in the headers to avoid doing COW copies in XS code\, after I'd concluded that I couldn't get (general) COW to be fast enough.)

--

Father Chrysostomos

p5pRT commented 11 years ago

From @nwc10

On Mon\, Sep 10\, 2012 at 09​:15​:06AM -0700\, Father Chrysostomos via RT wrote​:

On Mon Sep 10 08​:29​:17 2012\, davem wrote​:

On Sun\, Sep 09\, 2012 at 07​:31​:26PM -0700\, Andreas J. Koenig via RT wrote​:

---------- commit 6502e08109cd003b2cdf39bc94ef35e52203240b Author​: David Mitchell \davem@​iabyn\.com Date​: Thu Jul 26 16​:04​:09 2012 +0100

Don't copy all of the match string buffer

[snip]

Can't call method "parent" on an undefined value at t/031- structured-match.t line 26. t/031-structured-match.t ....... Dubious\, test returned 255 (wstat 65280\, 0xff00)

This is due to this line in Path/Dispatcher/Rule/Regex.pm​:

$extra\{leftover\} = eval q\{$'\};

which expects to be able to execute a regex where $' hasn't been seen yet\, then afterwards to be able to retrieve its value. This used to happen to work\, but can't be relied upon (and no longer works).

I have long considered it a bug that eval '$&' doesn't work if the three forbidden match vars have not been seen.

Me too.

That you have extended the bug further worries me.

I know you spent a lot of time on this\, but it sort of dashes any hopes of getting that fixed. :-)

But I wasn't confident of this part.

After all\, I fixed C\ so that it could work on more than one string. Only for C\ to be disabled\, and then removed. And I didn't feel particularly bad about it.

On Tue\, Sep 11\, 2012 at 12​:41​:25PM -0700\, Father Chrysostomos via RT wrote​:

On Tue Sep 11 06​:04​:53 2012\, davem wrote​:

On Mon\, Sep 10\, 2012 at 03​:52​:15PM -0700\, Father Chrysostomos via RT wrote​:

So I gave up on the whole thing\, because even though it seemed about the most lightweight approach possible\, I still couldn't get it to the point where it was measurably faster than not doing it\, and it added complexity to the (runtime) code\, and it looked like it had a risk of subtly breaking CPAN code. This was all quite frustrating.

Which I read as indicating there were subtle and risky issues.

I think this part reasonably solves that​:

(The idea only came later of using #ifdef PERL_CORE in the headers to avoid doing COW copies in XS code\, after I'd concluded that I couldn't get (general) COW to be fast enough.)

It still doesn't solve the test failures in one of the Compress​::Zlib modules. (Didn't check thoroughly as to why\, but one still failed when I got PERL_OLD_COPY_ON_WRITE compiling again a couple of days ago)

In particular\, I remember the frustration that led to this comment in Perl_sv_gets​:

  /* XXX. If you make this PVIV\, then copy on write can copy scalars read   from \<>.   However\, perlbench says it's slower\, because the existing swipe code   is faster than copy on write.   Swings and roundabouts. */   SvUPGRADE(sv\, SVt_PV);

The swipe code seems to be very effective. However\, I don't totally like it because it relies on SvTEMP() meaning "I can steal the buffer". Which isn't always true.

But I had a thought about an alternative COW mechanism\, exploiting the OOK mechanism.

To avoid moving memory around when truncating strings this permits SvPVX() to point within the malloc()ed buffer\, storing an offset to the true buffer start. Historically\, this was done with SvIVX(). It occurred to me that using extra memory was wasteful\, as we had free memory in the buffer itself. So currently the scheme is

  SvPVX()   v   ...*Your string is here   ^   a flag byte

If the flag byte is non-zero\, then it's the offset in bytes back to the buffer start. If it's zero\, then the sizeof(STRLEN) bytes before give that offset. Thus it can store an offset from 1 to STRLEN.

I wondered if that scheme can change. It stays as​:

  SvPVX()   v   ...*Your string is here   ^   a flag byte

but the flag byte is split in two in some fashion. Possibly\, low 7 bits are a reference count for the allocated buffer. Top bit is 0 if there's no further offset to the start of the allocated buffer. ie

  SvPVX()   v   *Your string is here   ^   reference count   ^ start of allocated buffer

and is 1 if there's the rest of the OOK structure earlier​:

  SvPVX()   v   ...*Your string is here   ^   reference count   ^ OOK 1 byte offset\, and maybe a STRLEN before it.   ^ start of allocated buffer

I think only a small amount of core code would need to change to accommodate this. (SvOOK_offset() and Perl_sv_backoff()) But it would permit a very cheap way to allocate PVs such that they could be "copied" - the overhead would be just 1 byte. And it feels much cheaper to undo this COW than the old scheme's pointer loop chasing.

It might be cheap enough to use everywhere when creating strings\, and thus actually get rid of the swipe code.

I'm also wondering if it's complementary with the other COW approach. That one doesn't need to move the contents of an existing buffer (so better for long strings)\, but does need to upgrade the SV body to PVIV\, and does need to chase the pointer loop on release.

However\, I'm still wary of the Compress​::Zlib test failure\, as an indicator that I didn't manage to anticipate every problem.

Also\, I'm thinking that we ought to get rid of the (ab)use of SvREADONLY() to mean two things - "This is a readonly value" and "you can't write to the buffer that SvPVX() points to". It would be better to have two flags.

Nicholas Clark

p5pRT commented 11 years ago

From @cpansprout

On Mon Sep 17 14​:06​:19 2012\, nicholas wrote​:

On Tue\, Sep 11\, 2012 at 12​:41​:25PM -0700\, Father Chrysostomos via RT wrote​:

On Tue Sep 11 06​:04​:53 2012\, davem wrote​:

On Mon\, Sep 10\, 2012 at 03​:52​:15PM -0700\, Father Chrysostomos via RT wrote​:

So I gave up on the whole thing\, because even though it seemed about the most lightweight approach possible\, I still couldn't get it to the point where it was measurably faster than not doing it\, and it added complexity to the (runtime) code\, and it looked like it had a risk of subtly breaking CPAN code. This was all quite frustrating.

Which I read as indicating there were subtle and risky issues.

I think this part reasonably solves that​:

(The idea only came later of using #ifdef PERL_CORE in the headers to avoid doing COW copies in XS code\, after I'd concluded that I couldn't get (general) COW to be fast enough.)

It still doesn't solve the test failures in one of the Compress​::Zlib modules. (Didn't check thoroughly as to why\, but one still failed when I got PERL_OLD_COPY_ON_WRITE compiling again a couple of days ago)

In particular\, I remember the frustration that led to this comment in Perl_sv_gets​:

/\* XXX\. If you make this PVIV\, then copy on write can copy scalars

read from \<>. However\, perlbench says it's slower\, because the existing swipe code is faster than copy on write. Swings and roundabouts. */ SvUPGRADE(sv\, SVt_PV);

The swipe code seems to be very effective. However\, I don't totally like it because it relies on SvTEMP() meaning "I can steal the buffer". Which isn't always true.

Are you referring to the use of _nosteal here and there\, or are you unaware that the swipe code checks for a refcount of 1?

But I had a thought about an alternative COW mechanism\, exploiting the OOK mechanism.

...snip details...

That is just brilliant!

But it would permit a very cheap way to allocate PVs such that they could be "copied" - the overhead would be just 1 byte.

That’s a lot smaller than allocating a shared_he (20 bytes on a 32-bit machine)\, something I tried to do.

And it feels much cheaper to undo this COW than the old scheme's pointer loop chasing.

It might be cheap enough to use everywhere when creating strings\, and thus actually get rid of the swipe code.

I'm also wondering if it's complementary with the other COW approach. That one doesn't need to move the contents of an existing buffer

I may have misunderstood\, but it doesn’t sound as though your 1-byte flag method needs that either\, if that extra byte is allocated at the outset.

(so better for long strings)\, but does need to upgrade the SV body to PVIV\, and does need to chase the pointer loop on release.

However\, I'm still wary of the Compress​::Zlib test failure\, as an indicator that I didn't manage to anticipate every problem.

It is probably doing something naughty. :-)

Also\, I'm thinking that we ought to get rid of the (ab)use of SvREADONLY() to mean two things - "This is a readonly value" and "you can't write to the buffer that SvPVX() points to". It would be better to have two flags.

What about existing XS code that checks SvREADONLY before modifying SvPVX? (Is there any? Most XS code I’ve seen does a terrible job of handling perl’s types.)

The flag space is rather crowded. Maybe it’s about time we made pad names their own type. Their being SVs makes SVs themselves more complicated. A dedicated pad name type would also allow the name to be allocated as part of the same memory block\, much like a hek.

--

Father Chrysostomos

p5pRT commented 11 years ago

From @demerphq

On 17 September 2012 23​:05\, Nicholas Clark \nick@&#8203;ccl4\.org wrote​:

To avoid moving memory around when truncating strings this permits SvPVX() to point within the malloc()ed buffer\, storing an offset to the true buffer start. Historically\, this was done with SvIVX(). It occurred to me that using extra memory was wasteful\, as we had free memory in the buffer itself.

You say "historically this was done with SvIVX()" and then imply not using SvIVX() saves memory. Am I correct in understanding this means we can do the OOK hack without upgrading to a SvPVIV? We can leave the item as a SvPV?

Also a note for readers new to this subject​: "truncate" means "remove the start of the string" in this context.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 11 years ago

From @nwc10

On Tue\, Sep 18\, 2012 at 11​:00​:08AM +0200\, demerphq wrote​:

On 17 September 2012 23​:05\, Nicholas Clark \nick@&#8203;ccl4\.org wrote​:

To avoid moving memory around when truncating strings this permits SvPVX() to point within the malloc()ed buffer\, storing an offset to the true buffer start. Historically\, this was done with SvIVX(). It occurred to me that using extra memory was wasteful\, as we had free memory in the buffer itself.

You say "historically this was done with SvIVX()" and then imply not using SvIVX() saves memory. Am I correct in understanding this means we can do the OOK hack without upgrading to a SvPVIV? We can leave the item as a SvPV?

We do do the OOK hack without upgrading\, as of (it seems) v5.12.0

$ perl5.10.0 -MDevel​::Peek -le '$_ = "half pint"; s/half //; Dump $_; print $]' SV = PVIV(0x100803c10) at 0x100820f08   REFCNT = 1   FLAGS = (POK\,OOK\,pPOK)   IV = 5 (OFFSET)   PV = 0x1002023d5 ( "half " . ) "pint"\0   CUR = 4   LEN = 11 5.010000 $ perl -MDevel​::Peek -le '$_ = "half pint"; s/half //; Dump $_; print $]' SV = PV(0x100801c78) at 0x10081ece8   REFCNT = 1   FLAGS = (POK\,OOK\,pPOK)   OFFSET = 5   PV = 0x100201e45 ( "half\5" . ) "pint"\0   CUR = 4   LEN = 11 5.012004

Contrast the IV = 5 (OFFSET) with the use of "\5" to store the offset.

Here's the change of storage format​:

$ perl -MDevel​::Peek -le '$_ = "N" x 256; s/.{255}//; Dump $_; print $]' SV = PV(0x100801d78) at 0x10081ece8   REFCNT = 1   FLAGS = (POK\,OOK\,pPOK)   OFFSET = 255   PV = 0x100213cbf ( "NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN\377" . ) "N"\0   CUR = 1   LEN = 17 5.012004 $ perl -MDevel​::Peek -le '$_ = "N" x 257; s/.{256}//; Dump $_; print $]' SV = PV(0x100801d78) at 0x10081ece8   REFCNT = 1   FLAGS = (POK\,OOK\,pPOK)   OFFSET = 256   PV = 0x100213cc0 ( "NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN\0\1\0\0\0\0\0\0\0" . ) "N"\0   CUR = 1   LEN = 16 5.012004

And this is the -DDEBUGGING version being paranoid that someone might overwrite the unused buffer​:

$ ./perl -Ilib -MDevel​::Peek -le '$_ = "N" x 257; s/.{256}//; Dump $_; print $]' SV = PV(0x100802e70) at 0x100825fc0   REFCNT = 1   FLAGS = (POK\,OOK\,pPOK)   OFFSET = 256   PV = 0x100608be0 ( "\340\341\342\343\344\345\346\347\350\351\352\353\354\355\356\357\360\361\362\363\364\365\366\367\370\371\372\373\374\375\376\377\0\1\2\3\4\5\6\7\10\t\n\v\f\r\16\17\20\21\22\23\24\25\26\27\30\31\32\33\34\35\36\37 !\"#$%&'()*+\,-./0123456789​:;\<=>?@​ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\177\200\201\202\203\204\205\206\207\210\211\212\213\214\215\216\217\220\221\222\223\224\225\226\227\230\231\232\233\234\235\236\237\240\241\242\243\244\245\246\247\250\251\252\253\254\255\256\257\260\261\262\263\264\265\266\267\270\271\272\273\274\275\276\277\300\301\302\303\304\305\306\307\310\311\312\313\314\315\316\317\320\321\322\323\324\325\326\0\1\0\0\0\0\0\0\0" . ) "N"\0   CUR = 1   LEN = 16 5.017004

Nicholas Clark

p5pRT commented 11 years ago

From @iabyn

On Mon\, Sep 17\, 2012 at 10​:05​:34PM +0100\, Nicholas Clark wrote​:

On Mon\, Sep 10\, 2012 at 09​:15​:06AM -0700\, Father Chrysostomos via RT wrote​:

I know you spent a lot of time on this\, but it sort of dashes any hopes of getting that fixed. :-)

But I wasn't confident of this part.

After all\, I fixed C\ so that it could work on more than one string. Only for C\ to be disabled\, and then removed. And I didn't feel particularly bad about it.

I have no objection to my 'copy only a bit of the buffer' code being ripped out if it can be replaced by a reliable COW scheme :-)

But I had a thought about an alternative COW mechanism\, exploiting the OOK mechanism.

So this would involve any string about to be COWed to need to be Moved up 1 byte and possibly realloced?

One thing I don't yet understand about COW (either scheme) is how you detect all possible writes. What's to stop some XS code code doing something like​:

  if (SvPOK(sv) && SvCUR(sv))   (SvPVX(sv))[0] = 'x';

-- Britain\, Britain\, Britain! Discovered by Sir Henry Britain in sixteen-oh-ten. Sold to Germany a year later for a pfennig and the promise of a kiss. Destroyed in eighteen thirty-forty two\, and rebuilt a week later by a man. This we know. Hello. But what of the people of Britain? Who they? What do? And why? -- "Little Britain"

p5pRT commented 11 years ago

From @nwc10

On Tue\, Sep 18\, 2012 at 10​:38​:01AM +0100\, Dave Mitchell wrote​:

On Mon\, Sep 17\, 2012 at 10​:05​:34PM +0100\, Nicholas Clark wrote​:

On Mon\, Sep 10\, 2012 at 09​:15​:06AM -0700\, Father Chrysostomos via RT wrote​:

I know you spent a lot of time on this\, but it sort of dashes any hopes of getting that fixed. :-)

But I wasn't confident of this part.

After all\, I fixed C\ so that it could work on more than one string. Only for C\ to be disabled\, and then removed. And I didn't feel particularly bad about it.

I have no objection to my 'copy only a bit of the buffer' code being ripped out if it can be replaced by a reliable COW scheme :-)

But I had a thought about an alternative COW mechanism\, exploiting the OOK mechanism.

So this would involve any string about to be COWed to need to be Moved up 1 byte and possibly realloced?

Yes. Which was why I suggested that

a) one runs with both available b) new strings are allocatd 1 byte off   [forgot to say - on how many platforms does this make a performance hit?   On ARM\, at least\, there are hardcore optimised word-at-a-time routines   for word aligned strings\, and byte-by-byte for everything else] c) existing long strings are COW-ed by the "upgrade to PVIV" approach d) potentially things in the middle just get copied

One thing I don't yet understand about COW (either scheme) is how you detect all possible writes. What's to stop some XS code code doing something like​:

if \(SvPOK\(sv\) && SvCUR\(sv\)\)
\(SvPVX\(sv\)\)\[0\] = 'x';

You can't\, if they don't give an $expletive about the READONLY flag. That is their bug. But BBC does go here\, as does perlbug@​perl.org The subtle problem that I missed but Artur spotted was ensuring that API calls that "obviously" return a fresh clean new scalar (and hence writable) actually don't change to returning a COW copy of something.

Nicholas Clark

p5pRT commented 11 years ago

From @demerphq

On 18 September 2012 12​:00\, Nicholas Clark \nick@&#8203;ccl4\.org wrote​:

On Tue\, Sep 18\, 2012 at 10​:38​:01AM +0100\, Dave Mitchell wrote​:

On Mon\, Sep 17\, 2012 at 10​:05​:34PM +0100\, Nicholas Clark wrote​:

On Mon\, Sep 10\, 2012 at 09​:15​:06AM -0700\, Father Chrysostomos via RT wrote​:

I know you spent a lot of time on this\, but it sort of dashes any hopes of getting that fixed. :-)

But I wasn't confident of this part.

After all\, I fixed C\ so that it could work on more than one string. Only for C\ to be disabled\, and then removed. And I didn't feel particularly bad about it.

I have no objection to my 'copy only a bit of the buffer' code being ripped out if it can be replaced by a reliable COW scheme :-)

But I had a thought about an alternative COW mechanism\, exploiting the OOK mechanism.

So this would involve any string about to be COWed to need to be Moved up 1 byte and possibly realloced?

Yes. Which was why I suggested that

a) one runs with both available b) new strings are allocatd 1 byte off [forgot to say - on how many platforms does this make a performance hit? On ARM\, at least\, there are hardcore optimised word-at-a-time routines for word aligned strings\, and byte-by-byte for everything else] c) existing long strings are COW-ed by the "upgrade to PVIV" approach d) potentially things in the middle just get copied

One thing I don't yet understand about COW (either scheme) is how you detect all possible writes. What's to stop some XS code code doing something like​:

if \(SvPOK\(sv\) && SvCUR\(sv\)\)
  \(SvPVX\(sv\)\)\[0\] = 'x';

You can't\, if they don't give an $expletive about the READONLY flag. That is their bug. But BBC does go here\, as does perlbug@​perl.org The subtle problem that I missed but Artur spotted was ensuring that API calls that "obviously" return a fresh clean new scalar (and hence writable) actually don't change to returning a COW copy of something.

Maybe a SvPVXW macro\, which would check this and throw an error would be a useful addition to future proof code to this problem?

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 11 years ago

From @demerphq

On 18 September 2012 11​:11\, Nicholas Clark \nick@&#8203;ccl4\.org wrote​:

On Tue\, Sep 18\, 2012 at 11​:00​:08AM +0200\, demerphq wrote​:

On 17 September 2012 23​:05\, Nicholas Clark \nick@&#8203;ccl4\.org wrote​:

To avoid moving memory around when truncating strings this permits SvPVX() to point within the malloc()ed buffer\, storing an offset to the true buffer start. Historically\, this was done with SvIVX(). It occurred to me that using extra memory was wasteful\, as we had free memory in the buffer itself.

You say "historically this was done with SvIVX()" and then imply not using SvIVX() saves memory. Am I correct in understanding this means we can do the OOK hack without upgrading to a SvPVIV? We can leave the item as a SvPV?

We do do the OOK hack without upgrading\, as of (it seems) v5.12.0

That is very cool. Thanks a lot for explaining\, and I guess doing it.

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 11 years ago

From @salva

On 09/18/2012 11​:38 AM\, Dave Mitchell wrote​:

On Mon\, Sep 17\, 2012 at 10​:05​:34PM +0100\, Nicholas Clark wrote​:

On Mon\, Sep 10\, 2012 at 09​:15​:06AM -0700\, Father Chrysostomos via RT wrote​:

I know you spent a lot of time on this\, but it sort of dashes any hopes of getting that fixed. :-)

But I wasn't confident of this part.

After all\, I fixed C\ so that it could work on more than one string. Only for C\ to be disabled\, and then removed. And I didn't feel particularly bad about it.

I have no objection to my 'copy only a bit of the buffer' code being ripped out if it can be replaced by a reliable COW scheme :-)

But I had a thought about an alternative COW mechanism\, exploiting the OOK mechanism.

So this would involve any string about to be COWed to need to be Moved up 1 byte and possibly realloced?

There are XS modules out there that place C structures on the PV slot and rely on it being properly aligned.

p5pRT commented 11 years ago

From @nwc10

On Tue\, Sep 18\, 2012 at 12​:14​:45PM +0200\, Salvador Fandino wrote​:

On 09/18/2012 11​:38 AM\, Dave Mitchell wrote​:

On Mon\, Sep 17\, 2012 at 10​:05​:34PM +0100\, Nicholas Clark wrote​:

But I had a thought about an alternative COW mechanism\, exploiting the OOK mechanism.

So this would involve any string about to be COWed to need to be Moved up 1 byte and possibly realloced?

There are XS modules out there that place C structures on the PV slot and rely on it being properly aligned.

Good point. I think that there are XS modules in ext/ which rely on that :-)

So it would only be sane to do this for new PVs that are explicitly known to be strings.

Nicholas Clark

p5pRT commented 11 years ago

From @ap

* Nicholas Clark \nick@&#8203;ccl4\.org [2012-09-18 12​:05]​:

b) new strings are allocatd 1 byte off [forgot to say - on how many platforms does this make a performance hit? On ARM\, at least\, there are hardcore optimised word-at-a-time routines for word aligned strings\, and byte-by-byte for everything else]

Just allocate it another 1 (or even 3) bytes off on platforms where that is a concern?

p5pRT commented 11 years ago

From @Leont

On Tue\, Sep 18\, 2012 at 11​:38 AM\, Dave Mitchell \davem@&#8203;iabyn\.com wrote​:

One thing I don't yet understand about COW (either scheme) is how you detect all possible writes. What's to stop some XS code code doing something like​:

if \(SvPOK\(sv\) && SvCUR\(sv\)\)
    \(SvPVX\(sv\)\)\[0\] = 'x';

There is already SV_CHECK_THINKFIRST (and friends) to fix all such issues. That code is already wrong today\, but there's also already a solution.

Leon

p5pRT commented 11 years ago

From @nwc10

On Tue\, Sep 18\, 2012 at 01​:00​:58PM +0200\, Aristotle Pagaltzis wrote​:

* Nicholas Clark \nick@&#8203;ccl4\.org [2012-09-18 12​:05]​:

b) new strings are allocatd 1 byte off [forgot to say - on how many platforms does this make a performance hit? On ARM\, at least\, there are hardcore optimised word-at-a-time routines for word aligned strings\, and byte-by-byte for everything else]

Just allocate it another 1 (or even 3) bytes off on platforms where that is a concern?

I think that it would be 3 or 7. ["real" platforms have words that are 4 or 8 bytes. At least\, in my book\, they do :-)]

I'm not sure that that is as worth it\, given that

a) the seduction of the use of 1 byte is that it's a really low memory cost\,   which at least 25% (or 12.5%) of the time isn't going to result in the   use of a different malloc() buckets b) not sure how often perl really ends up doing full-on comparisons\, as   storing the length means that unlike C\, equal and not equal don't always   need to walk the full string

Would be the right sort of thing to benchmark to find out the costs/benefits\, given "typical program" benchmarks.

Nicholas Clark

p5pRT commented 11 years ago

From @cpansprout

On Mon Sep 17 21​:44​:53 2012\, sprout wrote​:

On Mon Sep 17 14​:06​:19 2012\, nicholas wrote​:

However\, I'm still wary of the Compress​::Zlib test failure\, as an indicator that I didn't manage to anticipate every problem.

It is probably doing something naughty. :-)

This is enough to get it working​:

Inline Patch ```diff diff --git a/cpan/Compress-Raw-Zlib/Zlib.xs b/cpan/Compress-Raw-Zlib/Zlib.xs index 9f1d7a1..acaff84 100644 --- a/cpan/Compress-Raw-Zlib/Zlib.xs +++ b/cpan/Compress-Raw-Zlib/Zlib.xs @@ -614,7 +614,7 @@ char * string ; croak("%s: buffer parameter is a reference to a reference", ```

string) ;   }

- if (SvREADONLY(sv) && PL_curcop != &PL_compiling) + if (SvREADONLY(sv) && !SvIsCOW(sv) && PL_curcop != &PL_compiling)   croak("%s​: buffer parameter is read-only"\, string);

  SvUPGRADE(sv\, SVt_PV); @​@​ -1334\,7 +1334\,7 @​@​ inflate (s\, buf\, output\, eof=FALSE)   /* If the buffer is a reference\, dereference it */   buf = deRef(buf\, "inflate") ;

- if (s->flags & FLAG_CONSUME_INPUT && SvREADONLY(buf)) + if (s->flags & FLAG_CONSUME_INPUT && SvREADONLY(buf) && !SvIsCOW(buf))   croak("Compress​::Raw​::Zlib​::Inflate​::inflate input parameter cannot be read-only when ConsumeInput is specified"); #ifdef UTF8_AVAILABLE
  if (DO_UTF8(buf) && !sv_utf8_downgrade(buf\, 1))

Also\, I'm thinking that we ought to get rid of the (ab)use of SvREADONLY() to mean two things - "This is a readonly value" and "you can't write to the buffer that SvPVX() points to". It would be better to have two flags.

What about existing XS code that checks SvREADONLY before modifying SvPVX? (Is there any? Most XS code I’ve seen does a terrible job of handling perl’s types.)

I did a quick audit and found numerous modules that use SvREADONLY. Only three of them use it correctly​:

File​::Map optimizer POSIX​::pselect

And it looks as though only POSIX​::pselect will break if we make SvIsCOW its own flag.

I suggest we go ahead and do it. Some modules mishandling SvREADONLY might start to modify COW scalars in place. But I think it really is worth it.

Any code that wants to modify strings in place should be using SvPV_force first anyway.

The flag space is rather crowded. Maybe it’s about time we made pad names their own type. Their being SVs makes SVs themselves more complicated. A dedicated pad name type would also allow the name to be allocated as part of the same memory block\, much like a hek.

We could use Dave Mitchell’s free bit for now\, which in \20111007191900\.GF3046@&#8203;iabyn\.com he asked us not to squander.

Presumably SvREADONLY should also set the flag. Presumably we would call it something like SvREADONLYPV or SvPVisREADONLY.

--

Father Chrysostomos

p5pRT commented 11 years ago

From @bulk88

On Mon Sep 17 21​:44​:53 2012\, sprout wrote​:

The flag space is rather crowded. Maybe it’s about time we made pad names their own type. Their being SVs makes SVs themselves more complicated. A dedicated pad name type would also allow the name to be allocated as part of the same memory block\, much like a hek.

There is the existing SVf_FAKE and SVf_READONLY for HEKs. Maybe I didn't see it in this thread\, but what is wrong with using shared HEKs for COW?

Since the proposal as I read is it to introduce 2 different "start of buffer" formats for the offset system\, is a flag needed? Why not just use SVf_OOK and let the macros and functions figure out if it is a COW or offset in the start of string data?

In XS code I've used SVf_READONLY to protect PV buffers given to a OS thread pooled C library for asynchronous reading or writing during an asynchronous transaction. Specifically the PV must not be realloced (SVf_READONLY) or freeded (SV refcount) during the transaction otherwise a thread in the pool will SEGV. Double copying to private buffers caused performance problems since the result of each transaction were 20KB-500KB binary strings. After the transaction is completed\, SVf_READONLY is turned off\, sv_chop is used to cut off the C library's opaque transaction header struct from the PV since it is binary now expired garbage to the caller\, and then the refcount is decreased on the SVPV. The caller is supposed to keep a reference to the SVPV somewhere in his Perl code.

Another thing\, I've heard Perl allows non-Perl allocated PVs\, see http​://www.nntp.perl.org/group/perl.perl5.porters/2011/12/msg180269.html . This shouldn't be broken by a future COW system.

p5pRT commented 11 years ago

From @cpansprout

On Tue Oct 02 12​:06​:42 2012\, bulk88 wrote​:

On Mon Sep 17 21​:44​:53 2012\, sprout wrote​:

The flag space is rather crowded. Maybe it’s about time we made pad names their own type. Their being SVs makes SVs themselves more complicated. A dedicated pad name type would also allow the name to be allocated as part of the same memory block\, much like a hek.

There is the existing SVf_FAKE and SVf_READONLY for HEKs. Maybe I didn't see it in this thread\, but what is wrong with using shared HEKs for COW?

We already do when we have a HEK already. But taking an existing string and creating a HEK out of it requires copying it. I tried an experiment to allocate all strings as HEKs initially\, and then upgrade them to shared HEKs by adding them to the string table when they would have been copied\, but the shared string table overhead outweighs any savings.

What we are looking for is a cheap way to do COW with existing strings.

Since the proposal as I read is it to introduce 2 different "start of buffer" formats for the offset system\, is a flag needed? Why not just use SVf_OOK and let the macros and functions figure out if it is a COW or offset in the start of string data?

The new flag would apply also to SVs containing shared HEKs.

By using a new flag instead of SvREADONLY\, we can make the dozen or so modules on CPAN that do if (SvREADONLY(sv)) croak(...) start working correctly.

But no\, the new flag would not be strictly necessary; just convenient.

In XS code I've used SVf_READONLY to protect PV buffers given to a OS thread pooled C library for asynchronous reading or writing during an asynchronous transaction. Specifically the PV must not be realloced (SVf_READONLY) or freeded (SV refcount) during the transaction otherwise a thread in the pool will SEGV. Double copying to private buffers caused performance problems since the result of each transaction were 20KB-500KB binary strings. After the transaction is completed\, SVf_READONLY is turned off\, sv_chop is used to cut off the C library's opaque transaction header struct from the PV since it is binary now expired garbage to the caller\, and then the refcount is decreased on the SVPV. The caller is supposed to keep a reference to the SVPV somewhere in his Perl code.

Another thing\, I've heard Perl allows non-Perl allocated PVs\, see http​://www.nntp.perl.org/group/perl.perl5.porters/2011/12/msg180269.html . This shouldn't be broken by a future COW system.

I don’t think any of this would be broken.

Basically\, as long as any XS code that wants to modify a string does SvPV_force(sv) first (as it should be doing already)\, everything should work.

I can image that newSVpvn might need to behave differently depending on whether it is called by core code.

When XS modules use the PV pointer to store pointer-aligned binary data\, do they get perl to allocate the PV to begin with? Or do they allocate it themselves? Does anyone know which modules these are? I cannot think of a way to search for them.

I have just had another idea\, a variation of the SvOOK hack\, which may render those questions moot. We could put the reference count on the end of the string\, after the null\, rather than at the beginning. If there is no room there when we are trying to fake a copy\, then we copy the string for real\, making the new buffer one byte bigger.

The chances of missing the opportunity for COW are 1 in 16\, on 32-bit darwin​:

#include \<stdlib.h> #include \<stdio.h> main() { int i; size_t prev = 0; for(i = 1; i\<=256; i++) {   void *ptr = malloc(i);   size_t size = malloc_size(ptr);   if (size != prev) printf ("Size for %d is %d\n"\, i\, (int)size);   prev = size;   free(ptr); } }

Output​:

Size for 1 is 16 Size for 17 is 32 Size for 33 is 48 Size for 49 is 64 Size for 65 is 80 Size for 81 is 96 Size for 97 is 112 Size for 113 is 128 Size for 129 is 144 Size for 145 is 160 Size for 161 is 176 Size for 177 is 192 Size for 193 is 208 Size for 209 is 224 Size for 225 is 240 Size for 241 is 256

It is probably worthwhile to go ahead and allocate the extra byte anyway at the outset (only 1/16th of the time will we use more memory than needed\, but we will be trading memory for speed)\, but we will still have to check for it and fall back to copying\, because XS modules may create SVs without the extra byte.

--

Father Chrysostomos

p5pRT commented 11 years ago

From @bulk88

On Tue Oct 02 13​:02​:26 2012\, sprout wrote​:

Basically\, as long as any XS code that wants to modify a string does SvPV_force(sv) first (as it should be doing already)\, everything should work.

I can image that newSVpvn might need to behave differently depending on whether it is called by core code.

When XS modules use the PV pointer to store pointer-aligned binary data\, do they get perl to allocate the PV to begin with? Or do they allocate it themselves? Does anyone know which modules these are? I cannot think of a way to search for them.

I don't use SvPV_force since I dont need to convert the data in the SV from non PV to PV if I am going immediately overwrite the PV's contents. I attached the actual alignment code I use. The XS module only runs on Windows so the XS code is Windows specific. I cut out the parts that weren't relevant. Some parts like no magic check and vivifying elements are part of the prevent reallocs by Perl during the async operation code. The code allows alignment to non-powers of 2\, even though I've never seen a driver that wants non-powers of 2 alignment before. If such a driver is ever found\, the code is ready for it. The read call to the driver (not shown) will fail if the driver decides the buffer is unaligned\, so it is upto the user to choose what uAlign should be. The fail from C level read (actually Win32 ReadFileEx) will propagate to an undef in the Perl script. This XSUB works on any Windows driver that implements "read"\, so the alignment can't be hard coded. Some drivers don't care about alignment\, some want 256\, or 512 or 4096.

p5pRT commented 11 years ago

From @bulk88

align.xs

p5pRT commented 11 years ago

From @cpansprout

On Tue Oct 02 15​:37​:56 2012\, bulk88 wrote​:

On Tue Oct 02 13​:02​:26 2012\, sprout wrote​:

Basically\, as long as any XS code that wants to modify a string does SvPV_force(sv) first (as it should be doing already)\, everything should work.

I can image that newSVpvn might need to behave differently depending on whether it is called by core code.

When XS modules use the PV pointer to store pointer-aligned binary data\, do they get perl to allocate the PV to begin with? Or do they allocate it themselves? Does anyone know which modules these are? I cannot think of a way to search for them.

I don't use SvPV_force since I dont need to convert the data in the SV from non PV to PV if I am going immediately overwrite the PV's contents.

I see you are using sv_force_normal\, which also works.

I attached the actual alignment code I use.

Thank you. That is helpful. I think this will continue to work fine if we put the COW reference count after the trailing null.

--

Father Chrysostomos

p5pRT commented 11 years ago

From @cpansprout

On Tue Oct 02 13​:02​:26 2012\, sprout wrote​:

On Tue Oct 02 12​:06​:42 2012\, bulk88 wrote​:

On Mon Sep 17 21​:44​:53 2012\, sprout wrote​:

The flag space is rather crowded. Maybe it’s about time we made pad names their own type. Their being SVs makes SVs themselves more complicated. A dedicated pad name type would also allow the name to be allocated as part of the same memory block\, much like a hek.

There is the existing SVf_FAKE and SVf_READONLY for HEKs. Maybe I didn't see it in this thread\, but what is wrong with using shared HEKs for COW?

We already do when we have a HEK already. But taking an existing string and creating a HEK out of it requires copying it. I tried an experiment to allocate all strings as HEKs initially\, and then upgrade them to shared HEKs by adding them to the string table when they would have been copied\, but the shared string table overhead outweighs any savings.

What we are looking for is a cheap way to do COW with existing strings.

Since the proposal as I read is it to introduce 2 different "start of buffer" formats for the offset system\, is a flag needed? Why not just use SVf_OOK and let the macros and functions figure out if it is a COW or offset in the start of string data?

The new flag would apply also to SVs containing shared HEKs.

By using a new flag instead of SvREADONLY\, we can make the dozen or so modules on CPAN that do if (SvREADONLY(sv)) croak(...) start working correctly.

But no\, the new flag would not be strictly necessary; just convenient.

Here is another reason​:

COW currently does not work on read-only scalars. If we add a COW flag\, then copying a read-only string will entail turning on the COW flag and increasing the reference count at SvEND+1.

But is it OK to modify the string buffer of a read-only scalar? What does read-only mean? That the contents of the scalar must not be modified\, or that modifications should have no observable effects at the Perl language level?

--

Father Chrysostomos

p5pRT commented 11 years ago

From @salva

On 10/02/2012 10​:02 PM\, Father Chrysostomos via RT wrote​:

On Tue Oct 02 12​:06​:42 2012\, bulk88 wrote​:

On Mon Sep 17 21​:44​:53 2012\, sprout wrote​:

The flag space is rather crowded. Maybe it’s about time we made pad names their own type. Their being SVs makes SVs themselves more complicated. A dedicated pad name type would also allow the name to be allocated as part of the same memory block\, much like a hek.

There is the existing SVf_FAKE and SVf_READONLY for HEKs. Maybe I didn't see it in this thread\, but what is wrong with using shared HEKs for COW?

We already do when we have a HEK already. But taking an existing string and creating a HEK out of it requires copying it. I tried an experiment to allocate all strings as HEKs initially\, and then upgrade them to shared HEKs by adding them to the string table when they would have been copied\, but the shared string table overhead outweighs any savings.

What we are looking for is a cheap way to do COW with existing strings.

Since the proposal as I read is it to introduce 2 different "start of buffer" formats for the offset system\, is a flag needed? Why not just use SVf_OOK and let the macros and functions figure out if it is a COW or offset in the start of string data?

The new flag would apply also to SVs containing shared HEKs.

By using a new flag instead of SvREADONLY\, we can make the dozen or so modules on CPAN that do if (SvREADONLY(sv)) croak(...) start working correctly.

But no\, the new flag would not be strictly necessary; just convenient.

In XS code I've used SVf_READONLY to protect PV buffers given to a OS thread pooled C library for asynchronous reading or writing during an asynchronous transaction. Specifically the PV must not be realloced (SVf_READONLY) or freeded (SV refcount) during the transaction otherwise a thread in the pool will SEGV. Double copying to private buffers caused performance problems since the result of each transaction were 20KB-500KB binary strings. After the transaction is completed\, SVf_READONLY is turned off\, sv_chop is used to cut off the C library's opaque transaction header struct from the PV since it is binary now expired garbage to the caller\, and then the refcount is decreased on the SVPV. The caller is supposed to keep a reference to the SVPV somewhere in his Perl code.

Another thing\, I've heard Perl allows non-Perl allocated PVs\, see http​://www.nntp.perl.org/group/perl.perl5.porters/2011/12/msg180269.html . This shouldn't be broken by a future COW system.

I don’t think any of this would be broken.

Basically\, as long as any XS code that wants to modify a string does SvPV_force(sv) first (as it should be doing already)\, everything should work.

My understanding of SvPV_force has always been that its purpose is to discard any values cached on the IV and NV slots. This is a common scenario but not the only possible one. For instance\, on dualvars as $! the IV and PV are not keep synchronized.

An XS module author may use the IV\, NV and PV slots to store unrelated values and so not calling SvPV_force before updating the value on the PV slot.

I can image that newSVpvn might need to behave differently depending on whether it is called by core code.

IMO it would be better to use a different function name. Having functions that show different behavior when using from some place or another can be quite confusing.

I for myself use perl core code as a reference frequently\, sometimes even copy&paste code from there in my XS modules and I expect it to work in the same way. XS programming is already difficult enough\, don't make it harder!

When XS modules use the PV pointer to store pointer-aligned binary data\, do they get perl to allocate the PV to begin with? Or do they allocate it themselves? Does anyone know which modules these are? I cannot think of a way to search for them.

I have a couple of those​: Math​::Int128\, Tie​::Array​::Packed

I have just had another idea\, a variation of the SvOOK hack\, which may render those questions moot. We could put the reference count on the end of the string\, after the null\, rather than at the beginning.

Some XS module may store hidden data between SvCUR and SvLEN (I am not saying it is a good idea to do so\, just that somebody may be doing it :-).

Something more likely to happen is XS modules first appending data into the PV slot and then updating the value of SvCUR accordingly.

p5pRT commented 11 years ago

From @cpansprout

On Wed Oct 03 02​:37​:42 2012\, salva wrote​:

My understanding of SvPV_force has always been that its purpose is to discard any values cached on the IV and NV slots.

That’s certainly true.

This is a common scenario but not the only possible one. For instance\, on dualvars as $! the IV and PV are not keep synchronized.

An XS module author may use the IV\, NV and PV slots to store unrelated values and so not calling SvPV_force before updating the value on the PV slot.

There is also sv_force_normal\, which undoes COWs.

I can image that newSVpvn might need to behave differently depending on whether it is called by core code.

IMO it would be better to use a different function name. Having functions that show different behavior when using from some place or another can be quite confusing.

I think I have abandoned that idea. Putting the refcount after the trailing null sidesteps the whole issue.

I for myself use perl core code as a reference frequently\, sometimes even copy&paste code from there in my XS modules and I expect it to work in the same way. XS programming is already difficult enough\, don't make it harder!

When XS modules use the PV pointer to store pointer-aligned binary data\, do they get perl to allocate the PV to begin with? Or do they allocate it themselves? Does anyone know which modules these are? I cannot think of a way to search for them.

I have a couple of those​: Math​::Int128\,

I have had a look at that. I see it passes sizeof(int128_t) to sv_grow. Would it be harmful if sv_grow assumed 1 byte more than it was told. I’m no C expert\, so I do not know how that might affect alignment. Also\, in your case\, this extra memory usage would happen all the time\, rather than 1/16 of the time. Nevertheless\, it would result in a 32-byte rather that 16-byte allocation\, which seems small.

So we have about three possible approaches​:

• Allocate an extra byte in sv_grow\, to allow all strings to be potential cows (calves :-). In rare circumstances this could increase memory usage significantly (if there are many Math-Int128 objects). • Only do COW when there is an extra byte available. This would be unfortunate if a string is exactly 2GB minus 1 byte long. • Upgrade to PVIV and fall back to the other COW scheme (storing a pointer in the IV slot) if the string is sufficiently long that the savings outweigh the bookkeeping costs. Some simple benchmarks would allow us to determine the threshold. We might want different thresholds depending on whether an upgrade is needed. Or we could store the poiner in SvLEN.

The first is by far the simplest. The second would definitely be an improvement over the current state of things. The third would be better than the second but take longer to implement.

My order of preference is 1\, 3\, 2\, but bearing in mind that implementing 3 means implementing 2 first and then building on top of that.

Tie​::Array​::Packed

I have just had another idea\, a variation of the SvOOK hack\, which may render those questions moot. We could put the reference count on the end of the string\, after the null\, rather than at the beginning.

Some XS module may store hidden data between SvCUR and SvLEN (I am not saying it is a good idea to do so\, just that somebody may be doing it :-).

Something more likely to happen is XS modules first appending data into the PV slot and then updating the value of SvCUR accordingly.

Such scalars would already be subject to utf8 upgrading\, unless marked read-only. So I think that answers my question about what read-only means. (We can’t COW with existing read-only variables that are not already cows.)

--

Father Chrysostomos

p5pRT commented 11 years ago

From @iabyn

On Mon\, Oct 01\, 2012 at 11​:10​:17PM -0700\, Father Chrysostomos via RT wrote​:

We could use Dave Mitchell’s free bit for now\, which in \20111007191900\.GF3046@&#8203;iabyn\.com he asked us not to squander.

I'm happy for this bit to be used for this purpose.

-- Now is the discount of our winter tent   -- sign seen outside camping shop

p5pRT commented 11 years ago

From @bulk88

On Tue Oct 02 13​:02​:26 2012\, sprout wrote​:

I have just had another idea\, a variation of the SvOOK hack\, which may render those questions moot. We could put the reference count on the end of the string\, after the null\, rather than at the beginning. If there is no room there when we are trying to fake a copy\, then we copy the string for real\, making the new buffer one byte bigger.

The chances of missing the opportunity for COW are 1 in 16\, on 32-bit darwin​:

#include \<stdlib.h> #include \<stdio.h> main() { int i; size_t prev = 0; for(i = 1; i\<=256; i++) { void *ptr = malloc(i); size_t size = malloc_size(ptr); if (size != prev) printf ("Size for %d is %d\n"\, i\, (int)size); prev = size; free(ptr); } }

Output​:

Size for 1 is 16 Size for 17 is 32 Size for 33 is 48 Size for 49 is 64 Size for 65 is 80 Size for 81 is 96 Size for 97 is 112 Size for 113 is 128 Size for 129 is 144 Size for 145 is 160 Size for 161 is 176 Size for 177 is 192 Size for 193 is 208 Size for 209 is 224 Size for 225 is 240 Size for 241 is 256

It is probably worthwhile to go ahead and allocate the extra byte anyway at the outset (only 1/16th of the time will we use more memory than needed\, but we will be trading memory for speed)\, but we will still have to check for it and fall back to copying\, because XS modules may create SVs without the extra byte.

I ported your code to Windows\, sort of. _msize and HeapSize perfectly knew the original allocation size given to malloc\, they were useless for what you wanted to see\, so I figured out what the Windows Heap manager internally says the allocation is. BTW threaded Win32 Perl puts its own headers on allocations too\, see http​://perl5.git.perl.org/perl.git/blob/6aa4fbb50135d4863535200030aadfeb8c583c6e​:/win32/vmem.h#l207 . The HEAP_ENTRY struct in Windbg was completely different from reality. Variable ActualSize I believe includes the 8 byte Heap header in its count at all times. The code will not work on anything but my machine I think. The warnings about what it was run on are in the source. I hope this information useful somehow.

p5pRT commented 11 years ago

From @bulk88

#include <stdlib.h>
#include <stdio.h>
#include <malloc.h>
#include <windows.h>

#ifdef _WIN64
#define HEAP_ENTRY_SHIFT 4
#else
#define HEAP_ENTRY_SHIFT 3
#endif

//32bit WinXP SP3 ONLY, MSVCR71.dll ONLY, not any other SP or OS or any other CRT

//if CRTs change, LFH might be turned on, this code does NOT support LFH heaps

//derived from REing ntdll and a app called Volatility
typedef struct _HEAP_ENTRY
{
    USHORT Size;
    USHORT PreviousSize;
    UCHAR SmallTagIndex;
    UCHAR Flags;
    UCHAR UnusedBytes;
    UCHAR SegmentIndex;
}  HEAP_ENTRY, *PHEAP_ENTRY;

int main() {
    int i;
    for(i = 1; i<=256; i++) {
        PHEAP_ENTRY HeapEntry;
        void * ptr = malloc(i);
        size_t RequestedEntrySize;
        size_t ActualSize;
        size_t size;
        HeapEntry = (char*)ptr-(char*)(sizeof(*HeapEntry));
        memset(ptr, 0xAF, i);
        size = _msize(ptr);
        //prove the heap header is parsed correctly, RequestedEntrySize must be equal to _msize's return
        RequestedEntrySize = (HeapEntry->Size << HEAP_ENTRY_SHIFT) - HeapEntry->UnusedBytes;
        ActualSize = (HeapEntry->Size << HEAP_ENTRY_SHIFT);
        printf ("ptr=%X i=%X _msize=%X RevEngd req sz=%X Actual=%X\n",
            ptr, i, size, RequestedEntrySize, ActualSize );
        free(ptr);
    }
    return 0;
}
p5pRT commented 11 years ago

From @bulk88

ptr=333DA0 i=1 _msize=1 RevEngd req sz=1 Actual=10 ptr=333DA0 i=2 _msize=2 RevEngd req sz=2 Actual=10 ptr=333DA0 i=3 _msize=3 RevEngd req sz=3 Actual=10 ptr=333DA0 i=4 _msize=4 RevEngd req sz=4 Actual=10 ptr=333DA0 i=5 _msize=5 RevEngd req sz=5 Actual=10 ptr=333DA0 i=6 _msize=6 RevEngd req sz=6 Actual=10 ptr=333DA0 i=7 _msize=7 RevEngd req sz=7 Actual=10 ptr=333DA0 i=8 _msize=8 RevEngd req sz=8 Actual=10 ptr=332438 i=9 _msize=9 RevEngd req sz=9 Actual=18 ptr=332438 i=A _msize=A RevEngd req sz=A Actual=18 ptr=332438 i=B _msize=B RevEngd req sz=B Actual=18 ptr=332438 i=C _msize=C RevEngd req sz=C Actual=18 ptr=332438 i=D _msize=D RevEngd req sz=D Actual=18 ptr=332438 i=E _msize=E RevEngd req sz=E Actual=18 ptr=332438 i=F _msize=F RevEngd req sz=F Actual=18 ptr=332438 i=10 _msize=10 RevEngd req sz=10 Actual=18 ptr=332450 i=11 _msize=11 RevEngd req sz=11 Actual=20 ptr=332450 i=12 _msize=12 RevEngd req sz=12 Actual=20 ptr=332450 i=13 _msize=13 RevEngd req sz=13 Actual=20 ptr=332450 i=14 _msize=14 RevEngd req sz=14 Actual=20 ptr=332450 i=15 _msize=15 RevEngd req sz=15 Actual=20 ptr=332450 i=16 _msize=16 RevEngd req sz=16 Actual=20 ptr=332450 i=17 _msize=17 RevEngd req sz=17 Actual=20 ptr=332450 i=18 _msize=18 RevEngd req sz=18 Actual=20 ptr=332470 i=19 _msize=19 RevEngd req sz=19 Actual=28 ptr=332470 i=1A _msize=1A RevEngd req sz=1A Actual=28 ptr=332470 i=1B _msize=1B RevEngd req sz=1B Actual=28 ptr=332470 i=1C _msize=1C RevEngd req sz=1C Actual=28 ptr=332470 i=1D _msize=1D RevEngd req sz=1D Actual=28 ptr=332470 i=1E _msize=1E RevEngd req sz=1E Actual=28 ptr=332470 i=1F _msize=1F RevEngd req sz=1F Actual=28 ptr=332470 i=20 _msize=20 RevEngd req sz=20 Actual=28 ptr=332498 i=21 _msize=21 RevEngd req sz=21 Actual=30 ptr=332498 i=22 _msize=22 RevEngd req sz=22 Actual=30 ptr=332498 i=23 _msize=23 RevEngd req sz=23 Actual=30 ptr=332498 i=24 _msize=24 RevEngd req sz=24 Actual=30 ptr=332498 i=25 _msize=25 RevEngd req sz=25 Actual=30 ptr=332498 i=26 _msize=26 RevEngd req sz=26 Actual=30 ptr=332498 i=27 _msize=27 RevEngd req sz=27 Actual=30 ptr=332498 i=28 _msize=28 RevEngd req sz=28 Actual=30 ptr=3324C8 i=29 _msize=29 RevEngd req sz=29 Actual=38 ptr=3324C8 i=2A _msize=2A RevEngd req sz=2A Actual=38 ptr=3324C8 i=2B _msize=2B RevEngd req sz=2B Actual=38 ptr=3324C8 i=2C _msize=2C RevEngd req sz=2C Actual=38 ptr=3324C8 i=2D _msize=2D RevEngd req sz=2D Actual=38 ptr=3324C8 i=2E _msize=2E RevEngd req sz=2E Actual=38 ptr=3324C8 i=2F _msize=2F RevEngd req sz=2F Actual=38 ptr=3324C8 i=30 _msize=30 RevEngd req sz=30 Actual=38 ptr=332500 i=31 _msize=31 RevEngd req sz=31 Actual=40 ptr=332500 i=32 _msize=32 RevEngd req sz=32 Actual=40 ptr=332500 i=33 _msize=33 RevEngd req sz=33 Actual=40 ptr=332500 i=34 _msize=34 RevEngd req sz=34 Actual=40 ptr=332500 i=35 _msize=35 RevEngd req sz=35 Actual=40 ptr=332500 i=36 _msize=36 RevEngd req sz=36 Actual=40 ptr=332500 i=37 _msize=37 RevEngd req sz=37 Actual=40 ptr=332500 i=38 _msize=38 RevEngd req sz=38 Actual=40 ptr=332540 i=39 _msize=39 RevEngd req sz=39 Actual=48 ptr=332540 i=3A _msize=3A RevEngd req sz=3A Actual=48 ptr=332540 i=3B _msize=3B RevEngd req sz=3B Actual=48 ptr=332540 i=3C _msize=3C RevEngd req sz=3C Actual=48 ptr=332540 i=3D _msize=3D RevEngd req sz=3D Actual=48 ptr=332540 i=3E _msize=3E RevEngd req sz=3E Actual=48 ptr=332540 i=3F _msize=3F RevEngd req sz=3F Actual=48 ptr=332540 i=40 _msize=40 RevEngd req sz=40 Actual=48 ptr=332588 i=41 _msize=41 RevEngd req sz=41 Actual=50 ptr=332588 i=42 _msize=42 RevEngd req sz=42 Actual=50 ptr=332588 i=43 _msize=43 RevEngd req sz=43 Actual=50 ptr=332588 i=44 _msize=44 RevEngd req sz=44 Actual=50 ptr=332588 i=45 _msize=45 RevEngd req sz=45 Actual=50 ptr=332588 i=46 _msize=46 RevEngd req sz=46 Actual=50 ptr=332588 i=47 _msize=47 RevEngd req sz=47 Actual=50 ptr=332588 i=48 _msize=48 RevEngd req sz=48 Actual=50 ptr=3325D8 i=49 _msize=49 RevEngd req sz=49 Actual=58 ptr=3325D8 i=4A _msize=4A RevEngd req sz=4A Actual=58 ptr=3325D8 i=4B _msize=4B RevEngd req sz=4B Actual=58 ptr=3325D8 i=4C _msize=4C RevEngd req sz=4C Actual=58 ptr=3325D8 i=4D _msize=4D RevEngd req sz=4D Actual=58 ptr=3325D8 i=4E _msize=4E RevEngd req sz=4E Actual=58 ptr=3325D8 i=4F _msize=4F RevEngd req sz=4F Actual=58 ptr=3325D8 i=50 _msize=50 RevEngd req sz=50 Actual=58 ptr=332630 i=51 _msize=51 RevEngd req sz=51 Actual=60 ptr=332630 i=52 _msize=52 RevEngd req sz=52 Actual=60 ptr=332630 i=53 _msize=53 RevEngd req sz=53 Actual=60 ptr=332630 i=54 _msize=54 RevEngd req sz=54 Actual=60 ptr=332630 i=55 _msize=55 RevEngd req sz=55 Actual=60 ptr=332630 i=56 _msize=56 RevEngd req sz=56 Actual=60 ptr=332630 i=57 _msize=57 RevEngd req sz=57 Actual=60 ptr=332630 i=58 _msize=58 RevEngd req sz=58 Actual=60 ptr=332690 i=59 _msize=59 RevEngd req sz=59 Actual=68 ptr=332690 i=5A _msize=5A RevEngd req sz=5A Actual=68 ptr=332690 i=5B _msize=5B RevEngd req sz=5B Actual=68 ptr=332690 i=5C _msize=5C RevEngd req sz=5C Actual=68 ptr=332690 i=5D _msize=5D RevEngd req sz=5D Actual=68 ptr=332690 i=5E _msize=5E RevEngd req sz=5E Actual=68 ptr=332690 i=5F _msize=5F RevEngd req sz=5F Actual=68 ptr=332690 i=60 _msize=60 RevEngd req sz=60 Actual=68 ptr=3326F8 i=61 _msize=61 RevEngd req sz=61 Actual=70 ptr=3326F8 i=62 _msize=62 RevEngd req sz=62 Actual=70 ptr=3326F8 i=63 _msize=63 RevEngd req sz=63 Actual=70 ptr=3326F8 i=64 _msize=64 RevEngd req sz=64 Actual=70 ptr=3326F8 i=65 _msize=65 RevEngd req sz=65 Actual=70 ptr=3326F8 i=66 _msize=66 RevEngd req sz=66 Actual=70 ptr=3326F8 i=67 _msize=67 RevEngd req sz=67 Actual=70 ptr=3326F8 i=68 _msize=68 RevEngd req sz=68 Actual=70 ptr=332768 i=69 _msize=69 RevEngd req sz=69 Actual=78 ptr=332768 i=6A _msize=6A RevEngd req sz=6A Actual=78 ptr=332768 i=6B _msize=6B RevEngd req sz=6B Actual=78 ptr=332768 i=6C _msize=6C RevEngd req sz=6C Actual=78 ptr=332768 i=6D _msize=6D RevEngd req sz=6D Actual=78 ptr=332768 i=6E _msize=6E RevEngd req sz=6E Actual=78 ptr=332768 i=6F _msize=6F RevEngd req sz=6F Actual=78 ptr=332768 i=70 _msize=70 RevEngd req sz=70 Actual=78 ptr=3327E0 i=71 _msize=71 RevEngd req sz=71 Actual=80 ptr=3327E0 i=72 _msize=72 RevEngd req sz=72 Actual=80 ptr=3327E0 i=73 _msize=73 RevEngd req sz=73 Actual=80 ptr=3327E0 i=74 _msize=74 RevEngd req sz=74 Actual=80 ptr=3327E0 i=75 _msize=75 RevEngd req sz=75 Actual=80 ptr=3327E0 i=76 _msize=76 RevEngd req sz=76 Actual=80 ptr=3327E0 i=77 _msize=77 RevEngd req sz=77 Actual=80 ptr=3327E0 i=78 _msize=78 RevEngd req sz=78 Actual=80 ptr=332860 i=79 _msize=79 RevEngd req sz=79 Actual=88 ptr=332860 i=7A _msize=7A RevEngd req sz=7A Actual=88 ptr=332860 i=7B _msize=7B RevEngd req sz=7B Actual=88 ptr=332860 i=7C _msize=7C RevEngd req sz=7C Actual=88 ptr=332860 i=7D _msize=7D RevEngd req sz=7D Actual=88 ptr=332860 i=7E _msize=7E RevEngd req sz=7E Actual=88 ptr=332860 i=7F _msize=7F RevEngd req sz=7F Actual=88 ptr=332860 i=80 _msize=80 RevEngd req sz=80 Actual=88 ptr=3328E8 i=81 _msize=81 RevEngd req sz=81 Actual=90 ptr=3328E8 i=82 _msize=82 RevEngd req sz=82 Actual=90 ptr=3328E8 i=83 _msize=83 RevEngd req sz=83 Actual=90 ptr=3328E8 i=84 _msize=84 RevEngd req sz=84 Actual=90 ptr=3328E8 i=85 _msize=85 RevEngd req sz=85 Actual=90 ptr=3328E8 i=86 _msize=86 RevEngd req sz=86 Actual=90 ptr=3328E8 i=87 _msize=87 RevEngd req sz=87 Actual=90 ptr=3328E8 i=88 _msize=88 RevEngd req sz=88 Actual=90 ptr=332978 i=89 _msize=89 RevEngd req sz=89 Actual=98 ptr=332978 i=8A _msize=8A RevEngd req sz=8A Actual=98 ptr=332978 i=8B _msize=8B RevEngd req sz=8B Actual=98 ptr=332978 i=8C _msize=8C RevEngd req sz=8C Actual=98 ptr=332978 i=8D _msize=8D RevEngd req sz=8D Actual=98 ptr=332978 i=8E _msize=8E RevEngd req sz=8E Actual=98 ptr=332978 i=8F _msize=8F RevEngd req sz=8F Actual=98 ptr=332978 i=90 _msize=90 RevEngd req sz=90 Actual=98 ptr=334DB8 i=91 _msize=91 RevEngd req sz=91 Actual=A0 ptr=334DB8 i=92 _msize=92 RevEngd req sz=92 Actual=A0 ptr=334DB8 i=93 _msize=93 RevEngd req sz=93 Actual=A0 ptr=334DB8 i=94 _msize=94 RevEngd req sz=94 Actual=A0 ptr=334DB8 i=95 _msize=95 RevEngd req sz=95 Actual=A0 ptr=334DB8 i=96 _msize=96 RevEngd req sz=96 Actual=A0 ptr=334DB8 i=97 _msize=97 RevEngd req sz=97 Actual=A0 ptr=334DB8 i=98 _msize=98 RevEngd req sz=98 Actual=A0 ptr=334E58 i=99 _msize=99 RevEngd req sz=99 Actual=A8 ptr=334E58 i=9A _msize=9A RevEngd req sz=9A Actual=A8 ptr=334E58 i=9B _msize=9B RevEngd req sz=9B Actual=A8 ptr=334E58 i=9C _msize=9C RevEngd req sz=9C Actual=A8 ptr=334E58 i=9D _msize=9D RevEngd req sz=9D Actual=A8 ptr=334E58 i=9E _msize=9E RevEngd req sz=9E Actual=A8 ptr=334E58 i=9F _msize=9F RevEngd req sz=9F Actual=A8 ptr=334E58 i=A0 _msize=A0 RevEngd req sz=A0 Actual=A8 ptr=334F00 i=A1 _msize=A1 RevEngd req sz=A1 Actual=B0 ptr=334F00 i=A2 _msize=A2 RevEngd req sz=A2 Actual=B0 ptr=334F00 i=A3 _msize=A3 RevEngd req sz=A3 Actual=B0 ptr=334F00 i=A4 _msize=A4 RevEngd req sz=A4 Actual=B0 ptr=334F00 i=A5 _msize=A5 RevEngd req sz=A5 Actual=B0 ptr=334F00 i=A6 _msize=A6 RevEngd req sz=A6 Actual=B0 ptr=334F00 i=A7 _msize=A7 RevEngd req sz=A7 Actual=B0 ptr=334F00 i=A8 _msize=A8 RevEngd req sz=A8 Actual=B0 ptr=334FB0 i=A9 _msize=A9 RevEngd req sz=A9 Actual=B8 ptr=334FB0 i=AA _msize=AA RevEngd req sz=AA Actual=B8 ptr=334FB0 i=AB _msize=AB RevEngd req sz=AB Actual=B8 ptr=334FB0 i=AC _msize=AC RevEngd req sz=AC Actual=B8 ptr=334FB0 i=AD _msize=AD RevEngd req sz=AD Actual=B8 ptr=334FB0 i=AE _msize=AE RevEngd req sz=AE Actual=B8 ptr=334FB0 i=AF _msize=AF RevEngd req sz=AF Actual=B8 ptr=334FB0 i=B0 _msize=B0 RevEngd req sz=B0 Actual=B8 ptr=335068 i=B1 _msize=B1 RevEngd req sz=B1 Actual=C0 ptr=335068 i=B2 _msize=B2 RevEngd req sz=B2 Actual=C0 ptr=335068 i=B3 _msize=B3 RevEngd req sz=B3 Actual=C0 ptr=335068 i=B4 _msize=B4 RevEngd req sz=B4 Actual=C0 ptr=335068 i=B5 _msize=B5 RevEngd req sz=B5 Actual=C0 ptr=335068 i=B6 _msize=B6 RevEngd req sz=B6 Actual=C0 ptr=335068 i=B7 _msize=B7 RevEngd req sz=B7 Actual=C0 ptr=335068 i=B8 _msize=B8 RevEngd req sz=B8 Actual=C0 ptr=335128 i=B9 _msize=B9 RevEngd req sz=B9 Actual=C8 ptr=335128 i=BA _msize=BA RevEngd req sz=BA Actual=C8 ptr=335128 i=BB _msize=BB RevEngd req sz=BB Actual=C8 ptr=335128 i=BC _msize=BC RevEngd req sz=BC Actual=C8 ptr=335128 i=BD _msize=BD RevEngd req sz=BD Actual=C8 ptr=335128 i=BE _msize=BE RevEngd req sz=BE Actual=C8 ptr=335128 i=BF _msize=BF RevEngd req sz=BF Actual=C8 ptr=335128 i=C0 _msize=C0 RevEngd req sz=C0 Actual=C8 ptr=3351F0 i=C1 _msize=C1 RevEngd req sz=C1 Actual=D0 ptr=3351F0 i=C2 _msize=C2 RevEngd req sz=C2 Actual=D0 ptr=3351F0 i=C3 _msize=C3 RevEngd req sz=C3 Actual=D0 ptr=3351F0 i=C4 _msize=C4 RevEngd req sz=C4 Actual=D0 ptr=3351F0 i=C5 _msize=C5 RevEngd req sz=C5 Actual=D0 ptr=3351F0 i=C6 _msize=C6 RevEngd req sz=C6 Actual=D0 ptr=3351F0 i=C7 _msize=C7 RevEngd req sz=C7 Actual=D0 ptr=3351F0 i=C8 _msize=C8 RevEngd req sz=C8 Actual=D0 ptr=3352C0 i=C9 _msize=C9 RevEngd req sz=C9 Actual=D8 ptr=3352C0 i=CA _msize=CA RevEngd req sz=CA Actual=D8 ptr=3352C0 i=CB _msize=CB RevEngd req sz=CB Actual=D8 ptr=3352C0 i=CC _msize=CC RevEngd req sz=CC Actual=D8 ptr=3352C0 i=CD _msize=CD RevEngd req sz=CD Actual=D8 ptr=3352C0 i=CE _msize=CE RevEngd req sz=CE Actual=D8 ptr=3352C0 i=CF _msize=CF RevEngd req sz=CF Actual=D8 ptr=3352C0 i=D0 _msize=D0 RevEngd req sz=D0 Actual=D8 ptr=335398 i=D1 _msize=D1 RevEngd req sz=D1 Actual=E0 ptr=335398 i=D2 _msize=D2 RevEngd req sz=D2 Actual=E0 ptr=335398 i=D3 _msize=D3 RevEngd req sz=D3 Actual=E0 ptr=335398 i=D4 _msize=D4 RevEngd req sz=D4 Actual=E0 ptr=335398 i=D5 _msize=D5 RevEngd req sz=D5 Actual=E0 ptr=335398 i=D6 _msize=D6 RevEngd req sz=D6 Actual=E0 ptr=335398 i=D7 _msize=D7 RevEngd req sz=D7 Actual=E0 ptr=335398 i=D8 _msize=D8 RevEngd req sz=D8 Actual=E0 ptr=335478 i=D9 _msize=D9 RevEngd req sz=D9 Actual=E8 ptr=335478 i=DA _msize=DA RevEngd req sz=DA Actual=E8 ptr=335478 i=DB _msize=DB RevEngd req sz=DB Actual=E8 ptr=335478 i=DC _msize=DC RevEngd req sz=DC Actual=E8 ptr=335478 i=DD _msize=DD RevEngd req sz=DD Actual=E8 ptr=335478 i=DE _msize=DE RevEngd req sz=DE Actual=E8 ptr=335478 i=DF _msize=DF RevEngd req sz=DF Actual=E8 ptr=335478 i=E0 _msize=E0 RevEngd req sz=E0 Actual=E8 ptr=335560 i=E1 _msize=E1 RevEngd req sz=E1 Actual=F0 ptr=335560 i=E2 _msize=E2 RevEngd req sz=E2 Actual=F0 ptr=335560 i=E3 _msize=E3 RevEngd req sz=E3 Actual=F0 ptr=335560 i=E4 _msize=E4 RevEngd req sz=E4 Actual=F0 ptr=335560 i=E5 _msize=E5 RevEngd req sz=E5 Actual=F0 ptr=335560 i=E6 _msize=E6 RevEngd req sz=E6 Actual=F0 ptr=335560 i=E7 _msize=E7 RevEngd req sz=E7 Actual=F0 ptr=335560 i=E8 _msize=E8 RevEngd req sz=E8 Actual=F0 ptr=335650 i=E9 _msize=E9 RevEngd req sz=E9 Actual=F8 ptr=335650 i=EA _msize=EA RevEngd req sz=EA Actual=F8 ptr=335650 i=EB _msize=EB RevEngd req sz=EB Actual=F8 ptr=335650 i=EC _msize=EC RevEngd req sz=EC Actual=F8 ptr=335650 i=ED _msize=ED RevEngd req sz=ED Actual=F8 ptr=335650 i=EE _msize=EE RevEngd req sz=EE Actual=F8 ptr=335650 i=EF _msize=EF RevEngd req sz=EF Actual=F8 ptr=335650 i=F0 _msize=F0 RevEngd req sz=F0 Actual=F8 ptr=335748 i=F1 _msize=F1 RevEngd req sz=F1 Actual=100 ptr=335748 i=F2 _msize=F2 RevEngd req sz=F2 Actual=100 ptr=335748 i=F3 _msize=F3 RevEngd req sz=F3 Actual=100 ptr=335748 i=F4 _msize=F4 RevEngd req sz=F4 Actual=100 ptr=335748 i=F5 _msize=F5 RevEngd req sz=F5 Actual=100 ptr=335748 i=F6 _msize=F6 RevEngd req sz=F6 Actual=100 ptr=335748 i=F7 _msize=F7 RevEngd req sz=F7 Actual=100 ptr=335748 i=F8 _msize=F8 RevEngd req sz=F8 Actual=100 ptr=335848 i=F9 _msize=F9 RevEngd req sz=F9 Actual=108 ptr=335848 i=FA _msize=FA RevEngd req sz=FA Actual=108 ptr=335848 i=FB _msize=FB RevEngd req sz=FB Actual=108 ptr=335848 i=FC _msize=FC RevEngd req sz=FC Actual=108 ptr=335848 i=FD _msize=FD RevEngd req sz=FD Actual=108 ptr=335848 i=FE _msize=FE RevEngd req sz=FE Actual=108 ptr=335848 i=FF _msize=FF RevEngd req sz=FF Actual=108 ptr=335848 i=100 _msize=100 RevEngd req sz=100 Actual=108

p5pRT commented 11 years ago

From @cpansprout

On Wed Oct 03 12​:57​:42 2012\, sprout wrote​:

I think I have abandoned that idea. Putting the refcount after the trailing null sidesteps the whole issue.

But it causes problems in the regexp engine\, which sets SvCUR to 0--something that has always been safe on COWs.

So instead I’m putting the refcount at the very end of the buffer.

So we have about three possible approaches​:

• Allocate an extra byte in sv_grow\, to allow all strings to be potential cows (calves :-). In rare circumstances this could increase memory usage significantly (if there are many Math-Int128 objects). • Only do COW when there is an extra byte available. This would be unfortunate if a string is exactly 2GB minus 1 byte long. • Upgrade to PVIV and fall back to the other COW scheme (storing a pointer in the IV slot) if the string is sufficiently long that the savings outweigh the bookkeeping costs. Some simple benchmarks would allow us to determine the threshold. We might want different thresholds depending on whether an upgrade is needed. Or we could store the poiner in SvLEN.

The first is by far the simplest. The second would definitely be an improvement over the current state of things. The third would be better than the second but take longer to implement.

My order of preference is 1\, 3\, 2\, but bearing in mind that implementing 3 means implementing 2 first and then building on top of that.

Actually\, what I am going to do is modify calls to sv_grow throughout the perl source code to account for the extra byte. That way we can avoid it in those cases where we know we will not use it.

So I think that answers my question about what read-only means. (We can’t COW with existing read-only variables that are not already cows.)

We cannot COW with any read-only scalars at all. Existing well-behaved code does SvREADONLY && !SvIsCOW\, which would break if we had read-only COWs.

(Unless we define SvIsCOW differently for XS.)

--

Father Chrysostomos

p5pRT commented 11 years ago

From @cpansprout

I still have a few problems to iron out\, but here come the stupid benchmarks​:

To do COW with a refcount stored in the string buffer\, we have to check that SvLEN is at least two bytes greater than SvCUR and than the reference count has not reached 255. That overhead slows down COW.

perl.git is the COW version. perl.git-copy is the non-COW version.

Short string​:

Pint​:perl.git sprout$ time ./miniperl -e '$x = "hello"; $y = $x for 1..10000000'

real 0m3.137s user 0m3.128s sys 0m0.006s Pint​:perl.git-copy sprout$ time ./miniperl -e '$x = "hello"; $y = $x for 1..10000000'

real 0m2.597s user 0m2.587s sys 0m0.006s

No cow is clearly the winner.

Long string​:

Pint​:perl.git sprout$ time ./miniperl -e '$x = "hello"x1000; $y = $x for 1..10000000'

real 0m3.139s user 0m3.128s sys 0m0.008s Pint​:perl.git-copy sprout$ time ./miniperl -e '$x = "hello"x1000; $y = $x for 1..10000000'

real 0m7.533s user 0m7.520s sys 0m0.009s

For cow\, the speed is the same. For no cow\, things are clearly slower.

Harness is hanging for some reason\, but minitest works so far. ‘make test’ has a few failures.

You can play with it if you want. It’s on the sprout/ookow branch (SvOOK-style COW [except it does not involve SvOOK any more]).

--

Father Chrysostomos

p5pRT commented 11 years ago

From @cpansprout

On Mon Oct 08 00​:37​:16 2012\, sprout wrote​:

You can play with it if you want. It’s on the sprout/ookow branch (SvOOK-style COW [except it does not involve SvOOK any more]).

I forgot to mention it requires -Accflags=-DPERL_NEW_COPY_ON_WRITE.

--

Father Chrysostomos

p5pRT commented 11 years ago

From @cpansprout

On Mon Oct 08 08​:40​:33 2012\, sprout wrote​:

On Mon Oct 08 00​:37​:16 2012\, sprout wrote​:

You can play with it if you want. It’s on the sprout/ookow branch (SvOOK-style COW [except it does not involve SvOOK any more]).

I forgot to mention it requires -Accflags=-DPERL_NEW_COPY_ON_WRITE.

And now I have all core tests passing\, except Peek.t\, which will need tweaking when I have finished.

I still need to do some benchmarks to find a threshold where copy-on-write speeds things up (and then only do COW above that threshold). We may need two thresholds depending on whether the destination string has a buffer already\, and whether that buffer is big enough. (Copying may be faster than freeing a buffer.)

I also need to allocate that extra byte for strings above the COW threshold\, to store the buffer’s reference count.

And then I need to see whether eliminating swipe makes any speed difference.

And then I need to benchmark it against some real code. (Test.Simple’s test suite run with WWW​::Scripter might work well. [See \https://rt.cpan.org/Ticket/Display.html?id=50462#txn-681840. Downloading it and using file​:/// URLs will eliminate any network latency.] Although a test suite is not typical code\, as it tries to exercise every code path\, Test.Simple’s tests are testing all of Test.Simple’s code paths\, not all of JE’s. So JE may be a good buffer that translates atypical JS code into typical Perl code.)

--

Father Chrysostomos

p5pRT commented 11 years ago

From [Unknown Contact. See original ticket]

On Mon Oct 08 08​:40​:33 2012\, sprout wrote​:

On Mon Oct 08 00​:37​:16 2012\, sprout wrote​:

You can play with it if you want. It’s on the sprout/ookow branch (SvOOK-style COW [except it does not involve SvOOK any more]).

I forgot to mention it requires -Accflags=-DPERL_NEW_COPY_ON_WRITE.

And now I have all core tests passing\, except Peek.t\, which will need tweaking when I have finished.

I still need to do some benchmarks to find a threshold where copy-on-write speeds things up (and then only do COW above that threshold). We may need two thresholds depending on whether the destination string has a buffer already\, and whether that buffer is big enough. (Copying may be faster than freeing a buffer.)

I also need to allocate that extra byte for strings above the COW threshold\, to store the buffer’s reference count.

And then I need to see whether eliminating swipe makes any speed difference.

And then I need to benchmark it against some real code. (Test.Simple’s test suite run with WWW​::Scripter might work well. [See \https://rt.cpan.org/Ticket/Display.html?id=50462#txn-681840. Downloading it and using file​:/// URLs will eliminate any network latency.] Although a test suite is not typical code\, as it tries to exercise every code path\, Test.Simple’s tests are testing all of Test.Simple’s code paths\, not all of JE’s. So JE may be a good buffer that translates atypical JS code into typical Perl code.)

--

Father Chrysostomos

p5pRT commented 11 years ago

From @rurban

On Tue\, Oct 2\, 2012 at 11​:44 PM\, Father Chrysostomos via RT \perlbug\-followup@&#8203;perl\.org wrote​:

On Tue Oct 02 13​:02​:26 2012\, sprout wrote​:

On Tue Oct 02 12​:06​:42 2012\, bulk88 wrote​:

On Mon Sep 17 21​:44​:53 2012\, sprout wrote​:

The flag space is rather crowded. Maybe it’s about time we made pad names their own type. Their being SVs makes SVs themselves more complicated. A dedicated pad name type would also allow the name to be allocated as part of the same memory block\, much like a hek.

There is the existing SVf_FAKE and SVf_READONLY for HEKs. Maybe I didn't see it in this thread\, but what is wrong with using shared HEKs for COW?

We already do when we have a HEK already. But taking an existing string and creating a HEK out of it requires copying it. I tried an experiment to allocate all strings as HEKs initially\, and then upgrade them to shared HEKs by adding them to the string table when they would have been copied\, but the shared string table overhead outweighs any savings.

What we are looking for is a cheap way to do COW with existing strings.

Since the proposal as I read is it to introduce 2 different "start of buffer" formats for the offset system\, is a flag needed? Why not just use SVf_OOK and let the macros and functions figure out if it is a COW or offset in the start of string data?

The new flag would apply also to SVs containing shared HEKs.

By using a new flag instead of SvREADONLY\, we can make the dozen or so modules on CPAN that do if (SvREADONLY(sv)) croak(...) start working correctly.

But no\, the new flag would not be strictly necessary; just convenient.

Here is another reason​:

COW currently does not work on read-only scalars. If we add a COW flag\, then copying a read-only string will entail turning on the COW flag and increasing the reference count at SvEND+1.

But is it OK to modify the string buffer of a read-only scalar? What does read-only mean? That the contents of the scalar must not be modified\, or that modifications should have no observable effects at the Perl language level?

Both\, though only the first property is yet used.

SvREADONLY data is currently only used for run-time checks\, but will be used in the future for compile-time checks and optimizations.

BTW​: Optimizing on data properties is not a good idea. We generally optimize on OP properties only. So it's a mess. But only really needed for special objects (classes and ISA).

But is it OK to modify the string buffer of a read-only scalar.

No.

What does read-only mean?

It is a reserved data flag to be used for compile-time and run-time checks\, and for future compile-time optimizations\, when we will have read-only lexicals\, classes (immutable) and @​ISA (final).

At those times we only implemented the simpliest readonly-ness in PHP fashion by using global functions\, without sigil (CONSTSUB)\, but did not come to implement the rest. And it was really not needed with our simple OO system those days\, which need not to be optimized. Now with Moose\, Catalyst and Dist​::Zilla and those beasts read-only optimizations will bear fruit - in the > 200-2000% range\, not 10%.

Is it OK to modify the hash key a read-only HV.

No

Is it OK to modify the hash value a read-only HV.

Yes. readonly-less is only shallow. A HV key points to a new SV\, which can/needs to be protected seperately. Same for AV.

Is it OK to add or remove hash keys from a read-only HV or AV.

No

No answers yet to readonly function arguments and return declarations as it is too early for you yet. I spec'd it in and gave more answers in my typed/ro branch and pdd. -- Reini Urban http​://cpanel.net/ http​://www.perl-compiler.org/

p5pRT commented 11 years ago

From @cpansprout

On Tue Oct 16 10​:14​:42 2012\, sprout wrote​:

On Mon Oct 08 08​:40​:33 2012\, sprout wrote​:

On Mon Oct 08 00​:37​:16 2012\, sprout wrote​:

You can play with it if you want. It’s on the sprout/ookow branch (SvOOK-style COW [except it does not involve SvOOK any more]).

I forgot to mention it requires -Accflags=-DPERL_NEW_COPY_ON_WRITE.

And now I have all core tests passing\, except Peek.t\, which will need tweaking when I have finished.

I still need to do some benchmarks to find a threshold where copy-on-write speeds things up (and then only do COW above that threshold). We may need two thresholds depending on whether the destination string has a buffer already\, and whether that buffer is big enough. (Copying may be faster than freeing a buffer.)

You can now see this on the sprout/ookow2 branch. It still requires -Accflags=-DPERL_NEW_COPY_ON_WRITE.

There are two #defines\, SV_COW_THRESHOLD and SV_COWBUF_THRESHOLD. These are minimum string lengths where COW kicks in. The latter is used if the destination string already has a buffer large enough to hold the string.

For benchmarking I compiled without -DDEBUGGING. Earlier I had used it\, but it skewed the results because of extra PL_debug & DEBUG_C_FLAG checks in the COW-handling code.

I tried those silly benchmarks with repeated variable assignment in a loop\, and found that sv_force_normal_flags is the real bottleneck when it comes to copy-on-write. For these simple loops\, the COW overhead exceeds the cost of copying up to 209 bytes. Above that\, COW is faster. If the destination string is already large enough\, then copying is so fast that it beats the pants of COW up to 2131 bytes.

When I tried those numbers (209/2131) with mktables\, it made absolutely no difference to the speed\, but 0/0 (always COW) showed a slight speed-up. So I did repeated tests with different numbers and found 0/500 to be ideal.

I then tried David Wheeler’s Test.Simple JavaScript library with WWW​::Scripter. Its test suite is written in JavaScript and runs in a browser (like WWW​::Scripter :-). The optimum setting seemed to be 0/1250\, which is just as fast as 0/500 for mktables. So that’s the setting I currently have on my branch.

I set up those #defines with #ifndef conditions so that platform-specific hints can override them. I’m sure the ideal settings will be different for Windows\, for instance.

For anyone who wants to try Test.Simple with blead (I think it makes a good benchmark)​:

Get JS​::Test​::Simple off CPAN and then move the tests/ folder somewhere convenient (I used /ts/).

Install Lexical​::Var into your blead installation with the patch from \https://rt.cpan.org/Ticket/Display.html?id=80309#txn-1131526.

Install WWW​::Scripter​::Plugin​::Ajax.

Then run this ‘one’-liner\, with the right file​: URL​:

bleadperl -MTime​::HiRes=sleep -e '++$INC{"JE/Destroyer.pm"};use WWW​::Scripter; agent_alias{$w=new WWW​::Scripter}"Mac Safari"; $w->use_plugin(Ajax); $w->get("file​:///ts/index.html "); $w->wait_for_timers; print $w->content(format=>text)'

The %INC fiddling is a necessary hack. It basically has to do with WWW​::Scripter​::Plugin​::JavaScript’s attempts to handle circular references conflicting with Test.Simple’s expectations. Pretending to load JE​::Destroyer disables that circularity handling.

I also need to allocate that extra byte for strings above the COW threshold\, to store the buffer’s reference count.

Not done yet\, but that is next on my list.

And then I need to see whether eliminating swipe makes any speed difference.

Yes. It makes things slower.

(And\, smueller\, did you see my message about smoking sprout/ookow? If you have not started\, could you use sprout/ookow2 instead? Thank you.)

--

Father Chrysostomos

p5pRT commented 11 years ago

From chromatic@wgz.org

On Tuesday\, October 23\, 2012 01​:01​:56 PM Father Chrysostomos via RT wrote​:

I tried those silly benchmarks with repeated variable assignment in a loop\, and found that sv_force_normal_flags is the real bottleneck when it comes to copy-on-write.

FWIW\, that's been the bottleneck in most of the microbenchmarks I've profiled too. No obvious local improvements have ever jumped out at me.

-- c

p5pRT commented 11 years ago

From @cpansprout

On Tue Oct 23 13​:01​:56 2012\, sprout wrote​:

There are two

*new*

#defines\, SV_COW_THRESHOLD and SV_COWBUF_THRESHOLD. These are minimum string lengths where COW kicks in. The latter is used if the destination string already has a buffer large enough to hold the string.

For benchmarking I compiled without -DDEBUGGING. Earlier I had used it\, but it skewed the results because of extra PL_debug & DEBUG_C_FLAG checks in the COW-handling code.

I tried those silly benchmarks with repeated variable assignment in a loop\, and found that sv_force_normal_flags is the real bottleneck when it comes to copy-on-write. For these simple loops\, the COW overhead exceeds the cost of copying up to 209 bytes. Above that\, COW is faster. If the destination string is already large enough\, then copying is so fast that it beats the pants of COW up to 2131 bytes.

s/of\K/f/

When I tried those numbers (209/2131) with mktables\, it made absolutely no difference to the speed\, but 0/0 (always COW) showed a slight speed-up. So I did repeated tests with different numbers and found 0/500 to be ideal.

I then tried David Wheeler’s Test.Simple JavaScript library with WWW​::Scripter. Its test suite is written in JavaScript and runs in a browser (like WWW​::Scripter :-). The optimum setting seemed to be 0/1250\, which is just as fast as 0/500 for mktables. So that’s the setting I currently have on my branch.

I forgot to mention that the speed increase I saw with mktables was 1.8%\, and with Test.Simple 3%.

Those might not be significant\, but programs that deal with large strings should see a huge speed-up.

--

Father Chrysostomos

p5pRT commented 11 years ago

From @tsee

On 10/23/2012 10​:01 PM\, Father Chrysostomos via RT wrote​:

(And\, smueller\, did you see my message about smoking sprout/ookow? If you have not started\, could you use sprout/ookow2 instead? Thank you.)

Umm\, too late. I think you missed my mail about starting the smoke. It actually appears to be done​:

http​://users.perl5.git.perl.org/~tsee/sprout-ookow/

(Progress​: http​://users.perl5.git.perl.org/~tsee/progress.html)

Should I kick off sprout/ookow2 nonetheless?

--Steffen