COW related performance regression in 5.19

p5pRT commented 10 years ago

Migrated from rt.perl.org#121796 (status was 'resolved')

Searchable as RT121796$

p5pRT commented 10 years ago

From @demerphq

The attached benchmark shows a marked performance degradation between earlier perls and 5.19\, I see approx 20% drop.

The performance drop has been bisected back to:

commit 13b0f67d12a6400f01bbc27945223ca68a12a7ef Author: David Mitchell \davem@iabyn\.com Date: Wed May 22 16:38:29 2013 +0100

re-enable Copy-on-Write by default.

COW was first introduced (and enabled by default) in 5.17.7. It was disabled by default in 5.17.10\, because it was though to have too many rough edges for the 5.18.0 release.

By re-enabling it now\, early in the 5.19.x release cycle\, hopefully it will be ready for production use by 5.20.

This commit mainly reverts 9f351b45f4 and e1fd41328c (with modifications)\, then updates perldelta.

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 10 years ago

From @demerphq

hash_bench.pl

p5pRT commented 10 years ago

From @rjbs

* yves orton \perlbug\-followup@perl\.org [2014-05-04T06:08:46]

The attached benchmark shows a marked performance degradation between earlier perls and 5.19\, I see approx 20% drop.

Some discussion on IRC suggested that the fundamental problem is that COW is faster on big strings and slower on short strings. There is a constant setting the minimum length at which strings should be COW\, and it is currently set to zero\, and was intended to be tuned up... but never was.

Is a reasonable next step to find a decent setting for this value to have a better trade-off? It seems so to me. We don't need it to be optimal in every way\, of course\, just to find a happier value.

-- rjbs

p5pRT commented 10 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 10 years ago

From @demerphq

On 6 May 2014 16:46\, Ricardo Signes \perl\.p5p@rjbs\.manxome\.org wrote:

* yves orton \perlbug\-followup@perl\.org [2014-05-04T06:08:46]

The attached benchmark shows a marked performance degradation between earlier perls and 5.19\, I see approx 20% drop.

Some discussion on IRC suggested that the fundamental problem is that COW is faster on big strings and slower on short strings. There is a constant setting the minimum length at which strings should be COW\, and it is currently set to zero\, and was intended to be tuned up... but never was.

Is a reasonable next step to find a decent setting for this value to have a better trade-off? It seems so to me. We don't need it to be optimal in every way\, of course\, just to find a happier value.

I thought I tried setting this and it had no affect whatsoever.

Confusingly there are actually two vars for this\, and it isnt clear which is the right one to setp.

I will try to play further.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 10 years ago

From @demerphq

I have further details on this issue.

$ cat ../read_bench.pl use strict; use warnings;

open my $fh\, "\<"\, "/usr/share/dict/american-english" or die "failed to open file: $!";

my @got; while (\<$fh>) { push @got\, $_; } __END__

Performance is 5.14.2:

$ perl -E'say $]'; dumbbench -- perl ../read_bench.pl 5.014002 cmd: Ran 25 iterations (2 outliers). cmd: Rounded run time per iteration: 1.0204e-01 +/- 7.4e-04 (0.7%)

Performance in stock blead from this morning:

$ ./perl -Ilib -E'say $]'; dumbbench -- ./perl -Ilib ../read_bench.pl 5.019012 cmd: Ran 21 iterations (1 outliers). cmd: Rounded run time per iteration: 4.0526e-01 +/- 9.6e-04 (0.2%) $ ./perl -Ilib -E'say $]'; dumbbench -- ./perl -Ilib ../read_bench.pl 5.019012 cmd: Ran 20 iterations (0 outliers). cmd: Rounded run time per iteration: 4.089e-01 +/- 2.0e-03 (0.5%)

Setting SV_COW_THRESHOLD 1250 which is the same value as SV_COWBUF_THRESHOLD results in this:

$ ./perl -Ilib -E'say $]'; dumbbench -- ./perl -Ilib ../read_bench.pl 5.019012 cmd: Ran 26 iterations (6 outliers). cmd: Rounded run time per iteration: 1.0298e-01 +/- 2.5e-04 (0.2%) $ ./perl -Ilib -E'say $]'; dumbbench -- ./perl -Ilib ../read_bench.pl 5.019012 cmd: Ran 27 iterations (7 outliers). cmd: Rounded run time per iteration: 1.0416e-01 +/- 3.2e-04 (0.3%)

However I do not see simply setting SV_COW_THRESHOLD as the solution. I struggle to understand why using COW should make this code four times slower. And I wonder if using a file with longer strings in it which are longer than the threshold will also see the same 4 times slowdown. More research required.

cheers\, Yves

p5pRT commented 10 years ago

From @demerphq

Further to this\, i pushed the following commit as a branch. It moves the threshold definitions to sv.c so that if you change them you dont have to rebuild everything. The definitions are not used anywhere else in the core.

commit ff6e449b1507c1207bf3695bcac26ff162423639 Author: Yves Orton \demerphq@gmail\.com Date: Thu May 8 11:45:35 2014 +0200

Move PERL_NEW_COPY_ON_WRITE thresholds to sv.c so its faster to rebuild when tweaking them

On 8 May 2014 11:36\, demerphq \demerphq@gmail\.com wrote:

I have further details on this issue.

$ cat ../read_bench.pl use strict; use warnings;

open my $fh\, "\<"\, "/usr/share/dict/american-english" or die "failed to open file: $!";

my @got; while (\<$fh>) { push @got\, $_; } __END__

Performance is 5.14.2:

$ perl -E'say $]'; dumbbench -- perl ../read_bench.pl 5.014002 cmd: Ran 25 iterations (2 outliers). cmd: Rounded run time per iteration: 1.0204e-01 +/- 7.4e-04 (0.7%)

Performance in stock blead from this morning:

$ ./perl -Ilib -E'say $]'; dumbbench -- ./perl -Ilib ../read_bench.pl 5.019012 cmd: Ran 21 iterations (1 outliers). cmd: Rounded run time per iteration: 4.0526e-01 +/- 9.6e-04 (0.2%) $ ./perl -Ilib -E'say $]'; dumbbench -- ./perl -Ilib ../read_bench.pl 5.019012 cmd: Ran 20 iterations (0 outliers). cmd: Rounded run time per iteration: 4.089e-01 +/- 2.0e-03 (0.5%)

Setting SV_COW_THRESHOLD 1250 which is the same value as SV_COWBUF_THRESHOLD results in this:

$ ./perl -Ilib -E'say $]'; dumbbench -- ./perl -Ilib ../read_bench.pl 5.019012 cmd: Ran 26 iterations (6 outliers). cmd: Rounded run time per iteration: 1.0298e-01 +/- 2.5e-04 (0.2%) $ ./perl -Ilib -E'say $]'; dumbbench -- ./perl -Ilib ../read_bench.pl 5.019012 cmd: Ran 27 iterations (7 outliers). cmd: Rounded run time per iteration: 1.0416e-01 +/- 3.2e-04 (0.3%)

However I do not see simply setting SV_COW_THRESHOLD as the solution. I struggle to understand why using COW should make this code four times slower. And I wonder if using a file with longer strings in it which are longer than the threshold will also see the same 4 times slowdown. More research required.

cheers\, Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 10 years ago

From mail@steffen-mueller.net

On 05/04/2014 12:08 PM\, yves orton (via RT) wrote:

# New Ticket Created by yves orton # Please include the string: [perl #121796] # in the subject line of all future correspondence about this issue. # \<URL: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=121796 >

Shorter test case. The two perls are the commit before the sha1 that Yves pointed out and AT that exact commit (re-enabling COW).

$ time install-COW/bin/perl5.19.1 -e 'my $s = "x\n" x 1e5; open my $fh\, "\<"\, \$s or die $!; my @got; while (\<$fh>) {$a{$_}=$_}'

real 0m0.069s user 0m0.065s sys 0m0.004s

$ time install-COW/bin/perl5.19.1 -e 'my $s = ("x\n") x 1e5; open my $fh\, "\<"\, \$s or die $!; my @got; while (\<$fh>) {push @got\, $_}'

real 0m1.326s user 0m0.254s sys 0m1.050s

If I do actual file IO\, the same happens\, so don't be put off by the "open string ref" in there. If $_ is not used\, appended to a scalar\, or stored in a hash\, the slowdown doesn't happen. But if it's pushed in an array... well see above. A COW-enabled perl with a high COW threshold set does NOT exhibit this problem either.

We're still reading code (and profiler output) to find the root cause.

--Steffen

p5pRT commented 10 years ago

From @demerphq

On 8 May 2014 13:33\, Steffen Mueller \mail@steffen\-mueller\.net wrote:

On 05/04/2014 12:08 PM\, yves orton (via RT) wrote:

# New Ticket Created by yves orton # Please include the string: [perl #121796] # in the subject line of all future correspondence about this issue. # \<URL: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=121796 >

Shorter test case. The two perls are the commit before the sha1 that Yves pointed out and AT that exact commit (re-enabling COW).

$ time install-COW/bin/perl5.19.1 -e 'my $s = "x\n" x 1e5; open my $fh\, "\<"\, \$s or die $!; my @got; while (\<$fh>) {$a{$_}=$_}'

real 0m0.069s user 0m0.065s sys 0m0.004s

$ time install-COW/bin/perl5.19.1 -e 'my $s = ("x\n") x 1e5; open my $fh\, "\<"\, \$s or die $!; my @got; while (\<$fh>) {push @got\, $_}'

real 0m1.326s user 0m0.254s sys 0m1.050s

If I do actual file IO\, the same happens\, so don't be put off by the "open string ref" in there. If $_ is not used\, appended to a scalar\, or stored in a hash\, the slowdown doesn't happen. But if it's pushed in an array... well see above. A COW-enabled perl with a high COW threshold set does NOT exhibit this problem either.

We're still reading code (and profiler output) to find the root cause.

The only place the threshold is checked is via the GE_COW_THRESHOLD(cur) which is used exactly once in our code\, in sv.c on line 4447 (give or take).\, in a complex condition. Something in the block that condition enables is the source of this performance hit.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 10 years ago

From PeterCMartini@GMail.com

RT 121815 was reduced down to a script very similar to the benchmark here (I put it in the body of the email instead of as an attachment at http://nntp.perl.org/group/perl.perl5.porters/215113); can/should that be made a child ticket of this one (or some such? I don't really know RT).

p5pRT commented 10 years ago

From [Unknown Contact. See original ticket]

RT 121815 was reduced down to a script very similar to the benchmark here (I put it in the body of the email instead of as an attachment at http://nntp.perl.org/group/perl.perl5.porters/215113); can/should that be made a child ticket of this one (or some such? I don't really know RT).

p5pRT commented 10 years ago

From @iabyn

On Thu\, May 08\, 2014 at 01:33:16PM +0200\, Steffen Mueller wrote:

$ time install-COW/bin/perl5.19.1 -e 'my $s = ("x\n") x 1e5; open my $fh\, "\<"\, \$s or die $!; my @got; while (\<$fh>) {push @got\, $_}'

real 0m1.326s user 0m0.254s sys 0m1.050s

The fault here appears to lie with sv_gets() rather than COW per se. What happens is that for some reason\, sv_gets() creates a new SVPV with an SvLEN of 200_000 (equal to the length of $s\, or how much data is in the buffer). It then reads two characters into it. So it sets $_ to:

SV = PV(0xa98c18) at 0xac4638 REFCNT = 1 FLAGS = (POK\,pPOK) PV = 0x7ffff180c028 "x\n"\0 CUR = 2 LEN = 200000

The sv_setsv() called from pp_push() then does a COW copy of $_\, so the SV pushed onto the array has an SvLEN() of 200_000.

At the next call to sv_gets()\, the COW is dropped from $_ and another 200K buffer is malloc()ed for it.

So the net effect is that every SV pushed onto the array has a 200K buffer.

How much better this is when reading from a file\, I don't yet know; I suspect that it will create lots of 4K SVs.

I had a quick look at sv_gets() 5.18.2 and it was creating 80 byte strings\, so this is probably a regression. I suspect that post 5.18.2 (i.e. after it started allocating large buffers)\, but before COW was enabled\, it was just copying $_ each time\, so only one 200K buffer was being allocated (and repeatedly reused)\, and all the SVs being pushed were only a few bytes.

If I'm right\, then the real fix probably needs to be in sv_gets(); although possibly the code in sv_setsv() could be tweaked to not COW for a string with small SvCUR() but very large SvLEN() ?

Anyway I thought I'd report this early\, since other people are working on this in parallel.

-- I before E. Except when it isn't.

p5pRT commented 10 years ago

From @iabyn

On Thu\, May 08\, 2014 at 04:28:41PM +0100\, Dave Mitchell wrote:

The fault here appears to lie with sv_gets() rather than COW per se.

And looking further: with this code:

my $s = "x\n" x 1e5; open my $fh\, "\<"\, \$s or die $!; #open my $fh\, "\<"\, "/usr/share/dict/words" or die; my @got; study; while (\<$fh>) { #$a{$_}=$_ push @got\, $_; } use Devel::Peek; Dump $got[1];

gives:

$ perl5190 ~/tmp/p SV = PV(0x1ce9e80) at 0x1d0db40 REFCNT = 1 FLAGS = (POK\,pPOK) PV = 0x1d26d60 "x\n"\0 CUR = 2 LEN = 16

$ perl5191 ~/tmp/p SV = PV(0x1e43e80) at 0x1e67cd0 REFCNT = 1 FLAGS = (POK\,IsCOW\,pPOK) PV = 0x7fe865c0a010 "x\n"\0 CUR = 2 LEN = 200000 COW_REFCNT = 0

Note the massive LEN.

And changing the code to read from a regular file gives:

$ perl5190 ~/tmp/p SV = PV(0x1573e90) at 0x15871f8 REFCNT = 1 FLAGS = (POK\,pPOK) PV = 0x1595d20 "10-point\n"\0 CUR = 9 LEN = 16

$ perl5191 ~/tmp/p SV = PV(0x1714e90) at 0x17282c8 REFCNT = 1 FLAGS = (POK\,IsCOW\,pPOK) PV = 0x1775840 "10-point\n"\0 CUR = 9 LEN = 8192 COW_REFCNT = 0

-- "Foul and greedy Dwarf - you have eaten the last candle." -- "Hordes of the Things"\, BBC Radio.

p5pRT commented 10 years ago

From @demerphq

On 8 May 2014 17:48\, Dave Mitchell \davem@iabyn\.com wrote:

On Thu\, May 08\, 2014 at 04:28:41PM +0100\, Dave Mitchell wrote:

The fault here appears to lie with sv_gets() rather than COW per se.

And looking further: with this code:

my $s = "x\\n" x 1e5;
open my $fh\, "\<"\, \\$s or die $\!;
\#open my $fh\, "\<"\, "/usr/share/dict/words" or die;
my @&#8203;got;
study;
while \(\<$fh>\) \{
    \#$a\{$\_\}=$\_
    push @&#8203;got\, $\_;
\}
use Devel&#8203;::Peek; Dump $got\[1\];

gives:

$ perl5190 ~/tmp/p
SV = PV\(0x1ce9e80\) at 0x1d0db40
  REFCNT = 1
  FLAGS = \(POK\,pPOK\)
  PV = 0x1d26d60 "x\\n"\\0
  CUR = 2
  LEN = 16

$ perl5191 ~/tmp/p
SV = PV\(0x1e43e80\) at 0x1e67cd0
  REFCNT = 1
  FLAGS = \(POK\,IsCOW\,pPOK\)
  PV = 0x7fe865c0a010 "x\\n"\\0
  CUR = 2
  LEN = 200000
  COW\_REFCNT = 0

Note the massive LEN.

And changing the code to read from a regular file gives:

$ perl5190 ~/tmp/p
SV = PV\(0x1573e90\) at 0x15871f8
  REFCNT = 1
  FLAGS = \(POK\,pPOK\)
  PV = 0x1595d20 "10\-point\\n"\\0
  CUR = 9
  LEN = 16

$ perl5191 ~/tmp/p
SV = PV\(0x1714e90\) at 0x17282c8
  REFCNT = 1
  FLAGS = \(POK\,IsCOW\,pPOK\)
  PV = 0x1775840 "10\-point\\n"\\0
  CUR = 9
  LEN = 8192
  COW\_REFCNT = 0

This probably also explains the out of memory errors that people are seeing as well.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 10 years ago

From @Leont

On Thu\, May 8\, 2014 at 5:28 PM\, Dave Mitchell \davem@iabyn\.com wrote:

On Thu\, May 08\, 2014 at 01:33:16PM +0200\, Steffen Mueller wrote:

$ time install-COW/bin/perl5.19.1 -e 'my $s = ("x\n") x 1e5; open my $fh\, "\<"\, \$s or die $!; my @got; while (\<$fh>) {push @got\, $_}'

real 0m1.326s user 0m0.254s sys 0m1.050s

The fault here appears to lie with sv_gets() rather than COW per se. What happens is that for some reason\, sv_gets() creates a new SVPV with an SvLEN of 200_000 (equal to the length of $s\, or how much data is in the buffer). It then reads two characters into it. So it sets $_ to:
SV = PV$0xa98c18$ at 0xac4638
  REFCNT = 1
  FLAGS = $POK\,pPOK$
  PV = 0x7ffff180c028 "x\\n"\\0
  CUR = 2
  LEN = 200000
The sv_setsv() called from pp_push() then does a COW copy of $_\, so the SV pushed onto the array has an SvLEN() of 200_000.

At the next call to sv_gets()\, the COW is dropped from $_ and another 200K buffer is malloc()ed for it.

So the net effect is that every SV pushed onto the array has a 200K buffer.

How much better this is when reading from a file\, I don't yet know; I suspect that it will create lots of 4K SVs.

I had a quick look at sv_gets() 5.18.2 and it was creating 80 byte strings\, so this is probably a regression. I suspect that post 5.18.2 (i.e. after it started allocating large buffers)\, but before COW was enabled\, it was just copying $_ each time\, so only one 200K buffer was being allocated (and repeatedly reused)\, and all the SVs being pushed were only a few bytes.

sv_gets is begging for a rewrite\, for more reasons than just this one. It's full with optimizations that I are not necessarily optimal.

If I'm right\, then the real fix probably needs to be in sv_gets(); although possibly the code in sv_setsv() could be tweaked to not COW for a string with small SvCUR() but very large SvLEN() ?

That is a good point too.

Leon

p5pRT commented 10 years ago

From @demerphq

On 8 May 2014 17:48\, Dave Mitchell \davem@iabyn\.com wrote:

On Thu\, May 08\, 2014 at 04:28:41PM +0100\, Dave Mitchell wrote:

The fault here appears to lie with sv_gets() rather than COW per se.

And looking further: with this code:

my $s = "x\\n" x 1e5;
open my $fh\, "\<"\, \\$s or die $\!;
\#open my $fh\, "\<"\, "/usr/share/dict/words" or die;
my @&#8203;got;
study;
while \(\<$fh>\) \{
    \#$a\{$\_\}=$\_
    push @&#8203;got\, $\_;
\}
use Devel&#8203;::Peek; Dump $got\[1\];

gives:

$ perl5190 ~/tmp/p
SV = PV\(0x1ce9e80\) at 0x1d0db40
  REFCNT = 1
  FLAGS = \(POK\,pPOK\)
  PV = 0x1d26d60 "x\\n"\\0
  CUR = 2
  LEN = 16

$ perl5191 ~/tmp/p
SV = PV\(0x1e43e80\) at 0x1e67cd0
  REFCNT = 1
  FLAGS = \(POK\,IsCOW\,pPOK\)
  PV = 0x7fe865c0a010 "x\\n"\\0
  CUR = 2
  LEN = 200000
  COW\_REFCNT = 0

Note the massive LEN.

And changing the code to read from a regular file gives:

$ perl5190 ~/tmp/p
SV = PV\(0x1573e90\) at 0x15871f8
  REFCNT = 1
  FLAGS = \(POK\,pPOK\)
  PV = 0x1595d20 "10\-point\\n"\\0
  CUR = 9
  LEN = 16

$ perl5191 ~/tmp/p
SV = PV\(0x1714e90\) at 0x17282c8
  REFCNT = 1
  FLAGS = \(POK\,IsCOW\,pPOK\)
  PV = 0x1775840 "10\-point\\n"\\0
  CUR = 9
  LEN = 8192
  COW\_REFCNT = 0

Thanks\, this gave me the clue. See branch yves/cow_threshold_in_sv_c for a possible fix.

BTW\, setting PERL_PREALLOC_EMPTY_IN_SV_GETS to a higher number produces a measurable slowdown in dumbbench.

The code probably needs a comment too.

Patch:

commit b53d1dc050f8f6417238737fd746678e811ab763 Author: Yves Orton \demerphq@gmail\.com Date: Fri May 9 02:38:32 2014 +0200

in sv_gets() if the sv is empty then preallocate a small buffer

When the COW logic is triggered there is an unfortunate slowdown.

https://rt-archive.perl.org/perl5/Ticket/Display.html?id=121796

What happens is that the buffer gets "stolen" by COW\, and $_ ends up with SvLEN()==0\, which through a tortorious pathway leads us to allocate the same amount of memory as there is in the current buffer. Which then gets COW'ed out\, and etc.

By initializing the string to at least something this does not happen.

sv_gets() seriously needs love. :-(

Inline Patch

```diff diff --git a/sv.c b/sv.c index 439b973..3c52e0d 100644 --- a/sv.c +++ b/sv.c @@ -46,6 +46,8 @@ char *gconvert(double, int, int, char *); #endif +#define PERL_PREALLOC_EMPTY_IN_SV_GETS 20 + #ifdef PERL_NEW_COPY_ON_WRITE # ifndef SV_COW_THRESHOLD # define SV_COW_THRESHOLD 0 /* min string length for cow */ @@ -8154,6 +8156,10 @@ Perl_sv_gets(pTHX_ SV *const sv, PerlIO *const fp, ```

I32 append) (*fp)->_cnt++; #endif

+ if (!SvLEN(sv)) { + SvGROW(sv\,PERL_PREALLOC_EMPTY_IN_SV_GETS); + } + /* Here is some breathtakingly efficient cheating */

cnt = PerlIO_get_cnt(fp); /* get count into register */

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 10 years ago

From @tsee

On 05/09/2014 03:00 AM\, demerphq wrote:

Thanks\, this gave me the clue. See branch yves/cow_threshold_in_sv_c for a possible fix.

BTW\, setting PERL_PREALLOC_EMPTY_IN_SV_GETS to a higher number produces a measurable slowdown in dumbbench.

The code probably needs a comment too.

Oh\, wow. Not to take away the achievement of having figured this nasty out\, but this is a _workaround_ if I've ever seen one. :(

I've only scratched the surface\, but I think this is really multiple bugs (or unfortunate bits of behavior) flying in close formation. Whatever allocates "the same amount of memory as there is in the current buffer" clearly needs its head adjusted. I'm sure that's intended as an optimization in some situations\, but that'll need very careful reassessment.

The downside of preallocating a buffer of some arbitrary (small) size is that it's likely going to mean an additional malloc and that hurts.

--Steffen

p5pRT commented 10 years ago

From @demerphq

On 9 May 2014 08:35\, Steffen Mueller \smueller@cpan\.org wrote:

On 05/09/2014 03:00 AM\, demerphq wrote:

Thanks\, this gave me the clue. See branch yves/cow_threshold_in_sv_c for a possible fix.

BTW\, setting PERL_PREALLOC_EMPTY_IN_SV_GETS to a higher number produces a measurable slowdown in dumbbench.

The code probably needs a comment too.

Oh\, wow. Not to take away the achievement of having figured this nasty out\, but this is a _workaround_ if I've ever seen one. :(

Yeah\, I guess so. I think the alternative is a complete rewrite of the relevant part of sv_gets().

Which I am actually game for\, and started at some point (to support regex $/)\, but which is not going to be ready in time for 5.20

I've only scratched the surface\, but I think this is really multiple bugs (or unfortunate bits of behavior) flying in close formation. Whatever allocates "the same amount of memory as there is in the current buffer" clearly needs its head adjusted. I'm sure that's intended as an optimization in some situations\, but that'll need very careful reassessment.

I was a bit tired when I wrote that. It actually allocates the same amount as is *left* in the read buffer. And the code has the assumption that this is done built in deep into the logic.

The downside of preallocating a buffer of some arbitrary (small) size is that it's likely going to mean an additional malloc and that hurts.

Well when the alternative is what it is now\, I think that is an acceptable burden until 5.22 .

For what its worth I think the code in sv_gets() was written from the point of view that it would mostly affect only $_ which would preserve the allocations over time\, so that preallocating a big buffer would be a net speedup. When we starting *dropping* the COW'ed data we fell into an "odd" codepath in sv_gets()\, and ended up doing this weird proportional allocation.

So I definitely agree there is a mess of weird interactions here.

However from what i can tell the way it works with my patch is pretty close to how it worked pre-COW.

So I think we are good from now.

Sorry if this is a bit stream of consciousness\, I didnt get a lot of sleep last night. :-)

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 10 years ago

From @tsee

On 05/09/2014 10:08 AM\, demerphq wrote:

So I definitely agree there is a mess of weird interactions here.

However from what i can tell the way it works with my patch is pretty close to how it worked pre-COW.

So I think we are good from now.

Don't get me wrong. I'll take your patch over the current broken behaviour for 5.20 any day!

--Steffen

p5pRT commented 10 years ago

From @Leont

On Fri\, May 9\, 2014 at 10:08 AM\, demerphq \demerphq@gmail\.com wrote:

Yeah\, I guess so. I think the alternative is a complete rewrite of the relevant part of sv_gets().

Which I am actually game for\, and started at some point (to support regex $/)\, but which is not going to be ready in time for 5.20

I had similar plans for slightly different reasons. Part of the logic really belongs in PerlIO\, not having a low-level readline really bites us in a few tricky corner cases. Combining this with regexps is a bit of a complication though.

For what its worth I think the code in sv_gets() was written from the point of view that it would mostly affect only $_ which would preserve the allocations over time\, so that preallocating a big buffer would be a net speedup. When we starting *dropping* the COW'ed data we fell into an "odd" codepath in sv_gets()\, and ended up doing this weird proportional allocation.

So I definitely agree there is a mess of weird interactions here.

There is also the assumption that an extra read through the buffer is more expensive than the malloc costs. This may or may not be warranted.

Leon

p5pRT commented 10 years ago

From @iabyn

On Fri\, May 09\, 2014 at 10:21:35AM +0200\, Steffen Mueller wrote:

On 05/09/2014 10:08 AM\, demerphq wrote:

So I definitely agree there is a mess of weird interactions here.

However from what i can tell the way it works with my patch is pretty close to how it worked pre-COW.

So I think we are good from now.

Don't get me wrong. I'll take your patch over the current broken behaviour for 5.20 any day!

Unfortunately I don't think the patch is sufficient. If the line being read is longer than the PVX buffer size (20 in this vase)\, it still does a silly grow.

For a file full of 76 char char lines\, and with your cow_threshold_in_sv_c branch\, I get with this code:

open my $fh\, "\<"\, "/tmp/s" or die; my @got; while (\<$fh>) { push @got\, $_; } use Devel::Peek;

for my $i (0..20) { Dump $got[$i]; }

$ head /tmp/s aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa baaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa caaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

$ wc -l /tmp/s 1003 /tmp/s

$ ./perl -Ilib ~/tmp/p 2>&1 | grep LEN LEN = 117 LEN = 8120 LEN = 8048 LEN = 7968 LEN = 7888 LEN = 7816 LEN = 7736 LEN = 7664 LEN = 7584 LEN = 7504 LEN = 7432 LEN = 7352 LEN = 7272 LEN = 7200 LEN = 7120 LEN = 7048 LEN = 6968 LEN = 6888 LEN = 6816 LEN = 6736 LEN = 6656

I still suspect that it may be better to (also) hack the COW/swipe decision code in sv_setsv() to prefer a copy over a COW if there is a lot of wasted space in the buffer. A condition something like\, say

SvLEN(sv) > 16 && SvLEN(sv) > 2*SvCUR(sv)

This would just be making perl revert to pre-COW behaviour for certain classes of strings\, and is therefore very conservative and thus likely safe for 5.20. We can always tweak this post 5.20.

I intend to spend today looking at this issue further (i.e. daytime\, GMT)\, and try and understand the issues better. If anyone else plans to work on it as well during this time\, let me know so we can coordinate.

-- Red sky at night - gerroff my land! Red sky at morning - gerroff my land! -- old farmers' sayings #14

p5pRT commented 10 years ago

From @demerphq

On 9 May 2014 12:46\, Dave Mitchell \davem@iabyn\.com wrote:

On Fri\, May 09\, 2014 at 10:21:35AM +0200\, Steffen Mueller wrote:

On 05/09/2014 10:08 AM\, demerphq wrote:

So I definitely agree there is a mess of weird interactions here.

However from what i can tell the way it works with my patch is pretty close to how it worked pre-COW.

So I think we are good from now.

Don't get me wrong. I'll take your patch over the current broken behaviour for 5.20 any day!

Unfortunately I don't think the patch is sufficient. If the line being read is longer than the PVX buffer size (20 in this vase)\, it still does a silly grow.

I agree its not great. I think maybe the threshold should be raised.

For a file full of 76 char char lines\, and with your cow_threshold_in_sv_c branch\, I get with this code:
open my $fh\, "\<"\, "/tmp/s" or die;
my @&#8203;got;
while $\<$fh>$ \{
    push @&#8203;got\, $\_;
\}
use Devel&#8203;::Peek;

for my $i $0\.\.20$ \{
    Dump $got\[$i\];
\}
$ head /tmp/s

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

baaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

caaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

$ wc -l /tmp/s 1003 /tmp/s

$ ./perl -Ilib ~/tmp/p 2>&1 | grep LEN LEN = 117 LEN = 8120 LEN = 8048 LEN = 7968 LEN = 7888 LEN = 7816 LEN = 7736 LEN = 7664 LEN = 7584 LEN = 7504 LEN = 7432 LEN = 7352 LEN = 7272 LEN = 7200 LEN = 7120 LEN = 7048 LEN = 6968 LEN = 6888 LEN = 6816 LEN = 6736 LEN = 6656

I still suspect that it may be better to (also) hack the COW/swipe decision code in sv_setsv() to prefer a copy over a COW if there is a lot of wasted space in the buffer. A condition something like\, say
SvLEN$sv$ > 16 && SvLEN$sv$ > 2\*SvCUR$sv$
This would just be making perl revert to pre-COW behaviour for certain classes of strings\, and is therefore very conservative and thus likely safe for 5.20. We can always tweak this post 5.20.

I intend to spend today looking at this issue further (i.e. daytime\, GMT)\, and try and understand the issues better. If anyone else plans to work on it as well during this time\, let me know so we can coordinate.

I did this last night but apparently forgot to push it:

$ git log -1 origin/yves/check_cow_threshold commit 94992f55ce1dc3548a6fc4bbffedfa1266c2e230 Author: Yves Orton \demerphq@gmail\.com Date: Fri May 9 00:34:38 2014 +0200

check_cow_threshold

However I think it is also the wrong fix\, in that I dont see why strings like this should not be COWed\, although maybe it together with my other patch is a usable workaround.

The problem seems to really be related to *uncowing* a string.

A strategy where the "original owner" of the string got to keep its string\, and everybody else got properly sized copies would avoid all this. Instead we do this weird "the owner drops its ownership of the string and the sharer gets its copy" thing which leads to this overallocation. This works against any code that was to preallocate a buffer and then write into it to save a bunch of mallocs.

I was also thinking that we really could use a "DONT COW THIS SV" flag so that we could mark a scalar like $_ as such.

Also I think the behavior of sv_gets() is not perfect. Maybe it made sense at one point\, but I dont think it makes much sense now. At the very least it is suboptimal for delimiters longer than one byte.

We need to bisect when this behavior of setting the size of $_ to the size of the remaining buffer was introduced.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 10 years ago

From @demerphq

Thoughts on sv_gets() and COW.

sv_gets() implements readline\, and sets from \<>. It handles the various interesting cases of $/ by delegation\, and handles the simple case of delimited lines itself. It has two implementations of this logic\, one a "fast" set which "cheats" the IO system and processes block by block by direct pointer manipulation\, and another a slow system intended for IO styles where there is no read ahead buffer. This mail is about the fast branch of sv_gets().

The partof sv_gets() that implements "find a delimiter" logic starts with the comment

/* Here is some breathtakingly efficient cheating */

The uninitiated then finds a dense and nearly impenetrable set of logic\, with seemingly no structure.

Being uninitiated I thought I would try to rewrite it to be friendly to COW. :-)

This was a useful if perhaps unfruitful exercise as it taught me that the problem is more complex than it appeared to me at first\, and what the current solution does is vastly more clever than meets the eye.

First off\, resist the tendency to think about the problem in terms of line endings like "\n". This is a mistake. Delimiters with length of 1 are a special simple case that because of their size side step the real challenge at hand which is how to identify long delimiters (2 or more chars) which cross a read buffer boundary.

The strategy that sv_gets() uses is to preallocate the target SV with as many bytes as could be read from the file buffer (via SvGROW)\, and then it scans forward\, copying as it goes\, looking for the char that matches the last character in the delimiter. If it has just copied the last character then it checks the relevant last K bytes of the target SV\, and if it matches it stops the copy/scan\, if it runs out of buffer it simply does another grow/read and continues the pattern. (It is smart enough however that when the string contains unused space to first try to fill that before reallocing\, if the string fits then no realloc is done at all.)

This cleverness of this strategy is that Perl performs relatively few realloc calls on the SV's pv\, and by using the target SV's pv for checking long delimiters it side steps any issues of multi-byte characters crossing buffer boundaries\, and the same code handles single and multi-byte delimiters more or less as efficiently as each other.

The problem however is that it *really* wants its target strings to have enough room\, and it expects that the buffer is reusable over subsequent calls. (There is even an "append" interface internally).

This last part plays really badly with how COW is currently implemented. The way that COW works is that each string has a single byte refcount at its tail\, and sharing is achieved by all users having a pointer to the same pv. Sharing and unsharing is by changing refcounts and copying or overwriting a pv. A byproduct of this is that "ownership" of the pv is "last-writer-owns-the-first-copy".

Consider the following code:

while (\<>) { push @got\, $_; }

What this code does is first load $_ with sv_gets()\, and lets assume that $_ ends up holding a long string\, say 100 characters\, inside of a 8092 byte allocated buffer.

Then the push @got\, $_ results in $got[0] and $_ containing a COW'ed pv. When on the next read $_ is assigned to again\, its copy of the PV is dropped\, so now $got[0] "owns" the string alone\, and sv_gets() has to allocate a new 8192 byte pv for the SV. A workaround is to ensure that an SV with no pv is pre-initialized to a certain size\, but this only moves the problem around\, short strings still overallocate\, and long strings still trigger the full buffer alloc. Changing the algorithm to read in smaller chunks would solve the latter problem but not the former.

Pre COW this problem does not come up because $got[0] ends up with a copy of $_\, properly sized\, and sv_gets() most of the time reuses whatever space it allocated in $_.

The most obvious short term fix for this is what Dave M proposed\, impose an upper bound on how much wastage can occur before a string is COWed. A combined scaling and hard threshold will avoid the worst of this kind of behaviour in sv_gets() and maybe elsewhere.

I wonder however if this is really just masking the deeper issue\, and that there aren't other ways to approach this issue. It occurs to me that if COW somehow respected "original ownership" then most previous expectations would be fulfilled. Another thought is that perhaps we need a way to flag an SV as unCOWable.

Yves

On 4 May 2014 12:08\, yves orton \perlbug\-followup@perl\.org wrote:

# New Ticket Created by yves orton # Please include the string: [perl #121796] # in the subject line of all future correspondence about this issue. # \<URL: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=121796 >

The attached benchmark shows a marked performance degradation between earlier perls and 5.19\, I see approx 20% drop.

The performance drop has been bisected back to:

commit 13b0f67d12a6400f01bbc27945223ca68a12a7ef Author: David Mitchell \davem@iabyn\.com Date: Wed May 22 16:38:29 2013 +0100
re\-enable Copy\-on\-Write by default\.

COW was first introduced $and enabled by default$ in 5\.17\.7\.
It was disabled by default in 5\.17\.10\, because it was though to have
too many rough edges for the 5.18.0 release.
By re\-enabling it now\, early in the 5\.19\.x release cycle\, hopefully it
will be ready for production use by 5\.20\.

This commit mainly reverts 9f351b45f4 and e1fd41328c \(with
modifications)\, then updates perldelta.

-- perl -Mre=debug -e "/just|another|perl|hacker/"

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 10 years ago

From mail@steffen-mueller.net

On 05/11/2014 11:44 AM\, demerphq wrote:

Thoughts on sv_gets() and COW.

sv_gets() implements readline\, and sets from \<>. It handles the various interesting cases of $/ by delegation\, and handles the simple case of delimited lines itself. It has two implementations of this logic\, one a "fast" set which "cheats" the IO system and processes block by block by direct pointer manipulation\, and another a slow system intended for IO styles where there is no read ahead buffer. This mail is about the fast branch of sv_gets().

The partof sv_gets() that implements "find a delimiter" logic starts with the comment

/* Here is some breathtakingly efficient cheating */

The uninitiated then finds a dense and nearly impenetrable set of logic\, with seemingly no structure.

Being uninitiated I thought I would try to rewrite it to be friendly to COW. :-)

This was a useful if perhaps unfruitful exercise as it taught me that the problem is more complex than it appeared to me at first\, and what the current solution does is vastly more clever than meets the eye.

First off\, resist the tendency to think about the problem in terms of line endings like "\n". This is a mistake. Delimiters with length of 1 are a special simple case that because of their size side step the real challenge at hand which is how to identify long delimiters (2 or more chars) which cross a read buffer boundary.

The strategy that sv_gets() uses is to preallocate the target SV with as many bytes as could be read from the file buffer (via SvGROW)\, and then it scans forward\, copying as it goes\, looking for the char that matches the last character in the delimiter. If it has just copied the last character then it checks the relevant last K bytes of the target SV\, and if it matches it stops the copy/scan\, if it runs out of buffer it simply does another grow/read and continues the pattern. (It is smart enough however that when the string contains unused space to first try to fill that before reallocing\, if the string fits then no realloc is done at all.)

This cleverness of this strategy is that Perl performs relatively few realloc calls on the SV's pv\, and by using the target SV's pv for checking long delimiters it side steps any issues of multi-byte characters crossing buffer boundaries\, and the same code handles single and multi-byte delimiters more or less as efficiently as each other.

The problem however is that it *really* wants its target strings to have enough room\, and it expects that the buffer is reusable over subsequent calls. (There is even an "append" interface internally).

This last part plays really badly with how COW is currently implemented. The way that COW works is that each string has a single byte refcount at its tail\, and sharing is achieved by all users having a pointer to the same pv. Sharing and unsharing is by changing refcounts and copying or overwriting a pv. A byproduct of this is that "ownership" of the pv is "last-writer-owns-the-first-copy".

Consider the following code:

while (\<>) { push @got\, $_; }

What this code does is first load $_ with sv_gets()\, and lets assume that $_ ends up holding a long string\, say 100 characters\, inside of a 8092 byte allocated buffer.

Then the push @got\, $_ results in $got[0] and $_ containing a COW'ed pv. When on the next read $_ is assigned to again\, its copy of the PV is dropped\, so now $got[0] "owns" the string alone\, and sv_gets() has to allocate a new 8192 byte pv for the SV. A workaround is to ensure that an SV with no pv is pre-initialized to a certain size\, but this only moves the problem around\, short strings still overallocate\, and long strings still trigger the full buffer alloc. Changing the algorithm to read in smaller chunks would solve the latter problem but not the former.

Pre COW this problem does not come up because $got[0] ends up with a copy of $_\, properly sized\, and sv_gets() most of the time reuses whatever space it allocated in $_.

The most obvious short term fix for this is what Dave M proposed\, impose an upper bound on how much wastage can occur before a string is COWed. A combined scaling and hard threshold will avoid the worst of this kind of behaviour in sv_gets() and maybe elsewhere.

I wonder however if this is really just masking the deeper issue\, and that there aren't other ways to approach this issue. It occurs to me that if COW somehow respected "original ownership" then most previous expectations would be fulfilled. Another thought is that perhaps we need a way to flag an SV as unCOWable.

I'm really not a fan of yet another state bit on an SV. It's already all so twisted that

p5pRT commented 10 years ago

From @tsee

Somehow I think I just accidentally sent my reply long before I was done with it. Apologies.

On 05/11/2014 11:44 AM\, demerphq wrote:

The most obvious short term fix for this is what Dave M proposed\, impose an upper bound on how much wastage can occur before a string is COWed. A combined scaling and hard threshold will avoid the worst of this kind of behaviour in sv_gets() and maybe elsewhere.

I wonder however if this is really just masking the deeper issue\, and that there aren't other ways to approach this issue. It occurs to me that if COW somehow respected "original ownership" then most previous expectations would be fulfilled. Another thought is that perhaps we need a way to flag an SV as unCOWable.

I don't think another bit of state that needs to be taken into account all the time is a very good idea. Few really manage to use SVs safely as it stands.

How about changing the scanning logic to NOT copy as it goes but really mostly just do scanning a la strchr() and then memcpy()? An extra pass\, but then you _know_ it'll be exactly one malloc for the string buffer. I think that may often be more efficient and it will also be simpler.

--Steffen

p5pRT commented 10 years ago

From @Leont

On Sun\, May 11\, 2014 at 12:34 PM\, Steffen Mueller \smueller@cpan\.org wrote:

I don't think another bit of state that needs to be taken into account all the time is a very good idea. Few really manage to use SVs safely as it stands.

Maybe. I don't think we have flag bits available to the point is moot right now anyway.

How about changing the scanning logic to NOT copy as it goes but really

mostly just do scanning a la strchr() and then memcpy()? An extra pass\, but then you _know_ it'll be exactly one malloc for the string buffer. I think that may often be more efficient and it will also be simpler.

That's what I suggested earlier. It would make more sense to me too\, though it's still a trade-off; given the buffer should be in L1 cache I'd expect the extra read to be relatively cheap\, but that's just hand-waving. Also\, you don't always know how much to allocate\, in particular when a line crosses a buffer boundary; moving the strchr step into PerlIO would be a extremely helpful step there\, also for other more complicated reasons (#113424 for example).

Leon

p5pRT commented 10 years ago

From @tsee

On 05/11/2014 02:25 PM\, Leon Timmermans wrote:

On Sun\, May 11\, 2014 at 12:34 PM\, Steffen Mueller \<smueller@cpan.org \mailto:smueller@cpan\.org> wrote:
I don't think another bit of state that needs to be taken into
account all the time is a very good idea\. Few really manage to use
SVs safely as it stands\.
Maybe. I don't think we have flag bits available to the point is moot right now anyway.
How about changing the scanning logic to NOT copy as it goes but
really mostly just do scanning a la strchr and then memcpy? An
extra pass\, but then you \_know\_ it'll be exactly one malloc for the
string buffer\. I think that may often be more efficient and it will
also be simpler\.
That's what I suggested earlier. It would make more sense to me too\, though it's still a trade-off; given the buffer should be in L1 cache I'd expect the extra read to be relatively cheap\, but that's just hand-waving.

I think that's a tad optimistic. We're talking about string processing after all. I'd already be happy if most OP's were in L1. ;)

Also\, you don't always know how much to allocate\, in particular when a line crosses a buffer boundary; moving the strchr step into PerlIO would be a extremely helpful step there\, also for other more complicated reasons (#113424 for example).

Agreed.

--Steffen

p5pRT commented 10 years ago

From @bulk88

On Sun May 11 02:45:34 2014\, demerphq wrote:

The most obvious short term fix for this is what Dave M proposed\, impose an upper bound on how much wastage can occur before a string is COWed. A combined scaling and hard threshold will avoid the worst of this kind of behaviour in sv_gets() and maybe elsewhere.

I wonder however if this is really just masking the deeper issue\, and that there aren't other ways to approach this issue. It occurs to me that if COW somehow respected "original ownership" then most previous expectations would be fulfilled. Another thought is that perhaps we need a way to flag an SV as unCOWable.

Yves

Here is an idea\, realloc to SvCUR.

-- bulk88 ~ bulk88 at hotmail.com

p5pRT commented 10 years ago

From @rjbs

Yves noted that the oversizing of strings may be the cause of OOM in https://rt.perl.org/Ticket/Display.html?id=121826 and (I believe) https://rt-archive.perl.org/perl5/Ticket/Display.html?id=121815

-- rjbs

p5pRT commented 10 years ago

From [Unknown Contact. See original ticket]

Yves noted that the oversizing of strings may be the cause of OOM in https://rt.perl.org/Ticket/Display.html?id=121826 and (I believe) https://rt-archive.perl.org/perl5/Ticket/Display.html?id=121815

-- rjbs

p5pRT commented 10 years ago

From @iabyn

I wonder however if this is really just masking the deeper issue\, and that there aren't other ways to approach this issue. It occurs to me that if COW somehow respected "original ownership"

Note however that COW has different use cases which would imply different priorities for when to copy.

For example\, sometimes the original string is just a temporary value\, and a following assignment is where you want the value to end up:

$x = (string-expression);

Here\, the expression creates a temporary SV\, which we want to pass to $x without copying\, then free the temporary SV.

Conversely\, in something like

$s =~ /(.)/;

we likely want to keep $s\, and temporarily make use of $1.

-- The optimist believes that he lives in the best of all possible worlds. As does the pessimist.

p5pRT commented 10 years ago

From @demerphq

On 12 May 2014 17:25\, Ricardo SIGNES via RT \perlbug\-comment@perl\.orgwrote:

Yves noted that the oversizing of strings may be the cause of OOM in https://rt-archive.perl.org/perl5/Ticket/Display.html?id=121826 and (I believe) https://rt-archive.perl.org/perl5/Ticket/Display.html?id=121815

We should know soon. The patch I pushed this morning should be run against the offending modules.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 10 years ago

From @demerphq

On 11 May 2014 15:02\, Steffen Mueller \smueller@cpan\.org wrote:

On 05/11/2014 02:25 PM\, Leon Timmermans wrote:
On Sun\, May 11\, 2014 at 12:34 PM\, Steffen Mueller \<smueller@cpan.org \mailto:smueller@cpan\.org> wrote:
I don't think another bit of state that needs to be taken into
account all the time is a very good idea\. Few really manage to use
SVs safely as it stands\.
Maybe. I don't think we have flag bits available to the point is moot right now anyway.
How about changing the scanning logic to NOT copy as it goes but
really mostly just do scanning a la strchr and then memcpy? An
extra pass\, but then you \_know\_ it'll be exactly one malloc for the
string buffer\. I think that may often be more efficient and it will
also be simpler\.
That's what I suggested earlier. It would make more sense to me too\, though it's still a trade-off; given the buffer should be in L1 cache I'd expect the extra read to be relatively cheap\, but that's just hand-waving.
I think that's a tad optimistic. We're talking about string processing after all. I'd already be happy if most OP's were in L1. ;)

Also\, you don't always know how much to allocate\, in

particular when a line crosses a buffer boundary; moving the strchr step into PerlIO would be a extremely helpful step there\, also for other more complicated reasons (#113424 for example).

Agreed.

Steffen and I accidentally continued discussing this offlist (the accidental part)\, and he really challenged me on my claim.

And I have to admit I was wrong.

After a bunch of benchmarking\, analysis\, and some late night hacking Steffen and i have concluded that we can speed up the code in sv_gets() by something like an order of magnitude (or so) by using a scan and then copy strategy.

I do not have an implementation yet\, but we definitely can do better than we do now in sv_gets().

It is on my todo list. More in the future. Thanks to Steffen for not believing me. Working with people like him makes people like me raise their game. :-)

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 10 years ago

From @Leont

On Sun\, May 11\, 2014 at 3:02 PM\, Steffen Mueller \smueller@cpan\.org wrote:

On 05/11/2014 02:25 PM\, Leon Timmermans wrote:

That's what I suggested earlier. It would make more sense to me too\, though it's still a trade-off; given the buffer should be in L1 cache I'd expect the extra read to be relatively cheap\, but that's just hand-waving.

I think that's a tad optimistic. We're talking about string processing after all. I'd already be happy if most OP's were in L1. ;)

I meant after the strchr()\, right after the buffer is accessed it I'd hope it to be cached\, specially if the cache is split by data/code. I may well be mistaken though.

Leon

p5pRT commented 10 years ago

From @rjbs

I've removed this from blocking 5.20.0.

Yves\, do you want to keep this ticket open for the further changes\, or make a new one?

-- rjbs

p5pRT commented 10 years ago

From [Unknown Contact. See original ticket]

I've removed this from blocking 5.20.0.

Yves\, do you want to keep this ticket open for the further changes\, or make a new one?

-- rjbs

p5pRT commented 10 years ago

From @demerphq

On 19 May 2014 16:42\, Ricardo SIGNES via RT \perlbug\-comment@perl\.orgwrote:

I've removed this from blocking 5.20.0.

Yves\, do you want to keep this ticket open for the further changes\, or make a new one?

I think we should start a new ticket. So please feel free to close this one.

Yves

p5pRT commented 10 years ago

From @rjbs

Closed per discussion with Yves.

-- rjbs

p5pRT commented 10 years ago

@rjbs - Status changed from 'open' to 'resolved'

p5pRT commented 10 years ago

From @cpansprout

On Mon May 12 09:22:19 2014\, demerphq wrote:

On 12 May 2014 17:25\, Ricardo SIGNES via RT \perlbug\-comment@perl\.orgwrote:

Yves noted that the oversizing of strings may be the cause of OOM in https://rt-archive.perl.org/perl5/Ticket/Display.html?id=121826 and (I believe) https://rt-archive.perl.org/perl5/Ticket/Display.html?id=121815

We should know soon. The patch I pushed this morning should be run against the offending modules.

Finding this whole discussion late\, I would like to thank Yves Orton and Dave Mitchell for diagnosing and fixing this. That’s quite a bit of detective work!

--

Father Chrysostomos

p5pRT commented 10 years ago

From [Unknown Contact. See original ticket]

On Mon May 12 09:22:19 2014\, demerphq wrote:

On 12 May 2014 17:25\, Ricardo SIGNES via RT \perlbug\-comment@perl\.orgwrote:

Yves noted that the oversizing of strings may be the cause of OOM in https://rt-archive.perl.org/perl5/Ticket/Display.html?id=121826 and (I believe) https://rt-archive.perl.org/perl5/Ticket/Display.html?id=121815

We should know soon. The patch I pushed this morning should be run against the offending modules.

Finding this whole discussion late\, I would like to thank Yves Orton and Dave Mitchell for diagnosing and fixing this. That’s quite a bit of detective work!

--

Father Chrysostomos

p5pRT commented 10 years ago

From @demerphq

On 23 August 2014 07:20\, Father Chrysostomos via RT \< perlbug-comment@perl.org> wrote:

On Mon May 12 09:22:19 2014\, demerphq wrote:

On 12 May 2014 17:25\, Ricardo SIGNES via RT \<perlbug-comment@perl.org wrote:

Yves noted that the oversizing of strings may be the cause of OOM in https://rt-archive.perl.org/perl5/Ticket/Display.html?id=121826 and (I believe) https://rt-archive.perl.org/perl5/Ticket/Display.html?id=121815

We should know soon. The patch I pushed this morning should be run against the offending modules.

Finding this whole discussion late\, I would like to thank Yves Orton and Dave Mitchell for diagnosing and fixing this. That’s quite a bit of detective work!

Now you are back I think there are issues raised in this thread that bear further discussion.

Specifically:

Should we have a "do not COW" flag?

Also\, I had a question about the implementation of the refcount data for COW/shared strings. Specifically the refcount is at the *end* of the string.

I was wondering if you had thought about doing something like the OOK strategy and putting it at the *head* of the string instead of at the tail? We could then store buf+sizeof(refcount) in the pv slot\, so to pretty much all code there would be no difference between a shared or unshared string\, the SvCUR/SvLEN data would be identical\, etc. The only place that would have to know about the difference would be code that does a realloc()\, which would have to be taught about the offset trick.

The reason I ask is twofold\, first there is a lack of locality of reference with the refcount being at the end of the string\, more operations on a string look at its first bytes than last\, and second the refcount is currently in user "visible" data at the end of the string. I can imagine dodgy XS code not aware off the refcount data deciding to use the full SvLEN of the string.

Just curious for your thoughts.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 10 years ago

From @cpansprout

On Sun Aug 24 04:17:31 2014\, demerphq wrote:

Now you are back I think there are issues raised in this thread that bear further discussion.

Specifically:

Should we have a "do not COW" flag?

It doesn’t really matter to me.

I think that in each case where it seems that COW shouldn’t happen\, it’s because the COW algorithm needs adjustment. Fixing the algorithm for one particular inefficient case should also fix it for other similar cases.

That said\, I would not be opposed to such a flag. (BTW\, SVf_READONLY already means that\, but I assume you want a flag for mutable strings.) We have plenty of room in sv_flags. You just have to look closely to see it. :-) Specifically\, SVf_FAKE is unused on strings now and SVf_OOK|SVf_IsCOW is an unused combination. SVf_AMAGIC is free for non-HVs.

Also\, I had a question about the implementation of the refcount data for COW/shared strings. Specifically the refcount is at the *end* of the string.

I was wondering if you had thought about doing something like the OOK strategy and putting it at the *head* of the string instead of at the tail? We could then store buf+sizeof(refcount) in the pv slot\, so to pretty much all code there would be no difference between a shared or unshared string\, the SvCUR/SvLEN data would be identical\, etc. The only place that would have to know about the difference would be code that does a realloc()\, which would have to be taught about the offset trick.

The whole point of this COW mechanism was to upgrade existing strings to COW strings. That precludes using a byte at the beginning of the string\, because that byte is already in use.

If you are thinking of allocating most strings with an extra initial byte to begin with\, that could work\, but would effectively disable COW* until we comb through the entire codebase changing every instance of SvPVX_set to account. Is it worth that much trouble?

* Bear in mind that I have an innate tendency to exaggerate.

The reason I ask is twofold\, first there is a lack of locality of reference with the refcount being at the end of the string\, more operations on a string look at its first bytes than last\, and second the refcount is currently in user "visible" data at the end of the string. I can imagine dodgy XS code not aware off the refcount data deciding to use the full SvLEN of the string.

That is why I sent patches to many CPAN modules to stop them from diddling with SvPVX without calling sv_force_normal first. :-)

I hope that COW strings are common enough that such dodgy behaviour will result in test failures.

--

Father Chrysostomos

p5pRT commented 10 years ago

From @cpansprout

On Sun Aug 24 07:06:26 2014\, sprout wrote:

On Sun Aug 24 04:17:31 2014\, demerphq wrote:

I was wondering if you had thought about doing something like the OOK strategy and putting it at the *head* of the string instead of at the tail? We could then store buf+sizeof(refcount) in the pv slot\, so to pretty much all code there would be no difference between a shared or unshared string\, the SvCUR/SvLEN data would be identical\, etc. The only place that would have to know about the difference would be code that does a realloc()\, which would have to be taught about the offset trick.

The whole point of this COW mechanism was to upgrade existing strings to COW strings. That precludes using a byte at the beginning of the string\, because that byte is already in use.

How about SvLEN(sv)-- when turning on COWness? (And SvLEN(sv)++ when turning it off.)

--

Father Chrysostomos

p5pRT commented 10 years ago

From @rurban

Since you now realized the problem with COW\, maybe I can add my old argument again how it should have be done\, and all others are doing it.

The cow refcount needs to be in it's own xpv field. It should not be in the pv data. COW is used for shared readonly data. Changing readonly data is not possible.

The two usecases for COW are threads and the compiler. proper threads support is too futuristic to think about (it would need a full word for CAS then)\, but the compiler already uses shared readonly strings but needs to pessimize on the current COW strings\, as the current COW strings are not readonly. They change the ending byte.

p5pRT commented 10 years ago

From @demerphq

On 26 August 2014 15:49\, Reini Urban \rurban@x\-ray\.at wrote:

Since you now realized the problem with COW\,

Which problem? There were at least two I mentioned.

maybe I can add my old argument again how it should have be done\, and all others are doing it.

The cow refcount needs to be in it's own xpv field. It should not be in the pv data. COW is used for shared readonly data. Changing readonly data is not possible.

Which devolves down to some specific overhead for all strings.

I think you are right that there is a debate to be had about whether adding such a field is a better strategy\, but I dont think the case is clear-cut.

The two usecases for COW are threads and the compiler

Huh?

What do you mean by "*the* two uses cases"?

The uses cases I can think of for COW have nothing to do with threads or the compiler.

. proper threads support is too futuristic to think about (it would need a full word for CAS then)\, but the compiler already uses shared readonly strings but needs to pessimize on the current COW strings\, as the current COW strings are not readonly. They change the ending byte.

The current implementation is not about making COW strings read only. I agree there is a relationship but not a direct one.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 10 years ago

From @iabyn

On Tue\, Aug 26\, 2014 at 08:49:25AM -0500\, Reini Urban wrote:

The cow refcount needs to be in it's own xpv field. It should not be in the pv data.

How would that work? Each SV has its own private XPV struct\, but a shared PV. If the refcount was stored in the XPV\, each SV would have its own private refcount\, that would quickly get out of sync.

Or are you suggesting instead that a single XPV should be shared between multiple SV heads?

-- In England there is a special word which means the last sunshine of the summer. That word is "spring".

Perl / perl5