Closed: eserte closed this issue 4 years ago
I don't know if it's the only cause of the slowdown, but I note that the first CN test in t/example-phone-numbers.t requires Number::Phone::StubCountry::CN, which is a 13MB source file that on my machine takes just over 2s to compile under v5.31.5, but over 100s under v5.31.6 (on a perl built with -O0).
Manual bisect points to 9f601cf3bb:
commit 9f601cf3bbfa6be3e2ab3468e77a7b79c80ff5cf (refs/bisect/bad)
Author: Tony Cook <tony@develop-help.com>
Date: Fri Oct 25 05:26:53 2019 +1100
Faster feature checks
Perform only a bit check instead of a much more expensive hash
lookup to test features.
[...]
I'm testing this with a build and install of perl, followed by build/install of Class-Inspector-1.34, inc-latest-0.500, File-ShareDir-Install-0.13 and File-ShareDir-1.116, then in the directory in which I've built Number-Phone-3.6001:
/usr/bin/time $PERL -Ilib -wle 'use Number::Phone::StubCountry::CN; 1'
I'll try to dig further - @tonycoz can you take a look as well?
For completeness I verified that building with -O6 has equivalent numbers:
# before 9f601cf3
1.23user 0.05system 0:01.29elapsed 100%CPU (0avgtext+0avgdata 253412maxresident)k
0inputs+0outputs (0major+61898minor)pagefaults 0swaps
# after 9f601cf3
119.26user 0.09system 1:59.35elapsed 99%CPU (0avgtext+0avgdata 255280maxresident)k
0inputs+0outputs (0major+62413minor)pagefaults 0swaps
I can confirm Hugo's findings for Linux.
$ thisperl -v |head -2 | tail -1
This is perl 5, version 31, subversion 6 (v5.31.6 (v5.31.5-111-g698f6cc7d5)) built for x86_64-linux
$ time thisperl -Ilib -wle 'use Number::Phone::StubCountry::CN; 1'
real 0m1.886s
user 0m1.814s
sys 0m0.072s
#####
$ thisperl -v | head -2 | tail -1
This is perl 5, version 31, subversion 6 (v5.31.6 (v5.31.5-112-g9f601cf3bb)) built for x86_64-linux
$ time thisperl -Ilib -wle 'use Number::Phone::StubCountry::CN; 1'
real 2m13.102s
user 2m13.001s
sys 0m0.100s
So I think the ANYOFR commit being blamed is incorrect. I have done some profiling and changing things around to not use it, and it doesn't look like it's involved at all.
It was committed a couple of weeks after 9f601cf.
Let me try that table again, but quoting this time
-----------------------------------------------
                 0.00    0.00      22/3729502     Perl_newPVOP <cycle 1> [181]
                 0.00    0.00      77/3729502     Perl_newWHILEOP <cycle 1> [171]
                 0.00    0.00     132/3729502     Perl_newPADOP <cycle 1> [168]
                 0.01    0.00     182/3729502     Perl_newFOROP <cycle 1> [162]
                 0.02    0.00     535/3729502     Perl_newPMOP <cycle 1> [124]
                 0.08    0.00    2728/3729502     S_newMETHOP_internal <cycle 1> [49]
                 0.08    0.00    2845/3729502     Perl_alloc_LOGOP [47]
                 7.45    0.00  261747/3729502     Perl_newUNOP_AUX <cycle 1> [17]
                 7.64    0.00  268519/3729502     Perl_newSTATEOP <cycle 1> [16]
                 7.80    0.00  274226/3729502     Perl_newUNOP <cycle 1> [15]
                15.20    0.00  534185/3729502     Perl_newLISTOP <cycle 1> [12]
                15.29    0.00  537356/3729502     Perl_newOP <cycle 1> [11]
                22.42    0.00  787948/3729502     Perl_newBINOP <cycle 1> [10]
                30.13    0.00 1059000/3729502     Perl_newSVOP <cycle 1> [9]
[3]     97.2   106.11    0.00 3729502         Perl_Slab_Alloc [3]
                 0.00    0.00   13387/13387       S_new_slab [569]
-----------------------------------------------
[Edited by @hvds to replace single backtick with triple backtick]
And I was able to record similar differences in execution time on FreeBSD-11:
$ uname -mrs
FreeBSD 11.2-STABLE amd64
[perlmonger: Number-Phone-3.6001] $ $thisperl -v | head -2 | tail -1
This is perl 5, version 31, subversion 6 (v5.31.6 (v5.31.5-111-g698f6cc7d5)) built for amd64-freebsd-thread-multi
[perlmonger: Number-Phone-3.6001] $ /usr/bin/time $thisperl -Ilib -wle 'use Number::Phone::StubCountry::CN; 1'
2.98 real 2.78 user 0.16 sys
[perlmonger: Number-Phone-3.6001] $ $thisperl -v | head -2 | tail -1
This is perl 5, version 31, subversion 6 (v5.31.6 (v5.31.5-112-g9f601cf3bb)) built for amd64-freebsd-thread-multi
[perlmonger: Number-Phone-3.6001] $ /usr/bin/time $thisperl -Ilib -wle 'use Number::Phone::StubCountry::CN; 1'
110.18 real 109.80 user 0.23 sys
Oh, and here are some salient results of the profiling, on one entry on one test file in the distro
Can we get more detail, and a corresponding extract against perl before 9f601cf for comparison?
Cheers,
Hugo
That's the output just before the blamed commit. I don't know what details you want. It's clear to me that 97% of the time is being spent in mallocs; everything else is trivial by comparison.
My hunch is that most of the difference in performance between the two commits is associated with Number::Phone::StubCountry::CN.
If I clone t/example-phone-numbers.t to a file named t/nocn-example-phone-numbers.t, then comment out all the rows which appear to provide CN test data ...
$ diff t/example-phone-numbers.t t/nocn-example-phone-numbers.t
725,739c725,739
< { class => 'Number::Phone::Lib', args => ['CN','+861012345678'], methods => ['is_fixed_line'] },
< { class => 'Number::Phone::Lib', args => ['CN','1012345678'], methods => ['is_fixed_line'] },
< { class => 'Number::Phone::Lib', args => ['+861012345678'], methods => ['is_fixed_line'] },
< { class => 'Number::Phone::Lib', args => ['CN','+8613123456789'], methods => ['is_mobile'] },
< { class => 'Number::Phone::Lib', args => ['CN','13123456789'], methods => ['is_mobile'] },
< { class => 'Number::Phone::Lib', args => ['+8613123456789'], methods => ['is_mobile'] },
< { class => 'Number::Phone::Lib', args => ['CN','+868001234567'], methods => ['is_tollfree'] },
< { class => 'Number::Phone::Lib', args => ['CN','8001234567'], methods => ['is_tollfree'] },
< { class => 'Number::Phone::Lib', args => ['+868001234567'], methods => ['is_tollfree'] },
< { class => 'Number::Phone::Lib', args => ['CN','+8616812345'], methods => ['is_specialrate'] },
< { class => 'Number::Phone::Lib', args => ['CN','16812345'], methods => ['is_specialrate'] },
< { class => 'Number::Phone::Lib', args => ['+8616812345'], methods => ['is_specialrate'] },
< { class => 'Number::Phone::Lib', args => ['CN','+864001234567'], methods => ['is_specialrate'] },
< { class => 'Number::Phone::Lib', args => ['CN','4001234567'], methods => ['is_specialrate'] },
< { class => 'Number::Phone::Lib', args => ['+864001234567'], methods => ['is_specialrate'] },
---
> #{ class => 'Number::Phone::Lib', args => ['CN','+861012345678'], methods => ['is_fixed_line'] },
> #{ class => 'Number::Phone::Lib', args => ['CN','1012345678'], methods => ['is_fixed_line'] },
> #{ class => 'Number::Phone::Lib', args => ['+861012345678'], methods => ['is_fixed_line'] },
> #{ class => 'Number::Phone::Lib', args => ['CN','+8613123456789'], methods => ['is_mobile'] },
> #{ class => 'Number::Phone::Lib', args => ['CN','13123456789'], methods => ['is_mobile'] },
> #{ class => 'Number::Phone::Lib', args => ['+8613123456789'], methods => ['is_mobile'] },
> #{ class => 'Number::Phone::Lib', args => ['CN','+868001234567'], methods => ['is_tollfree'] },
> #{ class => 'Number::Phone::Lib', args => ['CN','8001234567'], methods => ['is_tollfree'] },
> #{ class => 'Number::Phone::Lib', args => ['+868001234567'], methods => ['is_tollfree'] },
> #{ class => 'Number::Phone::Lib', args => ['CN','+8616812345'], methods => ['is_specialrate'] },
> #{ class => 'Number::Phone::Lib', args => ['CN','16812345'], methods => ['is_specialrate'] },
> #{ class => 'Number::Phone::Lib', args => ['+8616812345'], methods => ['is_specialrate'] },
> #{ class => 'Number::Phone::Lib', args => ['CN','+864001234567'], methods => ['is_specialrate'] },
> #{ class => 'Number::Phone::Lib', args => ['CN','4001234567'], methods => ['is_specialrate'] },
> #{ class => 'Number::Phone::Lib', args => ['+864001234567'], methods => ['is_specialrate'] },
... and then run the CN-less file through a prove built from a perl at the BEFORE and AFTER commits, then there is little difference in execution time between the two perls.
before:
[perlmonger: Number-Phone-3.6001] $ echo $COMMIT
9f601cf3bbfa6be3e2ab3468e77a7b79c80ff5cf^
[perlmonger: Number-Phone-3.6001] $ time $THISPROVE -b t/nocn-example-phone-numbers.t
t/nocn-example-phone-numbers.t .. ok
All tests successful.
Files=1, Tests=3632, 6 wallclock secs ( 0.28 usr 0.02 sys + 5.61 cusr 0.34 csys = 6.25 CPU)
Result: PASS
6.38 real 5.99 user 0.36 sys
after:
$ echo $COMMIT
9f601cf3bbfa6be3e2ab3468e77a7b79c80ff5cf
[perlmonger: Number-Phone-3.6001] $ time $THISPROVE -b t/nocn-example-phone-numbers.t
t/nocn-example-phone-numbers.t .. ok
All tests successful.
Files=1, Tests=3632, 6 wallclock secs ( 0.27 usr 0.04 sys + 5.85 cusr 0.34 csys = 6.50 CPU)
Result: PASS
6.67 real 6.26 user 0.39 sys
But when I restore the CN data rows, the test execution time for the AFTER commit soars into multiple minutes.
Thank you very much. Jim Keenan
Well the question is why 97% of the time is being spent in mallocs, which is not immediately obvious from the original table alone.
So it looks like the top line changes before/after as:
index % time self children called name
0.00 0.00 59/84866 Perl_newPVOP <cycle 1> [853]
0.00 0.00 22/3729502 Perl_newPVOP <cycle 1> [181]
The headers don't make it hugely clear, but I assume this is saying that before, it called Perl_Slab_Alloc 84,866 times, of which 59 calls came from Perl_newPVOP. If that's correct, then the problem is not the speed of something at the low level, but the logic of something at the high level that has hugely changed the number of calls ending up in Slab_Alloc. If that's correct, then the calls occurring less than 100k times in your first table have not changed hugely, but the rest have: newUNOP_AUX, newSTATEOP, newUNOP, newLISTOP, newOP, newBINOP, newSVOP.
My first guess from that is that some optree is supposed to be cached but no longer is, so it is getting recreated many times. But the ratios of before and after are not consistent enough for it to be a single optree, so more likely a raft of different optrees each being recreated 10-200 times.
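As a rough aid to reading the table, the caller rows can be parsed mechanically to see what fraction of the Perl_Slab_Alloc calls each op constructor accounts for. This is only a sketch: the rows are copied from the excerpt quoted in this thread, while the helper names (`parse_callers`, `call_shares`) and the regular expression are illustrative assumptions, not part of gprof or perl.

```python
import re

# Caller rows copied from the gprof call-graph excerpt quoted in this thread.
GPROF_CALLERS = """\
0.00 0.00 22/3729502 Perl_newPVOP <cycle 1> [181]
0.00 0.00 77/3729502 Perl_newWHILEOP <cycle 1> [171]
0.00 0.00 132/3729502 Perl_newPADOP <cycle 1> [168]
0.01 0.00 182/3729502 Perl_newFOROP <cycle 1> [162]
0.02 0.00 535/3729502 Perl_newPMOP <cycle 1> [124]
0.08 0.00 2728/3729502 S_newMETHOP_internal <cycle 1> [49]
0.08 0.00 2845/3729502 Perl_alloc_LOGOP [47]
7.45 0.00 261747/3729502 Perl_newUNOP_AUX <cycle 1> [17]
7.64 0.00 268519/3729502 Perl_newSTATEOP <cycle 1> [16]
7.80 0.00 274226/3729502 Perl_newUNOP <cycle 1> [15]
15.20 0.00 534185/3729502 Perl_newLISTOP <cycle 1> [12]
15.29 0.00 537356/3729502 Perl_newOP <cycle 1> [11]
22.42 0.00 787948/3729502 Perl_newBINOP <cycle 1> [10]
30.13 0.00 1059000/3729502 Perl_newSVOP <cycle 1> [9]
"""

# Fields: self seconds, children seconds, calls/total calls, function name.
ROW_RE = re.compile(r"^\s*([\d.]+)\s+([\d.]+)\s+(\d+)/(\d+)\s+(\S+)")


def parse_callers(text):
    """Return a list of (name, calls, total_calls, self_seconds) tuples."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:
            self_s, _children, calls, total, name = m.groups()
            rows.append((name, int(calls), int(total), float(self_s)))
    return rows


def call_shares(rows):
    """Map each caller to its fraction of all calls into the callee."""
    return {name: calls / total for name, calls, total, _ in rows}


if __name__ == "__main__":
    rows = parse_callers(GPROF_CALLERS)
    for name, share in sorted(call_shares(rows).items(), key=lambda kv: -kv[1]):
        print(f"{share:6.1%}  {name}")
```

The per-caller counts in the excerpt add up to the 3729502 total, so the listed callers account for every call into Perl_Slab_Alloc in that profile.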
For details, I was hoping to understand how you generated this profile so I could better understand what we were looking at, and maybe reproduce it with variations myself.
Hugo
From profiling the code, it's now spending a huge amount of time searching the free op list associated with a slab. For the ::CN case, Perl_Slab_Alloc() is being called ~3.6 million times in both 9f601cf3bb (before the change) and after the change; the problem is the time spent in Perl_Slab_Alloc() itself:
This is before the change:
and this is in blead:
After a lot of (way too much) consideration I can only think of one change: a cop now has an added U32 for cop_features, which makes it larger, so rather than finding an existing freed op that can be reused, the allocator searches the whole list before finally allocating a new one.
(I used callgrind and kcachegrind.)
I'll look at a fix tomorrow unless someone beats me to it.
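The hypothesis above (a request that no freed op can satisfy forces a scan of the whole free list on every allocation) can be illustrated with a toy model. This is a sketch under stated assumptions, not perl's allocator: it assumes a first-fit search over a single list, one op freed per allocation, and abstract size units; `alloc` and `simulate` are hypothetical names.

```python
def alloc(free_list, size):
    """Toy first-fit search (an assumption, not perl's actual code).

    Returns (reused, steps), where steps is the number of free-list
    entries inspected before a fit was found or the list ran out.
    """
    for i, free_size in enumerate(free_list):
        if free_size >= size:
            free_list.pop(i)
            return True, i + 1
    return False, len(free_list)


def simulate(n_ops, freed_size, wanted_size):
    """Allocate n_ops ops of wanted_size; after each, free one op of freed_size.

    Returns the total number of free-list entries inspected.
    """
    free_list = []
    total_steps = 0
    for _ in range(n_ops):
        _reused, steps = alloc(free_list, wanted_size)
        total_steps += steps
        free_list.append(freed_size)
    return total_steps


if __name__ == "__main__":
    n = 1000
    print("freed ops fit the request:", simulate(n, freed_size=8, wanted_size=8))
    print("freed ops are too small:  ", simulate(n, freed_size=8, wanted_size=9))
```

In this model the work is linear in the number of ops when freed entries can be reused, and quadratic when they are always one unit too small, because the free list only grows and is scanned in full each time.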
Attempting to view those images I get secure connection failed with error code SSL_ERROR_NO_CYPHER_OVERLAP, trying from either firefox or chrome. Unfortunately neither browser shows what cyphers are offered by either side, so I'm not sure what I can do about that; I'll try contacting github support.
Update: I can see them fine if I s/https/http/.
They show fine in Firefox on Windows 10 here.
On Wed, 12 Feb 2020 at 05:56, Tony Cook notifications@github.com wrote:
[...]
This is before the change: [image: image] https://user-images.githubusercontent.com/166429/74303934-9608dc00-4daf-11ea-894f-e535fc9ab0a9.png
and this is in blead: [image: image] https://user-images.githubusercontent.com/166429/74303913-86899300-4daf-11ea-9404-bd5ae371a0d7.png
[...]
Could it be that you need to change the sizes that we allocate slabs for now that the op-sizes have changed?
I seem to recall that we precreate a set of different slabs based on the expected sizes of the objects we create, if you have changed the size of opcodes then maybe we just need to change that initialization.
Hazy guessing here based on a bell ringing faintly.
Yves
-- perl -Mre=debug -e "/just|another|perl|hacker/"
@hvds - you'll still have to compare the two lists, but these two should be reliable:
* https://www.ssllabs.com/ssltest/analyze.html?d=user%2dimages.githubusercontent.com&latest
* https://clienttest.ssllabs.com:8443/ssltest/viewMyClient.html
Not the whole story if there's a proxy in the way.
Thanks @richardleach, as far as I can see we do have compatible TLS protocol versions and cipher suites; and there certainly shouldn't be a proxy (but I wouldn't put it past BT). I've reported it to github, but since the workaround is easy I don't want to waste anyone else's time on it.
No, ops are allocated from slabs specific to the CV.
From what I can tell from 238a4c30b3 the old slab allocator didn't try to re-use freed ops at all, and didn't try to separate the slabs by size.
It simply allocated a constant-sized block, allocated from the block until it didn't have enough space, at which point it allocated a new block.
Each op had a reference count on the slab block, which was used to free the block when all the ops it contains were freed.
The newer CV-based allocator works similarly, except that it keeps a freed list per slab and searches that list before allocating from free space in the slab or allocating a new slab. It expands the size of the slab allocated (up to a maximum) so that small functions end up with a small slab, but large functions don't end up with the cost of a large number of small allocations.
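The growing slab size described above can be sketched as follows. This is a toy model only: the doubling rule and the cap value are assumptions made for illustration (the -DSv output later in the thread does show a 0x40 slab followed by a 0x80 one, but the thread does not state the growth rule or the maximum), and `slab_sizes` and `slabs_needed` are hypothetical names.

```python
from itertools import islice


def slab_sizes(first=0x40, cap=0x1000):
    """Yield successive slab sizes: start small, double, stop growing at cap.

    Both the doubling and the default cap are illustrative assumptions.
    """
    size = first
    while True:
        yield size
        size = min(size * 2, cap)


def slabs_needed(total_units, first=0x40, cap=0x1000):
    """Count how many slabs are allocated before total_units of space exist."""
    used = 0
    count = 0
    for size in slab_sizes(first, cap):
        if used >= total_units:
            return count
        used += size
        count += 1


if __name__ == "__main__":
    print([hex(s) for s in islice(slab_sizes(cap=0x200), 6)])
    print("small function:", slabs_needed(0x30))
    print("large function:", slabs_needed(0x10000))
```

Under these assumptions a small function is served by a single small slab, while a large one needs far fewer allocations than it would with a fixed small slab size.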
I have a solution I'm working on, but I'm running into a bug I haven't been able to resolve.
My working branch is at https://github.com/tonycoz/perl5/tree/17555-faster-freed-ops-reuse; it throws an assertion in some tests, but I don't know what I've changed to cause it.
If anyone else has the time, please take a look and let me know what stupid mistake I've made.
If it helps, the (only cause of) failure in t/re/reg_mesg.t reduces to:
% ./miniperl -e '/(?(??{}))/'
Unknown switch condition (?(...)) in regex; marked by <-- HERE in m/(?(? <-- HERE ?{}))/ at -e line 1.
miniperl: op.c:669: Perl_opslab_force_free: Assertion `savestack_count == slab->opslab_refcnt-1' failed.
Aborted (core dumped)
%
The t/re/overload.t failure reduces to much the same thing: a pattern (generated via overload this time) that includes a deferred eval block but also has a regexp-compile-time failure, in this case "Eval-group not allowed at runtime".
@hvds Where are you seeing these test failures? In Tony's branch?
Yes, I built it and ran make minitest.
Applying only the DEBUG_Sv support, I get:
% ./miniperl -DSv -e '/(?(??{}))/' 2>&1 | perl -ple 's/ at -e .*//; s{([0-9a-f]{12})}{"\$".($a{$1} //= ++$s)}ge'
allocated new op slab sz 0x40, $1, head slab $1
allocating op at $2, slab $1, head slab $1
allocating op at $3, slab $1, head slab $1
allocating op at $4, slab $1, head slab $1
allocating op at $5, slab $1, head slab $1
allocating op at $6, slab $1, head slab $1
allocating op at $7, slab $1, head slab $1
allocating op at $8, slab $1, head slab $1
allocated new op slab sz 0x80, $9, head slab $1
allocating op at $10, slab $9, head slab $1
allocating op at $11, slab $9, head slab $1
Unknown switch condition (?(...)) in regex; marked by <-- HERE in m/(?(? <-- HERE ?{}))/
freeing op at $8, slab $1, head slab $1
freeing op at $7, slab $1, head slab $1
freeing op at $4, slab $1, head slab $1
freeing op at $5, slab $1, head slab $1
freeing op at $6, slab $1, head slab $1
freeing op at $3, slab $1, head slab $1
freeing op at $2, slab $1, head slab $1
freeing op at $11, slab $9, head slab $1
freeing op at $10, slab $9, head slab $1
freeing slab $1
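For readers less used to Perl one-liners, the filter in the command above does two things: it strips the trailing " at -e ..." location, and it replaces each 12-hex-digit address with a short label ($1, $2, ...) in order of first appearance, so the same address always gets the same label. A rough Python equivalent is sketched below; the function name is hypothetical and the sample addresses are made up for illustration, not taken from a real run.

```python
import re

ADDR_RE = re.compile(r"[0-9a-f]{12}")   # same pattern as in the Perl filter
LOCATION_RE = re.compile(r" at -e .*")  # trailing "at -e line N." suffix


def rename_addresses(lines):
    """Replace each distinct 12-hex-digit address with $1, $2, ... by first use."""
    labels = {}

    def label(match):
        addr = match.group(0)
        # setdefault only assigns a new number the first time addr is seen.
        return "$" + str(labels.setdefault(addr, len(labels) + 1))

    out = []
    for line in lines:
        line = LOCATION_RE.sub("", line)
        out.append(ADDR_RE.sub(label, line))
    return out


if __name__ == "__main__":
    # Made-up addresses, shaped like the debug lines shown in this thread.
    sample = [
        "allocated new op slab sz 0x40, 55d0c8a1f2e0, head slab 55d0c8a1f2e0",
        "allocating op at 55d0c8a1f318, slab 55d0c8a1f2e0, head slab 55d0c8a1f2e0 at -e line 1.",
    ]
    for line in rename_addresses(sample):
        print(line)
```

Stable labels make it easy to see which ops live in which slab and in what order they are freed, without comparing raw pointers by eye.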
All the frees occur in an opslab_force_free() call. I think the issue is that op_free() relies on following the op_next and op_sibparent pointers even through already freed ops. The result is that after the patch, op_free($10), the last op to be freed (a LISTOP), does not actually end up hitting CALL_OPFREEHOOK(), and that's what's causing the extra refcount.
On entry to opfree the structure is like:
o = $10
next = $8
next = 0 /* before patch */
next = $4 /* after patch */
sibparent = $3
sibparent = 0
That seems kind of broken, since the OP free list is using op_next in blead, even without my change.
My new branch (in #17583) seems to avoid the problem, and avoids most of the cost of adding to the free list that my original attempt had.
Building a perl from that branch and installing Number::Phone against it ... now we once again have a reasonable time for running make test:
[Number-Phone-3.6001] 570 $ make test
Skip blib/lib/auto/share/dist/Number-Phone/Number-Phone-UK-Data.db (unchanged)
Skip blib/lib/auto/share/dist/Number-Phone/Number-Phone-NANP-Data.db (unchanged)
PERL_DL_NONLAZY=1 "/home/jkeenan/testing/smoke-me/jkeenan/tonyc/ghpr-17583/bin/perl" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/bugfix-rt48581.t .................................. ok
t/bugfix-rt61177.t .................................. ok
t/bugfix_N-P-New-CC-number.t ........................ ok
t/country-for-subclass.t ............................ ok
t/country-test-ar.t ................................. ok
t/country-test-at.t ................................. ok
t/country-test-br.t ................................. ok
t/country-test-sm.t ................................. ok
t/dial_to.t ......................................... ok
t/example-phone-numbers.t ........................... ok
t/find-right-subclass.t ............................. ok
t/formatters.t ...................................... ok
t/libphonenumber.t .................................. ok
t/nanp.t ............................................ ok
t/no-stubs.t ........................................ ok
t/number-phone-country-idd.t ........................ ok
t/number-phone-country-multiple.t ................... ok
t/number-phone-country-prefix.t ..................... ok
t/number-phone-country-uk.t ......................... ok
t/number-phone-country-warnings.t ................... ok
t/number-phone-country.t ............................ ok
t/pod-coverage.t .................................... skipped: Test::Pod::Coverage 1.08 required for testing POD coverage
t/pod.t ............................................. skipped: Test::Pod 1.18 required for testing POD
t/stubs.t ........................................... ok
t/superclass_and_uk_different_method_invocations.t .. ok
t/uk_data.t ......................................... ok
t/uk_slurp.t ........................................ skipped: slurping is too slow so skipping under Devel::Cover and for normal installs, set AUTOMATED_TESTING to run this
t/unicode.t ......................................... ok
t/weird_little_islands.t ............................ ok
All tests successful.
Files=29, Tests=6255, 15 wallclock secs ( 0.58 usr 0.07 sys + 13.49 cusr 0.93 csys = 15.07 CPU)
Result: PASS
As a followup to @jkeenan's findings in https://github.com/Perl/perl5/issues/17553, here are the runtimes of Number-Phone-3.6001's test suite on my smoker systems (various freebsd and linux systems) sorted by perl version:
Since v5.31.6 the test suite is much slower.