Unfortunately I won't be a lot of help on my own, though happy to do what I can with some pointers.
Is there a minimal test you could distill from this that I could use? At a minimum I can isolate which part of the PR may be responsible. Though I'd guess it's from storing MethodInstances in LineInfoNodes, somehow. But I may be a bit lost trying to figure out where in the compiler that's causing a problem.
I tried to find a minimal testcase, but the effect seems to disappear once I disable too many other tests. It somehow depends on the number of tests that are run beforehand, and the effect is a lot worse for the later tests:
For my previous test runs I had our container test groups ordered like this, with SparseMatrix at the end taking very long:
master:
Polymake.Vector | 1539 1539 1m36.9s
Polymake.SparseVector | 1704 1704 2m27.0s
Polymake.Matrix | 1667 1667 8m57.0s
Polymake.SparseMatrix | 1757 1757 10m54.7s
with reverts:
Polymake.Vector | 1539 1539 1m17.5s
Polymake.SparseVector | 1704 1704 1m00.8s
Polymake.Matrix | 1667 1667 4m26.0s
Polymake.SparseMatrix | 1757 1757 4m54.2s
While trying to reduce this I reordered the test groups and got the following: SparseMatrix now takes only about 15% longer, but Vector takes about 4 times as long:
master:
Polymake.SparseMatrix | 1757 1757 4m55.6s
Polymake.Matrix | 1667 1667 6m19.4s
Polymake.SparseVector | 1704 1704 3m29.8s
Polymake.Vector | 1539 1539 4m54.0s
with reverts:
Polymake.SparseMatrix | 1757 1757 4m14.1s
Polymake.Matrix | 1667 1667 3m55.8s
Polymake.SparseVector | 1704 1704 1m22.1s
Polymake.Vector | 1539 1539 1m12.5s
Is it possible that there are some caches, lookup tables, ... that are somehow getting large and slow?
I suppose we know all the slowdown is in here: https://github.com/JuliaLang/julia/blob/c4d162ea08e37dfdf958a79051aa6f6e223606b2/src/ircode.c#L767-L858
I wonder if the timing mechanism could be extended to narrow down which specific portions are slower?
Though an educated guess would be that the linetables have more information in them and, therefore, take longer to compress? (But I may be reading this wrong...) https://github.com/JuliaLang/julia/blob/c4d162ea08e37dfdf958a79051aa6f6e223606b2/src/ircode.c#L830-L842
That fits my observations; it looks like it is spending quite some time compressing the LineInfoNodes that now store the Methods / MethodInstances instead of the name.
I repeatedly hit the running process with USR1, and quite often I get backtraces that look like this:
signal (10): User defined signal 1
jl_egal_ at /home/lorenz/software/polymake/julia/julia/julia/src/julia.h:1493 [inlined]
literal_val_id at /home/lorenz/software/polymake/julia/julia/julia/src/ircode.c:82 [inlined]
jl_encode_as_indexed_root at /home/lorenz/software/polymake/julia/julia/julia/src/ircode.c:106
jl_encode_value_ at /home/lorenz/software/polymake/julia/julia/julia/src/ircode.c:327
jl_encode_value_ at /home/lorenz/software/polymake/julia/julia/julia/src/ircode.c:358
ijl_compress_ir at /home/lorenz/software/polymake/julia/julia/julia/src/ircode.c:814
maybe_compress_codeinfo at ./compiler/typeinfer.jl:360
transform_result_for_cache at ./compiler/typeinfer.jl:379 [inlined]
cache_result! at ./compiler/typeinfer.jl:405
_typeinf at ./compiler/typeinfer.jl:283
typeinf at ./compiler/typeinfer.jl:216
typeinf_edge at ./compiler/typeinfer.jl:930
abstract_call_method at ./compiler/abstractinterpretation.jl:629
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:95
abstract_call_known at ./compiler/abstractinterpretation.jl:2046
...
I also managed to hit such a backtrace with gdb which gives a bit more information:
#0 0x00007f8a4d2e7c17 in jl_egal__unboxed_ (dtag=140231691781904, b=0x7f89bc6e5500, a=0x7f8a428042e0)
at /home/lorenz/software/polymake/julia/julia/julia/src/julia.h:1486
#1 jl_egal_ (b=0x7f89bc6e5500, a=0x7f8a428042e0)
at /home/lorenz/software/polymake/julia/julia/julia/src/julia.h:1498
#2 literal_val_id (v=0x7f89bc6e5500, s=<optimized out>, rr=0x7ffe9addf2b0)
at /home/lorenz/software/polymake/julia/julia/julia/src/ircode.c:82
#3 jl_encode_as_indexed_root (s=s@entry=0x7ffe9addf450, v=0x7f89bc6e5500)
at /home/lorenz/software/polymake/julia/julia/julia/src/ircode.c:106
#4 0x00007f8a4d2e9176 in jl_encode_value_ (s=s@entry=0x7ffe9addf450, v=<optimized out>,
as_literal=as_literal@entry=0) at /home/lorenz/software/polymake/julia/julia/julia/src/ircode.c:433
#5 0x00007f8a4d2e9630 in jl_encode_value_ (s=s@entry=0x7ffe9addf450, v=<optimized out>,
as_literal=as_literal@entry=0) at /home/lorenz/software/polymake/julia/julia/julia/src/ircode.c:327
#6 0x00007f8a4d2e9a6d in jl_encode_value_ (s=s@entry=0x7ffe9addf450, v=<optimized out>,
as_literal=as_literal@entry=1) at /home/lorenz/software/polymake/julia/julia/julia/src/ircode.c:358
#7 0x00007f8a4d2ea160 in ijl_compress_ir (m=0x7f8a3a0253d0 <jl_system_image_data+35479824>,
code=0x7f89b0d92380) at /home/lorenz/software/polymake/julia/julia/julia/src/ircode.c:814
#8 0x00007f8a378b30b8 in julia_maybe_compress_codeinfo_38174 () at compiler/typeinfer.jl:360
#9 0x00007f8a3786f8db in transform_result_for_cache () at compiler/typeinfer.jl:379
#10 julia_cache_result!_38191 () at compiler/typeinfer.jl:405
#11 0x00007f8a37b171ac in julia__typeinf_38127 () at compiler/typeinfer.jl:283
#12 0x00007f8a3730c257 in julia_typeinf_38095 () at compiler/typeinfer.jl:216
#13 0x00007f8a37943e12 in julia_typeinf_edge_38296 () at compiler/typeinfer.jl:930
#14 0x00007f8a37bede16 in julia_abstract_call_method_36753 () at compiler/abstractinterpretation.jl:629
#15 0x00007f8a37705b73 in julia_abstract_call_gf_by_type_36579 ()
at compiler/abstractinterpretation.jl:95
#16 0x00007f8a372143cc in julia_abstract_call_known_37367 () at compiler/abstractinterpretation.jl:2046
#17 0x00007f8a375bb2b2 in julia_abstract_call_37520 () at compiler/abstractinterpretation.jl:2122
#18 0x00007f8a3786e7e7 in julia_abstract_call_37602 () at compiler/abstractinterpretation.jl:2096
#19 0x00007f8a3730b88e in julia_abstract_call_37511 () at compiler/abstractinterpretation.jl:2279
#20 0x00007f8a375bbaef in julia_abstract_eval_call_37694 () at compiler/abstractinterpretation.jl:2299
Frame 7 is the ijl_compress_ir mentioned earlier; we are at line 814, which encodes the CodeInfo, and currently i=5, which should correspond to the linetable. The next frame just unpacks that array.
Frame 5 encodes the LineInfoNode, and we are at i=1, which corresponds to the second field, where the method / methodinstance should be:
324 else if (jl_typetagis(v, jl_lineinfonode_type)) {
325 write_uint8(s->s, TAG_LINEINFO);
326 for (i = 0; i < jl_datatype_nfields(jl_lineinfonode_type); i++)
327 jl_encode_value(s, jl_get_nth_field(v, i));
328 }
And then for this encoding (via jl_encode_as_indexed_root) we run into literal_val_id, where we seem to compare such a MethodInstance against all method roots via jl_egal:
#1 literal_val_id (v=0x7f89bb4343d0, s=<optimized out>, rr=0x7ffe9addafd0)
at /home/lorenz/software/polymake/julia/julia/julia/src/ircode.c:82
70 static void literal_val_id(rle_reference *rr, jl_ircode_state *s, jl_value_t *v) JL_GC_DISABLED
71 {
72 jl_array_t *rs = s->method->roots;
73 int i, l = jl_array_len(rs);
74 if (jl_is_symbol(v) || jl_is_concrete_type(v)) {
75 for (i = 0; i < l; i++) {
76 if (jl_array_ptr_ref(rs, i) == v)
77 return tagged_root(rr, s, i);
78 }
79 }
80 else {
81 for (i = 0; i < l; i++) {
82 if (jl_egal(jl_array_ptr_ref(rs, i), v))
83 return tagged_root(rr, s, i);
84 }
85 }
86 jl_add_method_root(s->method, jl_precompile_toplevel_module, v);
87 return tagged_root(rr, s, jl_array_len(rs) - 1);
88 }
(gdb) call jl_(jl_typeof(v))
Core.MethodInstance
(gdb) call jl_(v)
(::Type{CxxWrap.CxxWrapCore.ConstCxxRef{Polymake.LibPolymake.SparseVector{Polymake.LibPolymake.Rational}}})(Ptr{Nothing}) from (::Type{CxxWrap.CxxWrapCore.ConstCxxRef{T}})(Ptr{T} where T) where {T}
(gdb) p s->method->roots->length
$13 = 33340
This second loop seems to take quite a while, and there are quite a lot of roots. Without the change this field would have been a Symbol and would have used the first (faster?) loop?
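To put a rough number on that (a back-of-the-envelope estimate, not a measurement): every miss in that second loop scans the whole roots array with jl_egal and then appends a new root, so growing a roots array of this size one literal at a time costs on the order of n²/2 comparisons for a single method:

```julia
# Back-of-the-envelope only, using the roots length observed in gdb above
n = 33_340
n * (n + 1) ÷ 2   # ≈ 5.6e8 jl_egal comparisons to grow one method's roots to that size
```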
> I wonder if the timing mechanism could be extended to narrow down which specific portions are slower?
Looks like it can: I added X(IRCOMP_LITVAL) to the timing subsystems in timing.h, added JL_TIMING(IRCOMP_LITVAL, IRCOMP_LITVAL); to the above function, and got:
AST_COMPRESS , 19918970129 ( 0.55 %), 1487756848655 (41.14 %)
...
IRCOMP_LITVAL , 1467837878526 (40.59 %), 1467837878526 (40.59 %)
So if this works as expected, almost all the time inside AST_COMPRESS seems to be spent in this literal_val_id function.
PS: Line numbers correspond to commit f407a4cac3d.
Edit:
For comparison I added the same timing statements to the build with the reverted commits, and I get just about 2% in the new IRCOMP_LITVAL:
Event , Self Cycles (% of Total) , Total Cycles (% of Total)
ADD_METHOD , 3575935343 ( 0.16 %), 3767675666 ( 0.17 %)
AST_COMPRESS , 13102464145 ( 0.58 %), 61441313977 ( 2.74 %)
AST_UNCOMPRESS , 14145984302 ( 0.63 %), 14348582309 ( 0.64 %)
CODEGEN_Codeinst , 91633141 ( 0.00 %), 66084549691 ( 2.95 %)
CODEGEN_LLVM , 64617649983 ( 2.88 %), 65505328655 ( 2.92 %)
CODEGEN_Workqueue , 134335623 ( 0.01 %), 7532002742 ( 0.34 %)
CODEINST_COMPILE , 1145205447 ( 0.05 %), 1945422025519 (86.79 %)
DL_OPEN , 100072739 ( 0.00 %), 100072739 ( 0.00 %)
GC , 45712481 ( 0.00 %), 14239474721 ( 0.64 %)
GC_Finalizers , 259778430 ( 0.01 %), 259828974 ( 0.01 %)
GC_FullSweep , 107974667 ( 0.00 %), 107974667 ( 0.00 %)
GC_IncrementalSweep , 6800313932 ( 0.30 %), 6800313932 ( 0.30 %)
GC_Mark , 7025563210 ( 0.31 %), 7025563210 ( 0.31 %)
GC_Stop , 81457 ( 0.00 %), 81457 ( 0.00 %)
INFERENCE , 86722293188 ( 3.87 %), 195156770864 ( 8.71 %)
INIT_MODULE , 64294881505 ( 2.87 %), 86200640029 ( 3.85 %)
IRCOMP_LITVAL , 48338849832 ( 2.16 %), 48338849832 ( 2.16 %)
JULIA_INIT , 67818157 ( 0.00 %), 231329978 ( 0.01 %)
LLVM_OPT , 1120071637440 (49.97 %), 1120071637440 (49.97 %)
LLVM_ORC , 758003630861 (33.82 %), 1878075268301 (83.78 %)
LOAD_Pkgimg , 848177602 ( 0.04 %), 7388993626 ( 0.33 %)
LOAD_Processor , 1976376 ( 0.00 %), 1976376 ( 0.00 %)
LOAD_Require , 221449806 ( 0.01 %), 94751808669 ( 4.23 %)
LOAD_Sysimg , 131339745 ( 0.01 %), 131339745 ( 0.01 %)
LOWERING , 5704511976 ( 0.25 %), 6833292884 ( 0.30 %)
MACRO_INVOCATION , 25892596 ( 0.00 %), 1128780908 ( 0.05 %)
METHOD_LOOKUP_FAST , 3149306459 ( 0.14 %), 4047836294 ( 0.18 %)
METHOD_LOOKUP_SLOW , 38612337 ( 0.00 %), 828309377 ( 0.04 %)
METHOD_MATCH , 17309173074 ( 0.77 %), 17875621316 ( 0.80 %)
PARSING , 745536025 ( 0.03 %), 745536025 ( 0.03 %)
ROOT , 20903426138 ( 0.93 %), 2241613887403 (100.00 %)
STAGED_FUNCTION , 2404728 ( 0.00 %), 514411583 ( 0.02 %)
TYPE_CACHE_INSERT , 600301181 ( 0.03 %), 600301181 ( 0.03 %)
TYPE_CACHE_LOOKUP , 2905647622 ( 0.13 %), 2905647622 ( 0.13 %)
VERIFY_Edges , 345359034 ( 0.02 %), 2150140787 ( 0.10 %)
VERIFY_Graph , 21222541 ( 0.00 %), 21222541 ( 0.00 %)
VERIFY_Methods , 7734280 ( 0.00 %), 7734280 ( 0.00 %)
That's awesome, thanks for poking into it! Love identifying a good quadratic-time algorithm to optimize.
Does Julia need to look up the root each time it's being compressed? If there are many copies of the same method instances, that's a lot of unnecessary repeated work. Can the root be stored with the method instance at an earlier point, so the lookup is only performed once?
Can the lookup operation be turned into a dictionary access rather than array iteration, so the lookup itself is constant-time?
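To make that second suggestion concrete, here is a rough Julia-level sketch of the idea (hypothetical; the real roots table lives on the Method in C and uses jl_egal for the non-identity cases, so it would need an egal-compatible hash rather than plain ===):

```julia
# Conceptual sketch: keep a reverse index next to the ordered roots vector,
# so looking up an existing root is O(1) instead of a linear scan.
struct RootTable
    roots::Vector{Any}       # ordered list, as today: position == root id
    index::IdDict{Any,Int}   # identity-keyed reverse lookup
end
RootTable() = RootTable(Any[], IdDict{Any,Int}())

function root_id!(t::RootTable, @nospecialize(v))
    get!(t.index, v) do      # hash lookup; append a new root only on a miss
        push!(t.roots, v)
        length(t.roots)
    end
end
```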
@JeffBezanson @timholy - since you've both most recently (2-3 years ago :-p) touched literal_val_id, what are your impressions? Is either of my suggestions potentially workable, or are there obvious reasons they couldn't work that I'm not seeing?
Trying to gauge if this is feasible for me to tackle, or if it's a much bigger problem to untangle.
That array should never be long, since that can cause a myriad of other problems. How is that happening and how can that be avoided?
I added a little bit of code to print the maximal size for that array:
static int max = 0;
// report a new maximum whenever the roots array length `l` grows beyond the previous one
if (l > 1000 && l > max) {
    max = l;
    printf("rootmax: %d\n", max);
}
On master (more precisely f407a4cac, including #41099) our testsuite ends with:
rootmax: 111575
On my other build where I reverted #41099 (see my earlier posts for the precise commits) I get:
rootmax: 13894
13894 seems quite large as well? But I guess it is a lot better than 111575.
I really don't understand the details; I am just trying to debug what is happening here.
So adding the Methods / MethodInstances to those LineInfoNodes seems to trigger a lot more roots?
Was just exploring the same kind of testing.
When Julia is started for the first time after building with that PR (make cleanall, deleted .julia), literal_val_id is called 1,702,882 times (compared with 109 on the second run).
With the commit just before the PR merge, this count is 870,750.
EDIT: (The fact that it's almost exactly double seems suspicious.)
It appears that literal_val_id is the only function that adds to a method's root table (via jl_add_method_root).
So the MethodInstances being added to LineInfoNodes aren't present in the parent Method's root table, which results in them being added to the Method's roots. Previously, they would all just map to the method name (as a Symbol), and thus share the same root. I believe?
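That is easy to check from the REPL on the two builds (a small sketch; Method.roots may be undefined for methods that have not accumulated any roots yet):

```julia
# Count how many of a method's roots are MethodInstances
m = first(methods(Base.Broadcast.materialize))
rts = isdefined(m, :roots) ? m.roots : Any[]
length(rts), count(x -> x isa Core.MethodInstance, rts)
```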
> 13894 seems quite large as well? But I guess it is a lot better than 111575.
Some more details from running our testsuite, inspired by the plots from #14556: the number of times a specific roots-array length is encountered in literal_val_id.
Total number of calls to jl_egal inside literal_val_id:
master: 114,202,053,625
w/o #41099: 3,059,875,551
That is roughly a 37× increase on master.
I also noticed that running the tests of a package of mine on c58e508fdd takes noticeably longer than on v1.9, but I haven't investigated the cause of the longer runtime.
Thanks for that plot - some other observations from Julia on first run:
# previous
calls to `literal_val_id`: 870,711
----where it's a symbol/concrete type: 758,439
----where it isn't: 112,272
new roots added: 81,344
maximum number of roots per method observed: 1,153
# PR 41099
calls to `literal_val_id`: 1,702,467
----where it's a symbol/concrete type: 661,561
----where it isn't: 1,040,906
new roots added: 226,637
maximum number of roots per method observed: 2,834
The Base method with the most roots after the PR is materialize(bc::Base.Broadcast.Broadcasted). Before the PR it is iterate(g::Base.Generator, s...).
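For reference, roughly how such a ranking can be reproduced (a sketch; it only walks methods of functions reachable as names of the given module, so it can miss some):

```julia
# Rank a module's methods by the size of their roots arrays
function roots_ranking(mod::Module)
    ranked = Pair{Method,Int}[]
    for name in names(mod; all=true, imported=true)
        isdefined(mod, name) || continue
        f = getfield(mod, name)
        f isa Function || continue
        for m in methods(f)
            isdefined(m, :roots) && push!(ranked, m => length(m.roots))
        end
    end
    sort!(ranked; by=last, rev=true)
end

first(roots_ranking(Base), 5)
```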
Again with a fresh Julia build/start:
Before the PR, materialize was responsible for 197,722 calls to jl_egal, out of 7,340,891 total (3%).
After the PR, materialize is responsible for 16,344,836 calls to jl_egal, out of 183,833,844 total (9%).
So... is adding these roots incorrect in some way, or just inconvenient for the current implementation? And particularly bad for certain generic methods that have numerous, possibly single-use, specializations?
Is there documentation somewhere of what roots are designed to do, so we can understand this better?
Generally speaking, it is incorrect, as these roots are supposed to be for syntax, not compiler-generated content. Do we need to revert that PR, or can we partly disable whatever part is breaking this array?
When you say the roots are for syntax, what does that mean?
The types of the roots of Base.materialize in 1.9.0 are:
7-element Vector{DataType}:
Symbol
DataType
Core.MethodInstance
Module
UnionAll
ErrorException
String
Should Core.MethodInstance not be in there? (If that's what you mean by compiler-generated content?)
I'm looking back up the stack, in jl_encode_value_ - I notice three frames of that here. There's an else if case for jl_lineinfonode_type, which calls jl_encode_value_ again on each field.
The method roots get added in the fallback else, via jl_encode_as_indexed_root. So that is normally where MethodInstances end up.
We could add a case inside the jl_lineinfonode_type case for that field to handle it differently. But then how would the MethodInstance be compressed without a root?
Looking more: jl_compress_ir is called from:

- jl_emit_codeinst, which is passed a jl_method_t and a jl_code_instance_t.
- jl_method_set_source, which is passed a jl_method_t and a jl_code_info_t, which is copied first (though a shallow copy).
- strip_codeinfo_meta, which is passed a jl_method_t and a jl_code_info_t, which is uncompressed first if necessary.
- maybe_compress_codeinfo, which is passed a Method and a CodeInfo. This seems to be the path Ben is seeing blow up.

CodeInfo objects contain the linetable, and a parent MethodInstance field. So currently, the parent of each CodeInfo would end up added to a Method's roots. But with my PR, the linetable being compressed is also adding the linetable MethodInstances to the method's roots when previously they would have just been Symbols.
(For the CodeInstance case, it has a def MethodInstance field as well as an inferred CodeInfo field, so I think MethodInstance roots could also be added there?)
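These fields can be checked quickly from the REPL (on a 1.9/1.10-era build; the exact layouts differ between Julia versions):

```julia
fieldnames(Core.LineInfoNode)    # (:module, :method, :file, :line, :inlined_at)
:linetable in fieldnames(Core.CodeInfo), :parent in fieldnames(Core.CodeInfo)
:def in fieldnames(Core.CodeInstance), :inferred in fieldnames(Core.CodeInstance)
```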
Looking at the roots of findfirst(ch::AbstractChar, string::AbstractString) in 1.9.0:
8-element Vector{Any}:
:findfirst
Symbol("strings/search.jl")
Base.Fix2{typeof(==), Char}
typeof(==) (singleton type of function ==, subtype of Function)
MethodInstance for findnext(::Base.Fix2{typeof(==), Char}, ::String, ::Int64)
:findnext
:Fix2
:EachStringIndex
Here, a MethodInstance is contained in a specialization of findfirst; I believe it is coming from the code field:
mi = methods(findfirst)[1].specializations[1]
ci = ccall(:jl_uncompress_ir, Any, (Any, Any, Any), mi.def, mi.cache, mi.cache.inferred)
ci.code
3-element Vector{Any}:
:(%new(Base.Fix2{typeof(==), Char}, ==, _2))
:($(Expr(:invoke, MethodInstance for findnext(::Base.Fix2{typeof(==), Char}, ::String, ::Int64), :(Base.findnext), :(%1), Core.Argument(3), 1)))
:(return %2)
Though this doesn't show up in the linetable:
ci.linetable
9-element Vector{Core.LineInfoNode}:
Core.LineInfoNode(Base, :findfirst, Symbol("strings/search.jl"), 140, 0)
Core.LineInfoNode(Base, :(==), Symbol("operators.jl"), 1147, 1)
Core.LineInfoNode(Base, :Fix2, Symbol("operators.jl"), 1121, 2)
Core.LineInfoNode(Base, :convert, Symbol("char.jl"), 187, 3)
Core.LineInfoNode(Base, :findfirst, Symbol("array.jl"), 2122, 1)
Core.LineInfoNode(Base, :keys, Symbol("strings/basic.jl"), 574, 5)
Core.LineInfoNode(Base, :EachStringIndex, Symbol("strings/basic.jl"), 572, 6)
Core.LineInfoNode(Base, :EachStringIndex, Symbol("strings/basic.jl"), 572, 7)
Core.LineInfoNode(Base, :convert, Symbol("essentials.jl"), 298, 8)
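On a build that includes the PR, the method field of these linetable entries can hold a Method or MethodInstance instead of a Symbol; a quick check, reusing the ci from above:

```julia
# How many linetable entries reference a Method/MethodInstance rather than a Symbol?
count(n -> n.method isa Union{Method, Core.MethodInstance}, ci.linetable)
```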
Is one issue that roots is serving a dual purpose as a compression token table and something else?
It sounds like #32705 was trying to be a solution to this, but it is closed? The last comment from you @vtjnash suggested it just needed more work and had conflicts - have those conflicts been resolved? Or are there other reasons that isn't a good approach?
Instructive to compare the roots of this findfirst method in 1.9.0 (the list above) vs. current master, which includes the PR:
11-element Vector{Any}:
:findfirst
Symbol("strings/search.jl")
Base.Fix2{typeof(==), Char}
MethodInstance for findnext(::Base.Fix2{typeof(==), Char}, ::String, ::Int64)
:findnext
typeof(==) (singleton type of function ==, subtype of Function)
MethodInstance for ==(::Char)
MethodInstance for Base.Fix2(::typeof(==), ::Char)
MethodInstance for findfirst(::Base.Fix2{typeof(==), Char}, ::String)
MethodInstance for keys(::String)
MethodInstance for Base.EachStringIndex(::String)
Observations:

- The added roots are MethodInstances.
- Some MethodInstances (e.g. keys, ==) weren't present originally as symbols - possibly due to finding them in the common symbol table via jl_lookup_common_symbol in jl_encode_value_, and thus skipping adding a root? Leading to extra items.

What if...
The names of the methods of LineInfoNodes are stored in the Method roots (current behavior), while any MethodInstance references are stored in a separate mi_roots table in the CodeInstance or MethodInstance?
Then the question would be where MethodInstances that appear in CodeInfo.code should go - do they stay in the Method roots table, or go in this new mi_roots table?
EDIT: After trying to implement something like this, I learned of a flaw: in jl_decompress_ir, sometimes a CodeInstance is not available, so there would be no access to the corresponding MethodInstance to look up those roots there. So the MethodInstances that show up in code really do need to be in the Method roots; but we could possibly encode both the symbol and the MethodInstance just for the linetable, using the symbol as a fallback if the MethodInstance isn't available.
I've hit this again: we spend almost 10% of the time in some workloads on the test suite just doing jl_encode_as_indexed_root. I wonder if we should temporarily disable/revert the change until we can avoid such a big slowdown/size penalty.
Ugh, yeah that might be a good idea - the part that would need to be reverted is minimal; the stacktrace lookup methods could be left in and ready.
But hopefully it can be restored in 1.11.
@BioTurboNick could I bother you to open a PR to temporarily revert and mark it as 1.10 backport?
I have been trying to debug a regression in the compilation time for many tests of Polymake.jl; it started with this PkgEval report https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_date/2023-04/27/report.html where the tests jumped from about 13 minutes to a timeout (>45 min). The tests mostly check compatibility of our own vector and matrix types with the corresponding Julia types (for different element types). These types are mostly defined via CxxWrap and a corresponding libcxxwrap module. In our own CI we see similar effects, with nightly taking more than twice as long for the tests as 1.8 or 1.9.
I added a @time to several of the affected test groups and the output looks like this:

So this is indeed compilation time and not related to the actual computations. That timing was for this code block.
I bisected the regression to #41099 by @BioTurboNick but I don't really know how to debug this further. In these timings I noticed that most of the extra time seems to be spent in AST_COMPRESS, which went from 4% to 51%. CODEGEN_LLVM also went up a bit. But I don't really know what to make of this now, especially why or how our code is triggering this in combination with that PR.

Timings
I ran some tests on a nightly build with WITH_TIMING_COUNTS=1. I added some extra verbosity to the test sets to see more timings and disabled several test sets which do not exhibit this problem.

This is for the current master (commit f407a4cac3):
And from another build where I reverted #41099 (and three other commits which I needed to revert to avoid conflicts, see at the end of this issue [1]):
The corresponding test timings for one test group which illustrates the problem quite well. This is for master:
And with the reverted commits:
Other packages
While looking at the PkgEval report mentioned earlier I noticed that MultivariatePolynomials.jl might have similar issues, at least for some of its tests:
before and after, from the second block with MutableArithmetics, the last one before the tests were killed.

Additional notes:
I also checked that just reverting the other three commits behaves similarly to the current master without the reverts, to make sure that #41099 is the key change.