Open scheglov opened 3 years ago
@scheglov can you provide us the command line to reproduce this situation. //cc @alexmarkov
You will need https://dart-review.googlesource.com/c/sdk/+/220070
I run it with time dart /Users/scheglov/Source/Dart/sdk.git/sdk/pkg/analysis_server/benchmark/benchmarks.dart run das-flutter --flutter-repository=/Users/scheglov/Source/flutter
.
FYI
scheglov@scheglov-macbookpro2:~/Source/Dart/sdk.git/sdk (ss-completion-benchmark-output)$ dart --version
Dart SDK version: 2.16.0-edge.b34f39985971d099994e93012cb3b2319712e6b6 (be) (Wed Nov 10 20:38:34 2021 +0000) on "macos_x64"
In this code whereType<ExtensionElement>().toList()
is not fully inlined, so WhereTypeIterator.moveNext
ends up testing _source.current is T
for an unknown type argument T
. This kind of type test currently involves subtype test cache and a slow call into runtime. I'm not sure why cache doesn't work in this case, but it is possible that it simply overflows as cache is shared among all invocations of WhereTypeIterator.moveNext
and all whereType
with different types.
Manual loop performs element is ExtensionElement
for a known class ExtensionElement
which is generated much more efficiently.
I tried passing --max_subtype_cache_entries=1000
VM option to the analysis server, and it eliminates DRT_InstanceOf from profile. This means that previously runtime was called due to an overflow of subtype test cache.
I see the following ways of improving performance in this case:
whereType<Foo>().toList()
is fully inlined and type test is T
is optimized if type argument T
becomes known after inlining. Note that it may increase code size if we always inline all involved methods.is T
type tests by generating specialized type testing stubs for is
tests (currently they are used only for as
tests and parameter type checks).is T
type tests by using a more smart caching strategy: we could use a hash table for large caches to avoid linear lookup and make cache size unlimited.@mraleph @mkustermann @sstrickl WDYT?
To maybe rephrase @alexmarkov 's suggestions (and add one) for improving general is
checks:
is
checks, we could evaluate the code size cost of that on some of our apps (and the analyzer in this case)I expect that https://dart-review.googlesource.com/c/sdk/+/308941 and its recent predecessors have probably improved this situation, since STCs are now larger and also are hash-based at most sizes for quick lookup, though a TTS-like solution to is
checks to avoid hitting the STC in the first place (versus the currently generated inline checks, which are limited due to being per-call site) would probably finish it off if we find in the future that InstanceOf
is still a chokepoint.
Thanks for the update, @sstrickl !
The one issue we have with <obj> is <type>
checks is that - compared to as
checks, the STCs will fill up with positive & negative answers. So there can be many different <obj>
classes flowing into the check, making the caches very large.
A concrete example that makes the CFE slow (/cc @jensjoha @johnniwinther ) is a stack implementation that's on the hot path (see here):
Object? pop(NullValue? nullValue) {
final Object? value = array[--arrayLength];
...
if (value is! NullValue) {
return value;
}
...
}
There's many different kinds of objects being pop()
ed from the stack (subtypes of NullValue
may even be the rare case). So the STC caches would fill up considerably.
Though for this case (and other similar situations) we could do an exhaustive inline cid-range check for the NullValue
class - as there's only two types implementing it (the NullValue
class itself and enum NullValues implements NullValue<Object>
). Currently this is NullValue
requires 2 cid-ranges (due to implements
) which makes our compiler not optimize it atm - but we can change the compiler to emit exhaustive cid-range checks even if we have more than 1 cid range.
It actually should do an exhaustive cid-range check for the NullValue
class, if I'm reading the block at https://github.com/dart-lang/sdk/blob/85de9c1d7ee818c75d165877ce1a97b0cddca71f/runtime/vm/compiler/backend/flow_graph_compiler.cc#L2984 correctly, unless the number of cid ranges are greater than kMaxNumberOfCidRangesToTest
(currently, 4
).
(The fact that we saw it hit the STC, since we needed 03048322028cde44d70a34b6a0a70c160e88fa7b to reduce the overhead until stubs handled hash-based caches, makes it clear that it doesn't and now I'm curious as to why as I'd only expect 2 cid ranges at most here unless I'm forgetting something... maybe there is an additional range for the bottom types as well, but those should be consecutive and thus a single additional range, making it 3 ranges.)
It actually should do an exhaustive cid-range check for the NullValue class, if I'm reading the block at
I think the code you refer to is for FlowGraphCompiler::GenerateCallerChecksForAssertAssignable
aka as
-checks but the CFE example is using is
checks against NullValue
... right. Yes. Of course. 🤦♀️
Interestingly, I observe this only when I'm running benchmarks, i.e. run Dart from Dart and drive Dart Analysis Server with requests. Maybe too intensive, and VM does not get time to do some background optimization work?
With
var extensions = exportElements.whereType<ExtensionElement>().toList();
With