dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License
10.24k stars 1.58k forks source link

VM: covariant check slower than is-check #33755

Open rakudrama opened 6 years ago

rakudrama commented 6 years ago

afaca9f7cfdf0a49bcba8985afec59e97e3019ba speeds up a hot map by using a field hidden behind a Map interface.

The implementation of the map indexer is:

  ClassHierarchyNode operator [](Object cls) {
    if (cls is ClassHierarchyNodesMapKey) {
      return cls._classHierarchyNode;
    }
    throw new UnimplementedError('ClassHierarchyNodesMap for $cls');
  }

I'd rather use the simpler formulation:

  ClassHierarchyNode operator [](covariant ClassHierarchyNodesMapKey cls) {
    return cls._classHierarchyNode;
  }

However, the performance of the 'better' covariant form is bad enough to negate the benefit of using a field.

ClassHierarchyNode operator [](Object cls) checks for the two ClassID for the concrete classes that have the field, and then loads field:

0x7ff763ab50a0            movq r13,[r12+0x17]    
0x7ff763ab50a5            push rbp    
0x7ff763ab50a6            movq rbp,rsp    
0x7ff763ab50a9            push r12    
0x7ff763ab50ab            push pp    
0x7ff763ab50ad            movq pp,r13    
0x7ff763ab50b0            movq rcx,[rbp+0x10]    
0x7ff763ab50b4            testb rcx,1    
0x7ff763ab50b7            jz 0x00007ff763ab50e4    
0x7ff763ab50bd            movzxwq rdx,[rcx+0x1]    
0x7ff763ab50c2            cmpl rdx,0x40a    
0x7ff763ab50c8            jz 0x00007ff763ab50d6    
0x7ff763ab50ca            cmpl rdx,0x40c    
0x7ff763ab50d0            jnz 0x00007ff763ab50e4    
0x7ff763ab50d6            movq rax,[rcx+0xf]    
0x7ff763ab50da            movq pp,[rbp-0x10]    
0x7ff763ab50de            movq rsp,rbp    
0x7ff763ab50e1            pop rbp    
0x7ff763ab50e2            ret
0x7ff763ab50e4 <deopt>

ClassHierarchyNode operator [](covariant ClassHierarchyNodesMapKey cls) also tests for the two ClassIDs, but then does a lot more work, including calling Subtype1TestCache (according to profiles). All this work is unnecessary, since both tested ClassIDs implement the declared covariant type.

0x7fe6e58d71c0            movq r13,[r12+0x17]    
0x7fe6e58d71c5            push rbp    
0x7fe6e58d71c6            movq rbp,rsp    
0x7fe6e58d71c9            push r12    
0x7fe6e58d71cb            push pp    
0x7fe6e58d71cd            movq pp,r13    
0x7fe6e58d71d0            movq rbx,[rbp+0x10]    
0x7fe6e58d71d4            testb rbx,1    
0x7fe6e58d71d7            jz 0x00007fe6e58d72c2    
0x7fe6e58d71dd            movzxwq rax,[rbx+0x1]    
0x7fe6e58d71e2            cmpl rax,0x40a    
0x7fe6e58d71e7            jz 0x00007fe6e58d71f4    
0x7fe6e58d71e9            cmpl rax,0x40c    
0x7fe6e58d71ee            jnz 0x00007fe6e58d72c2    
0x7fe6e58d71f4            movq rax,rbx    
0x7fe6e58d71f7            movq rdx,[thr+0x68]    null
0x7fe6e58d71fb            movq rcx,[thr+0x68]    null
0x7fe6e58d71ff            cmpq rax,[thr+0x68]    null
0x7fe6e58d7203            jz 0x00007fe6e58d72b0    
0x7fe6e58d7209            test al,1    
0x7fe6e58d720b            jz 0x00007fe6e58d726a    
0x7fe6e58d7211            movzxwq r10,[rax+0x1]    
0x7fe6e58d7216            movzxwq r11,[rax+0x1]    
0x7fe6e58d721b            movq r10,[thr+0x18]    
0x7fe6e58d721f            movq r10,[r10+0x50]    
0x7fe6e58d7223            addq r11,r11    
0x7fe6e58d7226            movq r10,[r10+r11*8]    
0x7fe6e58d722a            movq r13,[r10+0x57]    
0x7fe6e58d722e            movq r13,[r13+0xf]    
0x7fe6e58d7232            cmpq r13,0x874    
0x7fe6e58d7239            jz 0x00007fe6e58d72b0    
0x7fe6e58d723f            movq r9,[pp+0xf]    SubtypeTestCache
0x7fe6e58d7246            movq r12,[pp+0x17]    [Stub] Subtype1TestCache
0x7fe6e58d724d            movq r11,[r12+0x7]    
0x7fe6e58d7252            call r11    
0x7fe6e58d7255            cmpq r8,[thr+0x68]    null
0x7fe6e58d7259            jz 0x00007fe6e58d726a    
0x7fe6e58d725b            cmpq r8,[thr+0x70]    true
0x7fe6e58d725f            jz 0x00007fe6e58d72b0    
0x7fe6e58d7265            jmp 0x00007fe6e58d726a    
0x7fe6e58d726a            push [thr+0x68]    null
0x7fe6e58d726e            push rax    
0x7fe6e58d726f            movq r11,[pp+0x1f]    ClassHierarchyNodesMapKey
0x7fe6e58d7276            push r11    
0x7fe6e58d7278            push rdx    
0x7fe6e58d7279            push rcx    
0x7fe6e58d727a            movq r11,[pp+0x27]    'cls'
0x7fe6e58d7281            push r11    
0x7fe6e58d7283            movq rax,[pp+0x2f]    SubtypeTestCache
0x7fe6e58d728a            push rax    
0x7fe6e58d728b            push 4    
0x7fe6e58d728d            movq rbx,[thr+0x200]    
0x7fe6e58d7294            movl r10,7    
0x7fe6e58d729a            movq r11,[thr+0x108]    
0x7fe6e58d72a1            movq r12,[thr+0xa8]    [Stub] CallToRuntime
0x7fe6e58d72a8            call r11    
0x7fe6e58d72ab            addq rsp,0x38    
0x7fe6e58d72af            pop rax    
0x7fe6e58d72b0            movq rcx,[rbp+0x10]    
0x7fe6e58d72b4            movq rax,[rcx+0xf]    
0x7fe6e58d72b8            movq pp,[rbp-0x10]    
0x7fe6e58d72bc            movq rsp,rbp    
0x7fe6e58d72bf            pop rbp    
0x7fe6e58d72c0            ret    
0x7fe6e58d72c2 <deopt>

/cc @mraleph @aartbik

mraleph commented 6 years ago

Explicit is checks and implicit casts are presented in two different ways in IR. is checks do type profiling and are specialized based on that. Implicit casts don't do type profiling.

0x7fe6e58d71e2            cmpl rax,0x40a    
0x7fe6e58d71e7            jz 0x00007fe6e58d71f4    
0x7fe6e58d71e9            cmpl rax,0x40c    
0x7fe6e58d71ee            jnz 0x00007fe6e58d72c2    

This chunk of code comes from ClassCheck that we insert above AssertAssignable (this is done by FlowGraphTypePropagator::StrengthenAsserts).

The expectation is that AssertAssignable would be then eliminated by type propagation - however our type propagation lattice is not smart enough to represent the type we need here.

aartbik commented 6 years ago

I made a standalone Dart program that illustrates the issues reported in this bug. Hopefully helpful while analyzing and fixing this further.

Expected output:

Instance of 'B'
Instance of 'F'
passed

Program:

import 'dart:collection';

class A {}
class B {}
abstract class C implements A { B _b; }
class D extends C { }
class E extends C { }
class F extends B { }

class X extends MapBase<A, B> {

  // Pick this one:
//B operator [](Object c) {
//  if (c is C) {
//    return c._b;
//  }
//  throw new UnimplementedError('no [] C: $c');
//}

  // Or this one:
  B operator [](covariant C c) {
    return c._b;
  }

  operator []=(Object c, B b) {
    if (c is C) {
      c._b = b;
      return;
    }
    throw new UnimplementedError('no []= C: $c');
  }

  // Don't care.
  Iterable<A> get keys { return null; }
  B remove(Object key) { return null; }
  void clear() {}
}

main() {
  A a = new A();
  B b = new B();
  D d = new D();
  E e = new E();
  F f = new F();

  X x = new X();
  x[d] = b;
  x[e] = f;

  print("${x[d]}");
  print("${x[e]}");
  try {
    print("${x[a]}");
  } catch (e, s) {
    print("passed");
  }
}
aartbik commented 6 years ago

As observed above, under JIT, X vs. Y both do the check-class, but Y does not stop there:

  2: B3[target]:0 ParallelMove rcx <- S+2
  4:     CheckClass:12(v3 Cids[1: D etc.  cid 1026-1027])
  6:     v18 <- LoadField(v3, 8 {_b@17180520} [nullable dynamic]) T{Type: class 'B'?}
  7:     ParallelMove rax <- rax
  8:     Return:26(v18)

  2: B1[target]:0 ParallelMove rbx <- S+2
  4:     CheckClass:12(v3 Cids[1: D etc.  cid 1026-1027])
  5:     ParallelMove rax <- rbx, rdx <- C, rcx <- C
  6:     AssertAssignable:12(v3, Type: class 'C', 'c', instantiator_type_args(v0), function_type_args(v0)) T{Type: class 'C'?}
  7:     ParallelMove rcx <- S+2
  8:     v4 <- LoadField(v3 T{Type: class 'C'?}, 8 {_b@17180520} [nullable dynamic]) T{Type: class 'B'?}
  9:     ParallelMove rax <- rax
 10:     Return:18(v4)
aartbik commented 6 years ago

With a better lattice, type for v3 is improved, and the code for Y becomes the same as for X (generated CheckClass removes the AssetAssignable):

  2: B1[target]:0 ParallelMove rcx <- S+2
  4:     CheckClass:12(v3 Cids[1: D etc.  cid 1026-1027])
  6:     v4 <- LoadField(v3 T{Type: class 'C'}, 8 {_b@17180520} [nullable dynamic]) T{Type: class 'B'?}
  7:     ParallelMove rax <- rax
  8:     Return:18(v4)
aartbik commented 6 years ago

I made two improvements:

Overall this yields a 50-80% improvement on the co-variant benchmarks, bringing co-variant on par with explicit is-check (and both improved).

aartbik commented 6 years ago

Slava, are your points 2 and 3 still worth pursuing?

aartbik commented 6 years ago

armv8 improvements on benchmark (note that both improve, and both are "on par" afterwards):

TypeChecks.IsCheck (Odroid-C2)        22.61%
TypeChecks.CovariantCheck (Odroid-C2) 88.61%
mraleph commented 6 years ago

@aartbik I think it might be worth investigating - but probably in the larger context of unifying how assert-assignable-s and is checks work in general. Right now I'd put this on back burner unless we discover more cases where it matters.

(I am considering to keep this as a starter project for the next member of the VM team)

aartbik commented 5 years ago

Over to Slava in case he wants to re-assign this as starter.