VM: covariant check slower than is-check

rakudrama commented 6 years ago

afaca9f7cfdf0a49bcba8985afec59e97e3019ba speeds up a hot map by using a field hidden behind a Map interface.

The implementation of the map indexer is:

  ClassHierarchyNode operator [](Object cls) {
    if (cls is ClassHierarchyNodesMapKey) {
      return cls._classHierarchyNode;
    }
    throw new UnimplementedError('ClassHierarchyNodesMap for $cls');
  }

I'd rather use the simpler formulation:

  ClassHierarchyNode operator [](covariant ClassHierarchyNodesMapKey cls) {
    return cls._classHierarchyNode;
  }

However, the performance of the 'better' covariant form is bad enough to negate the benefit of using a field.

ClassHierarchyNode operator [](Object cls) checks for the two ClassID for the concrete classes that have the field, and then loads field:

0x7ff763ab50a0            movq r13,[r12+0x17]    
0x7ff763ab50a5            push rbp    
0x7ff763ab50a6            movq rbp,rsp    
0x7ff763ab50a9            push r12    
0x7ff763ab50ab            push pp    
0x7ff763ab50ad            movq pp,r13    
0x7ff763ab50b0            movq rcx,[rbp+0x10]    
0x7ff763ab50b4            testb rcx,1    
0x7ff763ab50b7            jz 0x00007ff763ab50e4    
0x7ff763ab50bd            movzxwq rdx,[rcx+0x1]    
0x7ff763ab50c2            cmpl rdx,0x40a    
0x7ff763ab50c8            jz 0x00007ff763ab50d6    
0x7ff763ab50ca            cmpl rdx,0x40c    
0x7ff763ab50d0            jnz 0x00007ff763ab50e4    
0x7ff763ab50d6            movq rax,[rcx+0xf]    
0x7ff763ab50da            movq pp,[rbp-0x10]    
0x7ff763ab50de            movq rsp,rbp    
0x7ff763ab50e1            pop rbp    
0x7ff763ab50e2            ret
0x7ff763ab50e4 <deopt>

ClassHierarchyNode operator [](covariant ClassHierarchyNodesMapKey cls) also tests for the two ClassIDs, but then does a lot more work, including calling Subtype1TestCache (according to profiles). All this work is unnecessary, since both tested ClassIDs implement the declared covariant type.

0x7fe6e58d71c0            movq r13,[r12+0x17]    
0x7fe6e58d71c5            push rbp    
0x7fe6e58d71c6            movq rbp,rsp    
0x7fe6e58d71c9            push r12    
0x7fe6e58d71cb            push pp    
0x7fe6e58d71cd            movq pp,r13    
0x7fe6e58d71d0            movq rbx,[rbp+0x10]    
0x7fe6e58d71d4            testb rbx,1    
0x7fe6e58d71d7            jz 0x00007fe6e58d72c2    
0x7fe6e58d71dd            movzxwq rax,[rbx+0x1]    
0x7fe6e58d71e2            cmpl rax,0x40a    
0x7fe6e58d71e7            jz 0x00007fe6e58d71f4    
0x7fe6e58d71e9            cmpl rax,0x40c    
0x7fe6e58d71ee            jnz 0x00007fe6e58d72c2    
0x7fe6e58d71f4            movq rax,rbx    
0x7fe6e58d71f7            movq rdx,[thr+0x68]    null
0x7fe6e58d71fb            movq rcx,[thr+0x68]    null
0x7fe6e58d71ff            cmpq rax,[thr+0x68]    null
0x7fe6e58d7203            jz 0x00007fe6e58d72b0    
0x7fe6e58d7209            test al,1    
0x7fe6e58d720b            jz 0x00007fe6e58d726a    
0x7fe6e58d7211            movzxwq r10,[rax+0x1]    
0x7fe6e58d7216            movzxwq r11,[rax+0x1]    
0x7fe6e58d721b            movq r10,[thr+0x18]    
0x7fe6e58d721f            movq r10,[r10+0x50]    
0x7fe6e58d7223            addq r11,r11    
0x7fe6e58d7226            movq r10,[r10+r11*8]    
0x7fe6e58d722a            movq r13,[r10+0x57]    
0x7fe6e58d722e            movq r13,[r13+0xf]    
0x7fe6e58d7232            cmpq r13,0x874    
0x7fe6e58d7239            jz 0x00007fe6e58d72b0    
0x7fe6e58d723f            movq r9,[pp+0xf]    SubtypeTestCache
0x7fe6e58d7246            movq r12,[pp+0x17]    [Stub] Subtype1TestCache
0x7fe6e58d724d            movq r11,[r12+0x7]    
0x7fe6e58d7252            call r11    
0x7fe6e58d7255            cmpq r8,[thr+0x68]    null
0x7fe6e58d7259            jz 0x00007fe6e58d726a    
0x7fe6e58d725b            cmpq r8,[thr+0x70]    true
0x7fe6e58d725f            jz 0x00007fe6e58d72b0    
0x7fe6e58d7265            jmp 0x00007fe6e58d726a    
0x7fe6e58d726a            push [thr+0x68]    null
0x7fe6e58d726e            push rax    
0x7fe6e58d726f            movq r11,[pp+0x1f]    ClassHierarchyNodesMapKey
0x7fe6e58d7276            push r11    
0x7fe6e58d7278            push rdx    
0x7fe6e58d7279            push rcx    
0x7fe6e58d727a            movq r11,[pp+0x27]    'cls'
0x7fe6e58d7281            push r11    
0x7fe6e58d7283            movq rax,[pp+0x2f]    SubtypeTestCache
0x7fe6e58d728a            push rax    
0x7fe6e58d728b            push 4    
0x7fe6e58d728d            movq rbx,[thr+0x200]    
0x7fe6e58d7294            movl r10,7    
0x7fe6e58d729a            movq r11,[thr+0x108]    
0x7fe6e58d72a1            movq r12,[thr+0xa8]    [Stub] CallToRuntime
0x7fe6e58d72a8            call r11    
0x7fe6e58d72ab            addq rsp,0x38    
0x7fe6e58d72af            pop rax    
0x7fe6e58d72b0            movq rcx,[rbp+0x10]    
0x7fe6e58d72b4            movq rax,[rcx+0xf]    
0x7fe6e58d72b8            movq pp,[rbp-0x10]    
0x7fe6e58d72bc            movq rsp,rbp    
0x7fe6e58d72bf            pop rbp    
0x7fe6e58d72c0            ret    
0x7fe6e58d72c2 <deopt>

/cc @mraleph @aartbik

mraleph commented 6 years ago

Explicit is checks and implicit casts are presented in two different ways in IR. is checks do type profiling and are specialized based on that. Implicit casts don't do type profiling.

0x7fe6e58d71e2            cmpl rax,0x40a    
0x7fe6e58d71e7            jz 0x00007fe6e58d71f4    
0x7fe6e58d71e9            cmpl rax,0x40c    
0x7fe6e58d71ee            jnz 0x00007fe6e58d72c2

This chunk of code comes from ClassCheck that we insert above AssertAssignable (this is done by FlowGraphTypePropagator::StrengthenAsserts).

The expectation is that AssertAssignable would be then eliminated by type propagation - however our type propagation lattice is not smart enough to represent the type we need here.

[x] Investigate if type propagation lattice can be extended to make StrengthenAsserts work as expected for polymorphic class checks.
[ ] Investigate whether it makes sense to use runtime type profiling in implicit casts similar to explicit is and as.
[ ] Investigate if it makes sense to use type testing stubs code for non-eliminatated assert assignables - type testing stubs have much better fast path (use CID ranges and not subtype test cache).