Avoid Scala pattern matching in DAML-LF interpreter

hurryabit commented 4 years ago

A CEK machine heavily relies on pattern matching on AST node, continuation stack frames and other data. Unfortunately, pattern matching in Scala is particularly slow. Thus, we need to find alternative ways to discriminate the current AST node in control or the next continuation, ideally a mechanism akin to jump tables.

We should also investigate ways to avoid doing this discrimination whenever possible.

Improvements so far:

https://github.com/digital-asset/daml/pull/5814: 1.09x
https://github.com/digital-asset/daml/pull/5811: 1.13-1.21x

leo-da commented 4 years ago

In OOP exhaustive type matching/type casting is achievable with Double Dispatch (aka Visitor Design Pattern). Worth investigating? DD/Visitor is a lot of boilerplate though.

hurryabit commented 4 years ago

@leo-da We're already using some form of dynamic dispatch for the Kont type: https://github.com/digital-asset/daml/blob/067f3c987da11d6292e7cd51baa56f9d5c945528/daml-lf/interpreter/src/main/scala/com/digitalasset/daml/lf/speedy/Speedy.scala#L519 I'm not sure if that's exactly the visitor pattern but it's definitely the same basic idea. Unfortunately, that might still be too slow because of all the indirections through vtables. Ideally, we would find something based on jump tables like C's switch or Haskell's case ... of. But before we dive into the details, we need solid benchmarks.

leo-da commented 4 years ago

yep, that is double dispatch: instead of calling Machine's virtual method directly, Kont's execute called first to refine the type of SValue...

However some execute implementations are actually doing pattern matching on SValue... Is that the issue? In any case, you guys know what you are doing :).

nickchapman-da commented 4 years ago

We do triple dispatch: on Ctrl, on Kont and on Expr. I eliminated the dispatch on Ctrl (by making control always be an SExpr) for 17% speedup (CollectAuthority benchmark)

I think I know how to eliminate the dispatch on Kont in a similar way. Tomorrow...

nickchapman-da commented 4 years ago

To enable us to have only one kind of Kont we just have to have the effect of entering a continuation always be the same. In essence we will have all continuation be KPushTo

  final case class KPushTo(to: util.ArrayList[SValue], next: SExpr) extends Kont {
    def execute(v: SValue, machine: Machine) = {
      to.add(v)
      machine.ctrl = next
    }
  }

That is: they put the returned value somewhere, then install a new ctrl ready for the next step()

For the other continuations, we move the desired effect to a new variant of SExpr, which is used as the next when the Kont is created.

nickchapman-da commented 4 years ago

We had a discussion about the benefit of loosing the Ctrl trait and introduction of returnValue. I re-instated the original behaviour in two steps back on to the speedy-returnValue branch and got some numbers.

This perf run:

  bazel run daml-lf/scenario-interpreter:scenario-perf -- -f 4 -i 10 -wi 10

TL/DR: 111 --> 117 --> 121 ms/op

(1) The branch as it is, with no Ctrl

  110.767 ±(99.9%) 2.534 ms/op [Average]
  (min, avg, max) = (105.366, 110.767, 121.915), stdev = 4.504
  CI (99.9%): [108.233, 113.301] (assumes normal distribution)

(2) Reinstating the Ctrl trait, and using for the machine controle The core change is below (with fixups everywhere to the new type)

+  sealed trait Ctrl {
+    def execute(machine: Machine): Unit
+  }
+
+  final case class CtrlExpr(expr: SExpr) extends Ctrl {
+    def execute(machine: Machine) =
+      expr.execute(machine)
+  }
+
+  final case class CtrlValue(value: SValue) extends Ctrl {
+    def execute(machine: Machine): Unit = {
+      machine.popKont().execute(value, machine)
+    }
+  }
+
+

       /* The control is what the machine should be evaluating. */
-      var ctrl: SExpr,
+      var ctrl: Ctrl,

perf...

  116.695 ±(99.9%) 1.981 ms/op [Average]
  (min, avg, max) = (107.312, 116.695, 123.405), stdev = 3.521
  CI (99.9%): [114.714, 118.676] (assumes normal distribution)

(3) Further reinstating the machine behaviour to always step via a CtrlValue instead of directly passing the value to the continuation

In essense, this change:

     def returnValue(value: SValue): Unit = {
-      returnDepth += 1
-      if (kontStack.isEmpty) {
-        ctrl = CtrlValue(value) // The final resulting value
-      } else {
-        if (returnDepth < limitReturnDepth) {
-          val kont = popKont()
-          kont.execute(value, this)
-        } else {
-          ctrl = CtrlValue(value) // force the current step to end
-        }
-      }
+      ctrl = CtrlValue(value)
     }

perf...

  120.739 ±(99.9%) 1.714 ms/op [Average]
  (min, avg, max) = (115.624, 120.739, 126.339), stdev = 3.047
  CI (99.9%): [119.025, 122.453] (assumes normal distribution)

digital-asset / daml

Avoid Scala pattern matching in DAML-LF interpreter #5764