TimLariviere / Fabulous-new

Fabulous v2 - Work in progress
https://timothelariviere.com/Fabulous-new/
Other
41 stars 3 forks source link

Enumerator for diffing #51

Closed twop closed 2 years ago

twop commented 2 years ago

Fixes: https://github.com/TimLariviere/Fabulous-new/issues/35

Bench summary

Process message depth 15

Mem Usage

baseline: 1,535 MB
stream diff: 810 MB ( -47.2% !!!!)

Time

baseline: 2,834.4 ms
stream diff: 2,379.2 ms ( - 16%)

How I got there

The main idea behind this PR is that diffing can be done a streaming fashion without any extra allocations. But first let's talk about architecture.

Why decouple calculating a diff from applying it

Potentially it is "easy" to achieve max performance by just applying the diff immediately when we detect a change. Besides being not the most cleanest design it also lacks flexibility how the diff is going to be consumed.

There are several benefits to decoupling diffing from applying changes:

  1. It is potentially relevant for tests (record changes instead of applying them)
  2. Optimal order of applying changes. Example: update width and height scalars before updating children (to avoid wasted layouts)
  3. You can pause/resume streaming, which is not relevant at the moment but it is a powerful ability.

If you heard about "Inversion of control" idea, that is exactly that.

How to avoid allocations

In F# if you use for value in collection do construct you can implement your own MyCustomIterator by simply having these:

  1. collection should have a method GetEnumerator
  2. It has returns an object that has two methods defined on it Current: 't and MoveNext: () -> bool.

You can find more about it here (Section: "Readonly and ref-like structs"): https://bartoszsypytkowski.com/writing-high-performance-f-code/

Now equipped with that knowledge we need to implement custom enumerators for diffing scalars, widget attributes and widget collection attributes (WidgetDiff.fs)

How it works

Instead of calculating "One Big Diff" we create an object that you can pull to get changes, also that eliminates second traversal of the widget tree. So semantically it works exactly the same as you would call applyAttributeDiff immediately when you detect one, but conceptually they are still decoupled from each other, forming more flexible and less brittle architecture.

AND NO EXTRA ALLOCATIONS!

Misc

While looking at decompiled output I noticed that we sometimes create boxed calls via Tuple (heap allocated) for functions like these:

type IAttributeDefinition =
    abstract member UpdateNode : (obj voption *IViewNode) -> unit // <--- boxed call!

Note that sometimes F# avoid these boxing and call functions directly, but it seems that this optimization is not consistent

Using the scientific approach "Try and see if it works" I found out that the fastest way is via curried function signature.

type IAttributeDefinition =
    abstract member UpdateNode : obj voption  -> IViewNode -> unit // <--- fastest

   // also tried this
   abstract member UpdateNode : struct (obj voption *IViewNode) -> unit

Note that struct (obj voption *IViewNode) -> unit is slower, it seems to be due to copying values into a tuple if they are stack allocated, and a lot of our types are (example Widget). Still not sure why that is the case though.


Detailed bench results

Baseline

Method depth Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
ProcessMessages 10 148.3 ms 2.20 ms 2.06 ms 44750.0000 17250.0000 2000.0000 138 MB
ProcessMessages 15 2,834.4 ms 54.43 ms 58.24 ms 486000.0000 177000.0000 77000.0000 1,535 MB

Full diff streaming

Method depth Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
ProcessMessages 10 143.1 ms 2.75 ms 2.70 ms 23000.0000 6750.0000 1000.0000 73 MB
ProcessMessages 15 2,379.2 ms 32.50 ms 28.81 ms 298000.0000 113000.0000 51000.0000 810 MB