Enumerator for diffing - Githubissues

Fixes: https://github.com/TimLariviere/Fabulous-new/issues/35

Bench summary

Process message depth 15

Mem Usage

baseline: 1,535 MB
stream diff: 810 MB ( -47.2% !!!!)

Time

baseline: 2,834.4 ms
stream diff: 2,379.2 ms ( - 16%)

How I got there

The main idea behind this PR is that diffing can be done a streaming fashion without any extra allocations. But first let's talk about architecture.

Why decouple calculating a diff from applying it

Potentially it is "easy" to achieve max performance by just applying the diff immediately when we detect a change. Besides being not the most cleanest design it also lacks flexibility how the diff is going to be consumed.

There are several benefits to decoupling diffing from applying changes:

It is potentially relevant for tests (record changes instead of applying them)
Optimal order of applying changes. Example: update width and height scalars before updating children (to avoid wasted layouts)
You can pause/resume streaming, which is not relevant at the moment but it is a powerful ability.

If you heard about "Inversion of control" idea, that is exactly that.

How to avoid allocations

In F# if you use for value in collection do construct you can implement your own MyCustomIterator by simply having these:

collection should have a method GetEnumerator
It has returns an object that has two methods defined on it Current: 't and MoveNext: () -> bool.

You can find more about it here (Section: "Readonly and ref-like structs"): https://bartoszsypytkowski.com/writing-high-performance-f-code/

Now equipped with that knowledge we need to implement custom enumerators for diffing scalars, widget attributes and widget collection attributes (WidgetDiff.fs)

How it works

Instead of calculating "One Big Diff" we create an object that you can pull to get changes, also that eliminates second traversal of the widget tree. So semantically it works exactly the same as you would call applyAttributeDiff immediately when you detect one, but conceptually they are still decoupled from each other, forming more flexible and less brittle architecture.

AND NO EXTRA ALLOCATIONS!

Misc

While looking at decompiled output I noticed that we sometimes create boxed calls via Tuple (heap allocated) for functions like these:

type IAttributeDefinition =
    abstract member UpdateNode : (obj voption *IViewNode) -> unit // <--- boxed call!

Note that sometimes F# avoid these boxing and call functions directly, but it seems that this optimization is not consistent

Using the scientific approach "Try and see if it works" I found out that the fastest way is via curried function signature.

type IAttributeDefinition =
    abstract member UpdateNode : obj voption  -> IViewNode -> unit // <--- fastest

   // also tried this
   abstract member UpdateNode : struct (obj voption *IViewNode) -> unit

Note that struct (obj voption *IViewNode) -> unit is slower, it seems to be due to copying values into a tuple if they are stack allocated, and a lot of our types are (example Widget). Still not sure why that is the case though.

Detailed bench results

Baseline

Method	depth	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
ProcessMessages	10	148.3 ms	2.20 ms	2.06 ms	44750.0000	17250.0000	2000.0000	138 MB
ProcessMessages	15	2,834.4 ms	54.43 ms	58.24 ms	486000.0000	177000.0000	77000.0000	1,535 MB

Full diff streaming

Method	depth	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
ProcessMessages	10	143.1 ms	2.75 ms	2.70 ms	23000.0000	6750.0000	1000.0000	73 MB
ProcessMessages	15	2,379.2 ms	32.50 ms	28.81 ms	298000.0000	113000.0000	51000.0000	810 MB

TimLariviere / Fabulous-new

Enumerator for diffing #51