baseline: 2,834.4 ms
stream diff: 2,379.2 ms (-16%)
How I got there
The main idea behind this PR is that diffing can be done in a streaming fashion without any extra allocations. But first, let's talk about architecture.
Why decouple calculating a diff from applying it
It is potentially "easy" to achieve maximum performance by applying the diff immediately when we detect a change. Besides not being the cleanest design, that approach also lacks flexibility in how the diff is consumed.
There are several benefits to decoupling diffing from applying changes:
It is potentially relevant for tests (record changes instead of applying them)
Changes can be applied in an optimal order. Example: update width and height scalars before updating children (to avoid wasted layouts).
Streaming can be paused and resumed. That is not relevant at the moment, but it is a powerful ability.
If you have heard of the "Inversion of control" idea, this is exactly that.
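To make the decoupling concrete, here is a minimal sketch of one producer of changes with two interchangeable consumers. All names in it (Change, Node, diff, apply, record) are hypothetical and are not the types used in this PR. It also uses seq for brevity, which does allocate, so it only illustrates the shape of the design and not the allocation-free implementation.

```fsharp
// Hypothetical, simplified change type (not the one used in this PR).
type Change =
    | SetWidth of float
    | SetHeight of float

// Hypothetical stand-in for a real view node.
type Node() =
    member val Width = 0.0 with get, set
    member val Height = 0.0 with get, set

// Producer: only describes what changed.
// `seq` keeps the sketch short; unlike the enumerators in this PR, it allocates.
let diff (prevW: float, prevH: float) (nextW: float, nextH: float) : seq<Change> =
    seq {
        if prevW <> nextW then yield SetWidth nextW
        if prevH <> nextH then yield SetHeight nextH
    }

// Consumer 1: apply each change as it is pulled.
let apply (node: Node) (changes: seq<Change>) =
    for change in changes do
        match change with
        | SetWidth w -> node.Width <- w
        | SetHeight h -> node.Height <- h

// Consumer 2: record the changes instead, e.g. for a test assertion.
let record (changes: seq<Change>) : Change list = List.ofSeq changes

let node = Node()
apply node (diff (10.0, 20.0) (10.0, 25.0))
printfn "%A" (record (diff (10.0, 20.0) (10.0, 25.0)))
```

The producer never needs to know which consumer is attached, which is the property the list above relies on.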
How to avoid allocations
In F#, if you use the for value in collection do construct, you can plug in your own MyCustomIterator simply by providing these:
collection should have a method GetEnumerator.
It returns an object that has two members defined on it: Current: 't and MoveNext: unit -> bool.
Now, equipped with that knowledge, we need to implement custom enumerators for diffing scalars, widget attributes and widget collection attributes (WidgetDiff.fs).
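As a rough illustration of the pattern (not the code in WidgetDiff.fs), the sketch below diffs two int arrays through a struct enumerator. The names ScalarChange, ScalarChangesEnumerator and ScalarChanges are made up for this example, and it assumes both arrays have the same length. Every type involved is a struct, so the for loop can walk the changes without creating enumerator or change objects on the heap.

```fsharp
// Hypothetical change description; a struct, so yielding it does not allocate.
[<Struct>]
type ScalarChange = { Index: int; Before: int; After: int }

// Struct enumerator: exposes Current and MoveNext, which is all `for ... in` needs.
[<Struct>]
type ScalarChangesEnumerator =
    val Prev: int[]
    val Next: int[]
    val mutable Position: int

    new(prev: int[], next: int[]) = { Prev = prev; Next = next; Position = -1 }

    member this.Current: ScalarChange =
        { Index = this.Position
          Before = this.Prev.[this.Position]
          After = this.Next.[this.Position] }

    // Advance to the next index where the two arrays differ.
    member this.MoveNext() : bool =
        let mutable i = this.Position + 1
        while i < this.Prev.Length && this.Prev.[i] = this.Next.[i] do
            i <- i + 1
        this.Position <- i
        i < this.Prev.Length

// The "collection": it only needs a GetEnumerator method.
// Assumption: prev and next have the same length.
[<Struct>]
type ScalarChanges =
    val Prev: int[]
    val Next: int[]
    new(prev: int[], next: int[]) = { Prev = prev; Next = next }
    member this.GetEnumerator() = ScalarChangesEnumerator(this.Prev, this.Next)

// Changes are computed lazily, one MoveNext call at a time.
for change in ScalarChanges([| 1; 2; 3; 4 |], [| 1; 5; 3; 7 |]) do
    printfn "index %d: %d -> %d" change.Index change.Before change.After
```

Because the comparison happens inside MoveNext, no intermediate list of changes is ever built; the consumer decides what to do with each change as it arrives.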
How it works
Instead of calculating "One Big Diff", we create an object that you can pull changes from, which also eliminates the second traversal of the widget tree. Semantically it works exactly as if you called applyAttributeDiff immediately upon detecting a change, but conceptually the two are still decoupled from each other, which makes for a more flexible and less brittle architecture.
AND NO EXTRA ALLOCATIONS!
Misc
While looking at the decompiled output, I noticed that we sometimes create boxed calls via Tuple (heap allocated) for functions like these:
type IAttributeDefinition =
    abstract member UpdateNode : (obj voption * IViewNode) -> unit // <--- boxed call!
Note that F# sometimes avoids this boxing and calls the function directly, but the optimization does not seem to be applied consistently.
Using the scientific approach "Try and see if it works", I found out that the fastest way is via a curried function signature.
type IAttributeDefinition =
    abstract member UpdateNode : obj voption -> IViewNode -> unit // <--- fastest
    // also tried this
    abstract member UpdateNode : struct (obj voption * IViewNode) -> unit
Note that struct (obj voption * IViewNode) -> unit is slower. It seems to be due to copying values into the tuple when they are stack allocated, and a lot of our types are (for example, Widget). I am still not sure why that is the case, though.
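For reference, this is how the three signature shapes look side by side, including what an implementation and a call site look like for each. It is only a sketch: IViewNode here is a stand-in with a made-up Name member, and the three interface names are hypothetical. The code shows the shapes only; which one allocates or runs fastest still has to be checked in the decompiled output and the benchmarks.

```fsharp
// Hypothetical stand-in for the real IViewNode.
type IViewNode =
    abstract member Name: string

// Parenthesised tuple: the member takes one argument of tuple type.
type ITupledDefinition =
    abstract member UpdateNode: (obj voption * IViewNode) -> unit

// Curried: two separate arguments.
type ICurriedDefinition =
    abstract member UpdateNode: obj voption -> IViewNode -> unit

// Struct tuple: one argument that is a value-type tuple.
type IStructTupledDefinition =
    abstract member UpdateNode: struct (obj voption * IViewNode) -> unit

let calls = ResizeArray<string>()
let label = { new IViewNode with member _.Name = "label" }

let tupled =
    { new ITupledDefinition with
        member _.UpdateNode(args: obj voption * IViewNode) =
            let (_, node) = args
            calls.Add("tupled " + node.Name) }

let curried =
    { new ICurriedDefinition with
        member _.UpdateNode (_: obj voption) (node: IViewNode) =
            calls.Add("curried " + node.Name) }

let structTupled =
    { new IStructTupledDefinition with
        member _.UpdateNode(args: struct (obj voption * IViewNode)) =
            let struct (_, node) = args
            calls.Add("struct " + node.Name) }

// Call sites: note the extra parentheses needed to pass a real tuple.
tupled.UpdateNode((ValueNone, label))
curried.UpdateNode ValueNone label
structTupled.UpdateNode(struct (ValueNone, label))
```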
Fixes: https://github.com/TimLariviere/Fabulous-new/issues/35
Bench summary
Process message depth 15
Mem Usage
Time
You can find more about custom enumerators here (section "Readonly and ref-like structs"): https://bartoszsypytkowski.com/writing-high-performance-f-code/
Detailed bench results
Baseline
Full diff streaming