TimLariviere / Fabulous-new

Fabulous v2 - Work in progress
https://timothelariviere.com/Fabulous-new/

Optimize diffing redo #37

Closed · twop closed this 2 years ago

twop commented 2 years ago

Note: depends on #36. Fixes https://github.com/TimLariviere/Fabulous-new/issues/17

Motivation

One of the goals of Fabulous is to be fast, in other words to produce a minimal amount of overhead on top of the underlying UI framework (like Xamarin.Forms or MAUI).

The way I see it, we need to optimize 3 things:

  1. An API that allows us to stop diffing subtrees; the attempt to cover that is #36.
  2. A very efficient way to create Widget UI trees.
  3. A very efficient way to calculate and apply the diff of the previous vs the next Widget UI tree.

This PR is mostly concerned with 2 and 3.


Approach

  1. Allocate less.
  2. Take advantage of the fact that the computation expressions needed for building children, as well as the diffing itself, are internal details of the framework.
  3. Use mutable data structures for things that are not observable by a user (such as diffing and the CE internals), and use optimized immutable data structures for anything a user can poke at (see the sketch after this list).
  4. Utilize stack-allocated collections and data structures wherever it makes sense.
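To make point 3 concrete, here is a minimal sketch, assuming a builder-style API; `Widget` and `ChildrenBuilder` are illustrative names, not the actual Fabulous types. The framework is free to mutate a `ResizeArray` while the builder runs, because user code only ever sees the finished immutable record.

```fsharp
// Illustrative only: not the actual Fabulous CE or Widget representation.
type Widget = { Name: string; Children: Widget[] }

/// Internal mutable accumulator; it never escapes the framework.
type internal ChildrenBuilder() =
    let items = ResizeArray<Widget>()
    member _.Add(w: Widget) = items.Add w
    /// The only thing handed back out is an immutable snapshot.
    member _.Build(name: string) : Widget =
        { Name = name; Children = items.ToArray() }

let stack =
    let b = ChildrenBuilder()
    b.Add({ Name = "Label"; Children = [||] })
    b.Add({ Name = "Button"; Children = [||] })
    b.Build("VStack")
```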

What is done

Stack allocated collections

MutStackArray1

ArraySlice

DiffBuilder

StackArray3

StackList
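The bodies of these types aren't reproduced here, but the shared idea is roughly the following. This is a hypothetical sketch; the real MutStackArray1 / StackArray3 / StackList differ in layout and API. A struct keeps the first few items in fields, so small collections never touch the heap and only spill into a heap-allocated buffer once they outgrow the inline capacity.

```fsharp
// Hypothetical sketch of a small, mostly stack-allocated list; not the actual implementation.
[<Struct>]
type SmallList3<'T> =
    val mutable Count: int
    val mutable Item0: 'T
    val mutable Item1: 'T
    val mutable Item2: 'T
    val mutable Overflow: ResizeArray<'T>   // created lazily, only for the 4th item onwards

    member this.Add(item: 'T) =
        match this.Count with
        | 0 -> this.Item0 <- item
        | 1 -> this.Item1 <- item
        | 2 -> this.Item2 <- item
        | _ ->
            if isNull this.Overflow then this.Overflow <- ResizeArray<'T>()
            this.Overflow.Add item
        this.Count <- this.Count + 1

// Must be 'let mutable' so the member calls mutate the struct in place.
// As a struct local, the list lives on the stack until (and unless) it overflows.
let mutable attrs = SmallList3<string>()
attrs.Add("TextColor")
attrs.Add("FontSize")
```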

Usage summary

Other changes

Benchmark

Finally the numbers!

Not only did the diffing get significantly faster, it now also allocates almost 50% less memory.

Note that the memory numbers below were taken with a different growth function. The growth rate of MutStackArray1 is now 1.5 (note that this is less than ResizeArray's factor of 2); it is possible that we should be even more conservative and use 1.3, because most of our collections are small.
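For reference, the growth policy described above amounts to something like this (a sketch; the exact formula in MutStackArray1 may differ):

```fsharp
// ResizeArray doubles its capacity (4, 8, 16, 32, ...).
// A 1.5x factor grows more conservatively (4, 6, 9, 13, ...), wasting less memory
// for the small collections that dominate widget trees; 1.3 would be more conservative still.
let growCapacity (current: int) =
    max 4 (current + current / 2)
```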

Summary

ProcessMessages benchmark

Before

depth: 10
time: 183.9 ms
memory: 222 MB

---

depth: 15
time: 3,332.6 ms 
memory: 2,472 MB

After

depth: 10
time: 146.4 ms (-37.5 ms)
memory: 127 MB (-96 MB)

---

depth: 15
time: 2,941.6 ms (-391 ms)
memory: 1,412 MB (-1060 MB)

On main branch

| Method          | depth | Mean       | Error    | StdDev   | Gen 0       | Gen 1       | Gen 2      | Allocated |
|-----------------|-------|------------|----------|----------|-------------|-------------|------------|-----------|
| ProcessMessages | 10    | 183.9 ms   | 3.53 ms  | 3.93 ms  | 68333.3333  | 16000.0000  | 3333.3333  | 222 MB    |
| ProcessMessages | 15    | 3,332.6 ms | 50.03 ms | 44.35 ms | 759000.0000 | 178000.0000 | 78000.0000 | 2,472 MB  |

After these changes

| Method          | depth | Mean       | Error    | StdDev   | Gen 0       | Gen 1       | Gen 2      | Allocated |
|-----------------|-------|------------|----------|----------|-------------|-------------|------------|-----------|
| ProcessMessages | 10    | 146.4 ms   | 2.11 ms  | 1.97 ms  | 43250.0000  | 17000.0000  | 1500.0000  | 127 MB    |
| ProcessMessages | 15    | 2,941.6 ms | 49.67 ms | 46.46 ms | 460000.0000 | 166000.0000 | 67000.0000 | 1,412 MB  |
twop commented 2 years ago

Used ValueOption instead of option, and Span instead of ArraySlice where appropriate.
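Roughly what that swap looks like (a hypothetical helper, not the actual diffing code): `ValueOption` is a struct, so `ValueSome`/`ValueNone` don't allocate the way the reference-type `option` does, and a `ReadOnlySpan` is a view over existing memory, so slicing it doesn't copy the array.

```fsharp
open System

// Hypothetical helper, just to illustrate the swap.
let tryHead (values: ReadOnlySpan<int>) : ValueOption<int> =
    if values.Length > 0 then ValueSome values.[0] else ValueNone

let data = [| 10; 20; 30 |]
// Slicing a span produces another view over the same array, not a copy.
let head = tryHead (ReadOnlySpan<int>(data).Slice(1))   // ValueSome 20
```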

Here are the latest results (note the slight increase in memory consumption, but slightly better perf):

| Method          | depth | Mean       | Error    | StdDev   | Gen 0       | Gen 1       | Gen 2      | Allocated |
|-----------------|-------|------------|----------|----------|-------------|-------------|------------|-----------|
| ProcessMessages | 10    | 147.5 ms   | 1.66 ms  | 1.55 ms  | 42750.0000  | 16500.0000  | 1500.0000  | 138 MB    |
| ProcessMessages | 15    | 2,910.4 ms | 36.78 ms | 34.40 ms | 476000.0000 | 158000.0000 | 61000.0000 | 1,536 MB  |
TimLariviere commented 2 years ago

> Here are the latest results (note the slight increase in memory consumption, but slightly better perf)

Not sure if it's really worth it. We get slightly worse perf and memory at low depth, and a barely noticeable perf win compared to what we will pay in GC pauses at bigger depth.

What do you think?

twop commented 2 years ago

> > Here are the latest results (note the slight increase in memory consumption, but slightly better perf)
>
> Not sure if it's really worth it. We get slightly worse perf and memory at low depth, and a barely noticeable perf win compared to what we will pay in GC pauses at bigger depth.
>
> What do you think?

I think it is. Even though it uses more memory, it is easier on the GC and gives faster runtime perf. Note that on the M1, cache misses are less severe because of the large cache and really wide cache lines.

So I think it is better. I'm curious how it is going to behave on mobile devices.

twop commented 2 years ago

@TimLariviere I believe I resolved/fixed all the comments. Please take a look one more time. Happy to fix any other issues/concerns.