limira commented 5 years ago

Hmm... sorry for opening this issue here. It's not about Dominator itself, it was just brought up by Dominator.

I saw Dominator's benchmark on Rust+Wasm news.

wasm-bindgen slower in some benchmarks

There are some benchmarks where Dominator v0.5 is slower than v0.4.4:

...	v0.5	v0.4.4	v0.5 slowdown
partial update	544	418	>20%
select row	82.4	76.9	>5%
swap	108	90.6	~20%
remove row	56.9	51.8	~10%

What do you think about these? Are there problems with wasm-bindgen?

Creating 10,000 vs 1,000 rows:

...	vanillajs	inferno	wasm bindgen	preact	dominator	react	stdweb	yew
1,000 rows (1)	110	130	200	155	201	171	228	314
10,000 rows (2)	1177	1373	2024	1864	2630	1981	2378	3237
ratio = (2)/(1)	~10.5	~10.5	~10	~12	~13	~11	~10.5	~10.3

The ratio of Dominator is not good, but the ratio of Yew is perfectly normal?

Pauan commented 5 years ago

I wouldn't worry about that, there are some natural fluctuations when running the benchmarks, so even if the benchmark hasn't changed at all the numbers will go slightly up and down.

For that same reason, I also wouldn't worry about minor differences.

As for the ratios, dominator takes 0.201ms per row for 1,000 rows, and 0.263ms per row for 10,000 rows. That sounds like a pretty great ratio to me.

Meanwhile, Yew takes 0.314ms per row for 1,000 rows, and 0.3237ms per row for 10,000 rows.
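The per-row figures and the (2)/(1) ratios quoted in this thread can be reproduced from the table with a few lines of arithmetic. The following is only a sketch: the function names `ms_per_row` and `scaling_ratio` are made up for illustration, and the numbers are copied from the table above.

```rust
/// Milliseconds per row for a run that created `rows` rows in `total_ms`.
fn ms_per_row(total_ms: f64, rows: u32) -> f64 {
    total_ms / f64::from(rows)
}

/// How many times longer the large run took than the small one.
fn scaling_ratio(small_ms: f64, large_ms: f64) -> f64 {
    large_ms / small_ms
}

fn main() {
    // (name, ms for 1,000 rows, ms for 10,000 rows), copied from the table above.
    let results = [("dominator", 201.0, 2630.0), ("yew", 314.0, 3237.0)];
    for (name, small, large) in results {
        println!(
            "{name}: {:.4} ms/row -> {:.4} ms/row, ratio {:.1}",
            ms_per_row(small, 1_000),
            ms_per_row(large, 10_000),
            scaling_ratio(small, large),
        );
    }
}
```

Both views describe the same data: a ratio of ~13 for dominator corresponds to the per-row cost rising from 0.201 ms to 0.263 ms.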

limira commented 5 years ago

> even if the benchmark hasn't changed at all the numbers will go slightly up and down

Yes I know that. How many times did you run the benchmark? In other words, what is the value of N in the command below?

npm run selenium -- --count N

If N>=10, I don't think a slowdown of ~20% is slight! Especially for the partial update:

...	v0.5	v0.4.4
partial update	544	418

Actually, the slowdown is (544 - 418)/418 = 126/418 ≈ 30%.
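As a quick check of the percentages, the relative slowdown can be computed for every row of the v0.5 vs v0.4.4 table. This is a sketch only; `slowdown_percent` is a hypothetical helper name, and the inputs are the table values from the first comment.

```rust
/// Relative slowdown of `new_ms` compared to `old_ms`, in percent.
/// A positive result means the new version is slower.
fn slowdown_percent(old_ms: f64, new_ms: f64) -> f64 {
    (new_ms - old_ms) / old_ms * 100.0
}

fn main() {
    // (benchmark, v0.4.4, v0.5), copied from the table in the first comment.
    let rows = [
        ("partial update", 418.0, 544.0),
        ("select row", 76.9, 82.4),
        ("swap", 90.6, 108.0),
        ("remove row", 51.8, 56.9),
    ];
    for (name, old, new) in rows {
        println!("{name}: {:.1}% slower", slowdown_percent(old, new));
    }
}
```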

> Meanwhile, Yew takes 0.314ms per row for 1,000 rows, and 0.3237ms per row for 10,000 rows.

I know that Yew is not great in absolute numbers! But its ratio is normal; it is consistent. Creating 10,000 rows vs 1,000 rows differs only in the count: we call the same function with a different argument value! (That said, I would not choose Yew in this case just because it is more consistent :grin: )

It's worth noting that the (2)/(1) ratio of Dominator v0.4.4 is also ~13. Moreover, both Dominator v0.4.4 and Yew are based on stdweb!

Pauan commented 5 years ago

I used --count 10, but even then there's a lot of variance. There are likely a lot of reasons for that: caching, GC, the JIT, the browser's internal systems, etc. As soon as you step outside of the Rust world, performance becomes unpredictable. For example, most of the time in the dominator benchmark is spent in the JS GC, not in Rust.

As for the ratios, I'm not worried about that, they're all low, and they're all similar to each other. Like I said, minor differences aren't worth worrying about. That's why performance is often described in big-O notation.

If you find a specific area (such as a data structure or algorithm) where performance can be improved, I'd be very interested in hearing about that (I care an awful lot about performance). But I'm not going to fret over minor fluctuations in a synthetic benchmark, especially since all the frameworks have radically different implementation strategies and optimizations and tuning.

For example, the Yew benchmark doesn't use keys, so it doesn't need to keep track of the DOM nodes. That makes it faster, but it causes various problems.

Meanwhile, dominator has to keep track of the state of each individual DOM node (such as the Signals which that DOM node is using), and that also means it has to spawn 20,000 Tasks (which involves heap allocation and waiting for the next microtask queue). That adds a small amount of unavoidable overhead.

Pauan commented 5 years ago

To be clear, while I was creating the benchmark, I went over dominator with a fine-toothed comb looking for any areas where I could improve its performance, and I made a lot of improvements to dominator's performance while doing so. And I used the browser's profiling tools extensively to pinpoint areas which needed to be improved (such as UTF-8 decoding and the JS GC), and then I fixed them.

So at this point dominator is quite close to optimal in terms of performance. Any performance improvements will either require radical data structure/algorithm changes, or improvements to the underlying system (wasm-bindgen, WebIDL, the DOM, the Futures Executor, etc.)

limira commented 5 years ago

> If you find a specific area (such as a data structure or algorithm) where performance can be improved, I'd be very interested in hearing about that (I care an awful lot about performance).

How about introducing swap for futures-signals?

// I guess this causes a shift of hundreds of rows by one position?
rows.move_from_to(1, 998);
// Another shift in the other direction?
rows.move_from_to(998 - 1, 1);

Introducing swap will greatly improve this situation, it is rarely used but very useful when we need it. Is it possible for futures-signals to do that?
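To illustrate why the second call uses 998 - 1, here is a minimal sketch on a plain `Vec`. The free function `move_from_to` below is a hypothetical stand-in that assumes "remove at the old index, then insert at the new index" semantics; it is not the futures-signals implementation.

```rust
/// Hypothetical helper on a plain `Vec` (assumed semantics, not futures-signals):
/// remove the item at `from`, then insert it so that it ends up at index `to`.
fn move_from_to<T>(items: &mut Vec<T>, from: usize, to: usize) {
    let item = items.remove(from);
    items.insert(to, item);
}

fn main() {
    // Two moves, as in the snippet above.
    let mut moved: Vec<u32> = (0..1000).collect();
    move_from_to(&mut moved, 1, 998);
    // After the first move, the row that used to be at 998 sits at 997.
    move_from_to(&mut moved, 998 - 1, 1);

    // One swap on an identical vector.
    let mut swapped: Vec<u32> = (0..1000).collect();
    swapped.swap(1, 998);

    assert_eq!(moved, swapped);
}
```

Under these assumptions the two moves and the single swap leave the vector in the same state; the swap just avoids the index bookkeeping.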

> But I'm not going to fret over minor fluctuations in a synthetic benchmark, especially since all the frameworks have radically different implementation strategies and optimizations and tuning.

So, put Yew aside.

Just consider one framework, Dominator v0.5 vs v0.4.4: they differ only in the underlying library, and the benchmark was implemented by the same person. There are some improvements from v0.4.4 to v0.5, which is good. And:

> So at this point dominator is quite close to optimal in terms of performance

I think that only applies to Dominator v0.5. Another fact is that wasm-bindgen 0.2.47 is the same or better than stdweb 0.4.17 in every benchmark. But it is weird that there are still some minor fluctuations that cause it to slow down by 30%.

Are you interested in re-running the benchmark to see whether the situation stays the same, or whether fluctuations will let v0.5 win against v0.4.4 (especially on partial update)?

Pauan commented 5 years ago

> Introducing swap will greatly improve this situation, it is rarely used but very useful when we need it. Is it possible for futures-signals to do that?

Adding in a swap method wouldn't improve the performance, because the DOM does not have a "swap two DOM nodes" API.

Moving things around in Rust is very fast, it's the DOM that's the problem. The DOM is many thousands of times slower than Rust. So making Rust a couple nanoseconds faster won't really help.

The move_from_to method is optimized so it will just perform a parent.removeChild(child) followed by parent.insertBefore(child, parent.childNodes[index]).

All of the other benchmarks (including wasm-bindgen and stdweb) use insertBefore, because it's literally the only DOM API which is available.

However, now that you mention it, I think I can optimize move_from_to so it doesn't call the removeChild method. Then the DOM performance of dominator will be identical to the performance of the other frameworks. So thanks for pointing that out!
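The reason the removeChild call can be dropped is that the DOM's insertBefore first detaches a node that is already attached, so a single call is enough to move it. The toy model below sketches that idea in plain Rust. `Parent`, its integer child ids, and both methods are invented for illustration; this is neither dominator's code nor a real DOM binding.

```rust
/// Toy stand-in for a DOM parent node; children are identified by integer ids.
struct Parent {
    children: Vec<u32>,
}

impl Parent {
    /// Mimics `insertBefore`: a child that is already attached is detached
    /// first, then placed before `reference` (appended when `reference` is `None`).
    fn insert_before(&mut self, child: u32, reference: Option<u32>) {
        self.children.retain(|&c| c != child);
        let index = match reference {
            Some(r) => self
                .children
                .iter()
                .position(|&c| c == r)
                .expect("reference must be a child of this parent"),
            None => self.children.len(),
        };
        self.children.insert(index, child);
    }

    /// Moves the child at `from` so it ends up at `to`, using one
    /// `insert_before` call and no explicit removal.
    fn move_from_to(&mut self, from: usize, to: usize) {
        if from == to {
            return;
        }
        let child = self.children[from];
        // The reference node is looked up before the implicit detach, so a
        // forward move needs the node one position past `to`.
        let reference_index = if to > from { to + 1 } else { to };
        let reference = self.children.get(reference_index).copied();
        self.insert_before(child, reference);
    }
}

fn main() {
    let mut parent = Parent { children: vec![10, 20, 30, 40, 50] };
    parent.move_from_to(1, 3);
    parent.move_from_to(3 - 1, 1);
    println!("{:?}", parent.children);
}
```

The only subtlety in this sketch is the choice of reference node: without the explicit removal, the indices still include the node being moved.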

> But it is weird that there are still some minor fluctuations that cause it to slow down by 30%.

No, it isn't weird. That's completely normal. As I have said, benchmarks naturally fluctuate, even when you run the exact same benchmark multiple times.

Since you don't seem to believe me, I recommend running some web app benchmarks yourself, on your own computer, so you can see.

> Are you interested in re-running the benchmark to see whether the situation stays the same, or whether fluctuations will let v0.5 win against v0.4.4 (especially on partial update)?

I spent several hours re-running the benchmark dozens of times (with --count 10 every time), and I have seen the fluctuations. I've seen situations where dominator was the fastest for swapping rows (even faster than vanillajs).

It's not like I ran it one time and said "okay that's it, perfect, let's just post that one". Every time I made a tiny tweak to dominator to improve the performance I re-ran the benchmarks. I also re-ran the benchmarks multiple times even when I hadn't changed anything. I also re-ran the benchmarks for the other frameworks too.

Even when the dominator benchmark was officially accepted they got very different results. Notice how it's faster for everything except the "append 10,000 rows" benchmark.

There are so many things that affect the benchmark performance: your OS, your browser, any background programs that are running, GC, JIT, internal browser data structures, the multiple layers of CPU caches, the multiple layers of RAM caches, the multiple layers of filesystem caches, the multiple layers of browser caches, the application caches, etc.

That's why I keep saying that it isn't a big deal. That's why I keep saying that minor fluctuations should be completely ignored, because they are useless noise, they don't actually indicate a performance problem. I've benchmarked many JS programs over the years, so I'm speaking from experience.
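One rough way to judge whether a difference between two sets of runs stands out from run-to-run noise is to compare the gap between the means against the spread of each set. The sketch below is a crude heuristic, not a proper statistical test. The function names are hypothetical, and the timings in `main` are invented placeholders, not measurements from this thread.

```rust
/// Arithmetic mean of the samples.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Sample standard deviation (n - 1 in the denominator).
/// Expects at least two samples.
fn sample_std_dev(xs: &[f64]) -> f64 {
    assert!(xs.len() >= 2, "need at least two samples");
    let m = mean(xs);
    let variance = xs.iter().map(|x| (x - m).powi(2)).sum::<f64>() / (xs.len() - 1) as f64;
    variance.sqrt()
}

/// Crude heuristic: true when the gap between the two means is larger than
/// the sum of the two standard deviations.
fn gap_exceeds_noise(a: &[f64], b: &[f64]) -> bool {
    (mean(a) - mean(b)).abs() > sample_std_dev(a) + sample_std_dev(b)
}

fn main() {
    // Placeholder timings in ms, invented purely for illustration; they are
    // NOT measurements from this thread.
    let old_runs = [1.0, 2.0, 3.0];
    let new_runs = [2.0, 3.0, 4.0];
    println!(
        "old: {:.2} +/- {:.2}, new: {:.2} +/- {:.2}, gap exceeds noise: {}",
        mean(&old_runs),
        sample_std_dev(&old_runs),
        mean(&new_runs),
        sample_std_dev(&new_runs),
        gap_exceeds_noise(&old_runs, &new_runs),
    );
}
```

Feeding the individual timings from repeated `--count 10` runs into a check like this would show whether a gap is larger than the observed spread.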

limira commented 5 years ago

> Adding in a swap method wouldn't improve the performance

So, I guess it could be added (not for performance) just because:

- std::vec::Vec has it
- we can reduce two lines of move_from_to to a single line of swap
- we don't have to think about 998 - 1 in rows.move_from_to(998 - 1, 1);

> Since you don't seem to believe me

It's not about believing or not believing the person I am talking to. Discussions are for understanding facts (we may find out some truths - or something similar :grin: - behind them).

> Notice how it's faster for everything except the "append 10,000 rows" benchmark.

I think you mean the create many rows benchmark (creating 10,000 rows). Creating 1,000 rows is faster, but creating 10,000 rows is slower. The ratio changes from 13 to 17... I don't know what to say... is that a big noise?

Anyway, I will close this because it does not lead to anything useful. I am sorry for the annoyance this has caused you. Thank you for all the patient answers you gave.

Pauan commented 5 years ago

I agree that adding a swap method as a convenience would be nice. That's not hard to do.

Pauan / rust-dominator

About the benchmark #15
