Neptune-Crypto / neptune-core

anonymous peer-to-peer cash

make storage async, attempt 3. #124

Closed · dan-da closed this 6 months ago

dan-da commented 6 months ago

Addresses #108 and #75; obsoletes #120 and #121.

Co-authored with @aszepieniec.

This PR builds on #120. @aszepieniec was able to remove the Mmr trait, which resolves that headache, and I added some further cleanups and fixes to get all tests passing afterwards.

Unlike #120, this PR does not spawn an OS thread for each method call in ArchivalMmr (or for any other method).

All tests are passing. However, it is important to note that f3384fbe6f642de9ca4474b4e5b4ead37a4da36c contains a questionable fix to ArchivalMutatorSet::batch_remove() that should be carefully reviewed by @aszepieniec prior to merging. Edit: fixed properly by @aszepieniec in d602858f3d77933119153d9dc0a60e512e52a099.

I should also note that this PR requires much less code to become async than #121 did.

Dependencies: this code depends on GitHub builds of twenty-first and tasm-lib. The required changes are already in master of each crate, so once new releases are published we can use those instead.

Also, I have another PR in the pipeline that depends on this one.

dan-da commented 6 months ago

In some places we .await inside a loop, which sounds bad performance-wise, especially if it resolves through database IO. Could you comment on how that affects performance, and what the idiomatic way to do things asynchronously is?

Also worth noting that this PR gave rise to issues #126 and #127, which are about avoiding repeated IO calls (and other optimizations) in the course of archival MMR and mutator set logic.

dan-da commented 6 months ago

> In some places we .await inside a loop, which sounds bad performance-wise, especially if it resolves through database IO. Could you comment on how that affects performance, and what the idiomatic way to do things asynchronously is?

The await just means "pause this task until the call completes, and meanwhile process other pending tasks". [1] Whereas in the old code with the blocking storage layer, the same DB call without await meant "pause this entire OS thread until the call completes", which includes pausing the tokio executor and all other pending tasks.

So awaiting inside a loop is not bad per se [2]. And indeed, compared to the old code, we are much more concurrency-friendly.
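To make that concrete, here is a minimal sketch of an awaited DB read inside a loop. The `Db` type and `get_leaf` method are illustrative stand-ins, not neptune-core's actual storage API:

```rust
// Hypothetical async store standing in for the real storage layer.
struct Db;

impl Db {
    async fn get_leaf(&self, _index: u64) -> u64 {
        // Simulate ~10 ms of disk latency per read.
        tokio::time::sleep(std::time::Duration::from_millis(10)).await;
        1
    }
}

// Awaiting inside the loop suspends only this task: the worker thread is
// handed back to the executor on every iteration, so other pending tasks
// keep making progress while the "DB" works.
async fn sum_leafs(db: &Db, count: u64) -> u64 {
    let mut total = 0;
    for i in 0..count {
        total += db.get_leaf(i).await;
    }
    total
}

#[tokio::main]
async fn main() {
    println!("{}", sum_leafs(&Db, 100).await);
}
```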

For illustration, let's examine the same scenario with non-async DB calls vs async calls.

non-async: We get an RPC call that performs a lot of DB operations, say 10,000 ops, each taking 10 ms (just for example). That's 100 seconds spent processing that RPC call, just for the DB ops. And the tokio thread executing that task just sits there twiddling its thumbs, waiting for the db/disk during those 100 seconds. It may well have 30 other tasks it should be processing, but it can't, because it is waiting on blocking I/O. Meanwhile more tasks come in from the network (peers, rpc, etc.) and they pile up.

async: We get the same RPC call that in aggregate spends 100 secs in the DB (primarily disk I/O). Each call to the DB is followed by await, which immediately returns control to the tokio executor. So as the DB begins processing, the executor has already begun processing the next pending task; when that task hits an await, the executor moves on to the next one, and so on. The CPU remains utilized whenever a task needs attention, and thus no pile-up occurs. [3]
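Here is a runnable toy version of that contrast, scaled down to 1,000 ops (~10 s of simulated I/O); `heavy_rpc` and `light_rpc` are hypothetical names, not real neptune-core handlers:

```rust
use std::time::Duration;

// Simulated DB op: ~10 ms of "disk" latency per call.
async fn db_op() {
    tokio::time::sleep(Duration::from_millis(10)).await;
}

async fn heavy_rpc() {
    for _ in 0..1000 {
        db_op().await; // yields to the executor on every iteration
    }
}

async fn light_rpc() -> &'static str {
    "served immediately"
}

#[tokio::main]
async fn main() {
    let heavy = tokio::spawn(heavy_rpc());
    // light_rpc completes on its first poll; it does not queue behind
    // heavy_rpc's ~10 s of simulated I/O, because heavy_rpc yields at
    // every await.
    let msg = light_rpc().await;
    println!("{msg}");
    heavy.await.expect("heavy_rpc panicked");
}
```

With the old blocking storage layer, `light_rpc` would have had to wait out all of `heavy_rpc`'s I/O if both landed on the same worker thread.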

The tokio executor is very similar to an OS scheduler, except that it requires cooperation (via calling await), whereas OS schedulers are pre-emptive and can rip execution away from any thread at any time. Because it relies on that cooperation, its performance degrades rapidly when tasks do not cooperate, i.e. when they execute for a long time without any call to await.
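When a task can't cooperate because it is genuinely CPU-bound, two standard tokio remedies apply. A sketch, with `expensive` standing in for real work:

```rust
fn expensive(x: u64) -> u64 {
    x.wrapping_mul(x) // stand-in for real CPU-bound work
}

// Remedy 1: periodically hand control back to the executor so other
// tasks on this worker thread are not starved.
async fn long_computation(items: Vec<u64>) -> u64 {
    let mut acc: u64 = 0;
    for (i, x) in items.iter().enumerate() {
        acc = acc.wrapping_add(expensive(*x));
        if i % 1024 == 0 {
            tokio::task::yield_now().await;
        }
    }
    acc
}

// Remedy 2: move the whole computation onto tokio's dedicated blocking
// thread pool, leaving the async worker threads free.
async fn long_computation_blocking(items: Vec<u64>) -> u64 {
    tokio::task::spawn_blocking(move || {
        items.iter().map(|x| expensive(*x)).fold(0u64, u64::wrapping_add)
    })
    .await
    .expect("blocking task panicked")
}

#[tokio::main]
async fn main() {
    let items: Vec<u64> = (0..100_000).collect();
    println!("{}", long_computation(items.clone()).await);
    println!("{}", long_computation_blocking(items).await);
}
```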

I'm likely stating things you already know, but I hope there's a helpful nugget in there somewhere. ;-)

[1] This is greatly simplified. In reality, each async fn compiles down to an enum or struct with a poll method implementing the Future trait. Each await inside async fn myfunc actually sets an internal state var inside myfunc and returns from myfunc::poll back to the tokio executor, which then processes any pending tasks. When the thing being awaited calls its associated Waker, the executor knows it should call myfunc::poll again, which resumes processing at the state after the last await.
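A hand-written future makes this concrete. Below is a deliberately tiny caricature of what the compiler generates for an async fn with a single await point; the real generated state machine differs in its details:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

// Two states: before and after the single await point.
enum MyFunc {
    BeforeAwait,
    AfterAwait,
}

impl Future for MyFunc {
    type Output = &'static str;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // MyFunc is Unpin, so we can safely take a plain &mut.
        let this = self.get_mut();
        match *this {
            MyFunc::BeforeAwait => {
                // A real future would stash cx.waker() and arrange for it to
                // be invoked when the awaited I/O completes; here we wake
                // immediately so the executor polls us again without real I/O.
                cx.waker().wake_by_ref();
                *this = MyFunc::AfterAwait;
                Poll::Pending
            }
            MyFunc::AfterAwait => Poll::Ready("resumed after the await point"),
        }
    }
}

#[tokio::main]
async fn main() {
    println!("{}", MyFunc::BeforeAwait.await);
}
```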

[2] But of course we should minimize db/disk I/O whenever possible, as it is a limited resource.

[3] If all incoming tasks used the db/disk equally, we could still get a pile-up, as it is a limited resource. However, the assumption is that many tasks do not require the db/disk, or require much less of it, so we can service those right away. And in any case, our bottleneck would be clearly identifiable as the db/disk.

[bonus] For completeness, I will mention that we do not have to use async at all. In that case, we would use one regular OS thread for each task (rpc call, peer request, mining loop) and we wouldn't need to worry about cooperative scheduling or blocking tasks. But since the present architecture is async-centric, we do need to care about those things.
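For illustration, a thread-per-task server skeleton might look like the sketch below; `handle_connection` is hypothetical:

```rust
use std::net::{TcpListener, TcpStream};
use std::thread;

fn handle_connection(_stream: TcpStream) {
    // ... blocking request handling would go here ...
}

// One OS thread per connection, blocking I/O throughout. No executor and
// no awaits: a blocking call parks only its own thread, and the OS
// scheduler preempts threads for us.
fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:9999")?;
    for stream in listener.incoming() {
        let stream = stream?;
        thread::spawn(move || handle_connection(stream));
    }
    Ok(())
}
```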