fedimint / fedimint

Federated E-Cash Mint
https://fedimint.org/
MIT License
568 stars 219 forks source link

`raw_find_by_prefix` should return a stream instead of an iterator #1322

Closed elsirion closed 1 year ago

elsirion commented 1 year ago

If we want to integrate async DBs (why all the DB functions are async now) streaming data from the DB will probably also be async. Thus a stream fits better than an iterator.

https://github.com/fedimint/fedimint/blob/872b7ead0c32326c2bc56fad119aea2c09a88382/fedimint-api/src/db/mod.rs#L128

https://github.com/fedimint/fedimint/blob/872b7ead0c32326c2bc56fad119aea2c09a88382/fedimint-api/src/db/mod.rs#L60

dpc commented 1 year ago

Theoretically yes but streams are such PITA to work with (as async is half-baked), and it seems even rocksdb doesn't implement it as a stream https://docs.rs/rocksdb/latest/rocksdb/struct.DBIteratorWithThreadMode.html#impl-Iterator-for-DBIteratorWithThreadMode%3C%27a%2C%20D%3E so :shrug: for now? :D

dpc commented 1 year ago

Oh, wait ... https://github.com/fedimint/fedimint/blob/master/fedimint-rocksdb/src/lib.rs#L185 rocksdb doesn't do anything async? Which means we are doing blocking IO in our async code all the time anyway?

m1sterc001guy commented 1 year ago

It looks like Rocksdb does do some asynchronous I/O, but they only expose a synchronous API to the application

From docs:

One way to mitigate the impact of storage latency is to read asynchronously and in parallel as much as possible, in order to hide IO latency. We have implemented this in RocksDB in Iterators and MultiGet. In Iterators, we prefetch data asynchronously in the background for each file being iterated on, unlike the current implementation that does prefetching synchronously, thus blocking the iterator thread. In MultiGet, we determine the set of files that a given batch of keys overlaps, and read the necessary data blocks from those files in parallel using an asynchronous file system API. These optimizations have significantly decreased the overall latency of the RocksDB MultiGet and iteration APIs on slower storage compared to local flash.

The optimizations described here are in the internal implementation of Iterator and MultiGet in RocksDB. The user API is still synchronous, so existing code can easily benefit from it. We might consider async user APIs in the future.

dpc commented 1 year ago

@m1sterc001guy Oh dear. We should wrap everything in spawn_blocking, but I be this is going to conflict with lifetimes and such.

We might need to move the whole database handling to threadpool. Each time we create a transaction, we spawn a normal blocking thread, make a channela, and put the sending part into a IDatabaseTransaction. So then making actual db operation is just an rpc to a "transaction thread". Transaction thread is just a loop over the rpc receiving end that does the operation and sends back the result. This the user can block on responses over channels and is actually async.

We might maintain a threadpool to avoid the cost of spawning a threads over and over (it's not very costly, but not free either).

m1sterc001guy commented 1 year ago

Sure, I can get started on that. I think we would only need to do this for Rocksdb, since we're planning on removing Sled and Sqlite does support an async API.

dpc commented 1 year ago

@m1sterc001guy Probably it is not a priority RN, and module isolation would be more useful, but yeah, we should do it, otherwise Fedimint is going to be potentially slowish.

It should be a implementation detail of fedimintd-rocksdb so rest of the code doesn't need to know. (: