cberner / redb

An embedded key-value database in pure Rust
https://www.redb.org
Apache License 2.0

Add async read interface #30

Open cberner opened 3 years ago

cberner commented 3 years ago

Blocked on:

cberner commented 3 years ago

It seems like read() even via IO uring is a lot slower than mmap, so I'm going to close this as won't fix.

Here are IO uring benchmarks: https://github.com/cberner/redb/pull/61

udoprog commented 2 years ago

👋 Thanks for sharing your code!

So async support is one of the first things I look for when new persistent trees pop up in the Rust ecosystem, and I'm a bit saddened to find this issue marked as wontfix.

Now I/O uring as an I/O interface is rarely going to be "as fast" as using mmap and letting the kernel synchronize memory regions directly with no buffer management and little to no syscall overhead. But async support would be valuable in itself in that it would allow redb to cleanly integrate with the rest of the async ecosystem.

As it stands, when integrating redb into a larger async application you'd run the risk of spurious I/O blocking due to page faults, forcing you to resort to coping mechanisms like a blocking thread pool, which comes with its own overhead.

Integrating async I/O uring support comes with its own challenges that I'm sympathetic towards. But I'm at least curious if you could be convinced to reconsider the status of this issue?

cberner commented 2 years ago

For sure, hope you or someone else finds it useful!

Ya, I think that's a good argument for reconsidering an async read interface. I'll take another look into this.

jeromegn commented 1 year ago

We're also interested in this. Not necessarily for the async nature of things, but for the promise of io_uring.

Given mmap is now optional (and not the default) in redb, this could be explored again? In your small benchmark, was io_uring faster than read syscalls?

Coincidentally, I stumbled on an interesting article regarding io_uring and its performance: https://itnext.io/modern-storage-is-plenty-fast-it-is-the-apis-that-are-bad-6a68319fbc1a

That article makes me think that the only way to saturate I/O on modern NVMe drives is likely multi-threaded reads.

I know next to nothing about these things, just thought I'd ping this issue to see what's the current state of affairs.

cberner commented 1 year ago

notes to self: I looked into implementing this, and it's blocked on a couple of things at the moment:

1. async in traits: https://github.com/rust-lang/rust/issues/91611
2. Tokio does not support File::read_exact_at() or File::read_seek()
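
For context on blocker 1: an async read path would presumably need an async method on a storage trait, roughly like the sketch below. The trait and method names here are illustrative, not redb's actual API. Before async fn in traits stabilized (Rust 1.75), this wasn't expressible on stable Rust without workarounds such as the async-trait crate.

```rust
// Illustrative only: the rough shape of an async storage trait.
// Stable Rust could not express this before `async fn` in traits
// landed (Rust 1.75); the async-trait crate's boxed-future
// workaround was the usual stopgap.
trait AsyncStorageBackend {
    async fn read(&self, offset: u64, len: usize) -> std::io::Result<Vec<u8>>;
}
```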

ozgrakkurt commented 1 year ago

Hey @cberner,

cberner commented 1 year ago

The problem with seek and read_seek is that they take &mut self. That would require introducing a lock on the File, or cloning the File for every new transaction.
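
To make that ownership problem concrete, here is the same contrast in std's file API, which Tokio's mirrors: the seek-based path needs &mut File, while the positional read only needs &File.

```rust
use std::fs::File;
use std::io::{Read, Seek, SeekFrom};
use std::os::unix::fs::FileExt; // Unix-only; provides positional reads

// seek() takes &mut self: two read transactions sharing this File would
// race on the shared cursor, so the database would need a Mutex<File>
// or a cloned File handle per transaction.
fn read_with_seek(file: &mut File, offset: u64, buf: &mut [u8]) -> std::io::Result<()> {
    file.seek(SeekFrom::Start(offset))?;
    file.read_exact(buf)
}

// read_exact_at() takes &self: no shared cursor, so any number of
// transactions can read through one File with no locking.
fn read_positional(file: &File, offset: u64, buf: &mut [u8]) -> std::io::Result<()> {
    file.read_exact_at(buf, offset)
}
```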

ozgrakkurt commented 1 year ago

Also, they use tokio::spawn_blocking internally for now, so I don't think it makes much sense to implement this yet. Users can just call spawn_blocking themselves when interacting with the db, and it should perform the same.

SunDoge commented 1 year ago

In my opinion, the iouring_entries and VALUE_SIZE are both too small for io_uring, which means the benchmark doesn't take advantage of the benefits of async I/O. With entries = 64 and value size = 20000, the speed of io_uring is comparable to that of read().

https://gist.github.com/SunDoge/2d361a289b75b7c06607c04e2230add9

lmdb-zero: Loaded 100000 items (1GiB) in 3485ms (547MiB/s)
lmdb: Random read 100000 items in 79ms
lmdb: Random read 100000 items in 36ms
lmdb: Random read 100000 items in 36ms
read()/write(): Loaded 100000 items (1GiB) in 4217ms (452MiB/s)
read()/write(): Random read 100000 items in 86ms
read()/write(): Random read 100000 items in 98ms
read()/write(): Random read 100000 items in 82ms
uring_read()/write(): Loaded 100000 items (1GiB) in 3240ms (589MiB/s)
uring_read()/write(): Random read 100000 items in 136ms
uring_read()/write(): Random read 100000 items in 132ms
uring_read()/write(): Random read 100000 items in 109ms
uring_overlap_read()/write(): Loaded 100000 items (1GiB) in 3112ms (613MiB/s)
uring_overlap_read()/write(): Random read 100000 items in 85ms
uring_overlap_read()/write(): Random read 100000 items in 80ms
uring_overlap_read()/write(): Random read 100000 items in 76ms
mmap(): Loaded 100000 items (1GiB) in 3184ms (599MiB/s)
mmap(): Random read 100000 items in 3ms
mmap(): Random read 100000 items in 2ms
mmap(): Random read 100000 items in 1ms
mmap(ANON): Loaded 100000 items (1GiB) in 683ms (2GiB/s)
mmap(ANON): Random read 100000 items in 2ms
mmap(ANON): Random read 100000 items in 2ms
mmap(ANON): Random read 100000 items in 2ms
vec[]: Loaded 100000 items (1GiB) in 561ms (3GiB/s)
vec[]: Random read 100000 items in 2ms
vec[]: Random read 100000 items in 2ms
vec[]: Random read 100000 items in 2ms

When reading multiple files concurrently, io_uring can be even faster. I've implemented a tfrecord reader with io_uring and it performs really well, with 1.1 GiB/s throughput (1 thread) vs 500 MiB/s for sync reads (4 threads).

casey commented 10 months ago

I wanted to add some color here. ordinals.com performance has been horrible, and we finally figured out why.

The issue is that we're using an async web framework, so all of our endpoint functions are async. However, those functions then call into redb, which is not async. When those calls are slow, tokio kind of melts, since you have a bunch of async tasks which aren't yielding. If redb supported an async interface, then we could do everything in async, and it wouldn't be a problem.

However, the fix was very simple. We just used tokio::task::spawn_blocking inside of the async functions and made the calls to redb inside the spawned tasks, which are executed on a thread pool. This basically fixed everything, and even if redb had an async interface, we might not even use it, because async is relatively painful and we would have to convert all of our synchronous index functions which access the database into async. So in our case it still doesn't make sense, although it may make sense for other use-cases, or it might make sense for us if we run into scaling limits with threads.
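
For readers landing here for the workaround: a minimal sketch of the pattern described above, with a made-up table and key type for illustration. The point is that the blocking redb calls run on tokio's blocking thread pool rather than stalling the async worker threads.

```rust
use std::sync::Arc;

use redb::{Database, ReadableTable, TableDefinition};

// Hypothetical table, for illustration only.
const TABLE: TableDefinition<u64, &[u8]> = TableDefinition::new("items");

// An async endpoint that wraps its redb reads in spawn_blocking, so the
// blocking I/O runs on tokio's blocking pool instead of tying up an
// async worker thread.
async fn get_item(db: Arc<Database>, id: u64) -> Result<Option<Vec<u8>>, redb::Error> {
    tokio::task::spawn_blocking(move || -> Result<Option<Vec<u8>>, redb::Error> {
        let txn = db.begin_read()?;
        let table = txn.open_table(TABLE)?;
        Ok(table.get(&id)?.map(|guard| guard.value().to_vec()))
    })
    .await
    .expect("blocking task panicked")
}
```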

jeromegn commented 10 months ago

@casey sounds about right! Have you tried block_in_place? No need to switch tasks / threads in many cases. We use it extensively for SQLite calls.

casey commented 10 months ago

@jeromegn Good suggestion! I didn't know about block_in_place, I'll give that a try. Could this cause issues if there are a bunch of concurrent tasks using block_in_place? I can imagine it could tie up all of tokio's threads used for running async tasks, unless tokio can spawn new threads as needed.

jeromegn commented 10 months ago

@casey the runtime will spawn more threads if the current thread blocks for too long (for some measure of long).

For things that block only for up to 1ms, it's probably not worth it to use block_in_place or spawn_blocking. I assume your use case has longer execution times.

The main benefit of block_in_place is not having to clone your data or restrict yourself to Send + Sync + 'static types. You can pass references, since it executes the closure on the current thread.
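
A sketch of that difference, with an invented synchronous lookup helper: block_in_place runs the closure right where it is called, so it can borrow non-'static data directly, where spawn_blocking would require owned values.

```rust
use redb::Database;

async fn handler(db: &Database, key: u64) -> u64 {
    // Runs the closure on the current worker thread (the runtime shifts
    // other queued tasks to different workers), so it may borrow `db`
    // and local state with no Arc, clone, or 'static bound.
    // Note: block_in_place panics on the current-thread runtime.
    tokio::task::block_in_place(|| lookup(db, key))
}

// Invented synchronous helper standing in for a real redb query.
fn lookup(_db: &Database, _key: u64) -> u64 {
    // ... begin_read(), open_table(), get() ...
    0
}
```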

dan-da commented 9 months ago

Perhaps redb could offer an async-friendly api that wraps the sync api in spawn-blocking or block-in-place?

dpc commented 9 months ago

> Perhaps redb could offer an async-friendly api that wraps the sync api in spawn-blocking or block-in-place?

Is there anything redb would do that a separate crate couldn't? redb-tokio etc.

dan-da commented 9 months ago

> Is there anything redb would do that a separate crate couldn't? redb-tokio etc.

I don't know. Probably not. For a second you got me excited thinking there might be a redb-tokio crate already, but anyway such a crate seems like a good way to do it.
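
No such crate exists as of this thread, but for the curious, here is a hedged sketch of the shape it could take, reusing the spawn_blocking pattern shown earlier; all names are hypothetical.

```rust
use std::sync::Arc;

use redb::{Database, ReadableTable, TableDefinition};

// Hypothetical table, for illustration only.
const TABLE: TableDefinition<u64, &[u8]> = TableDefinition::new("items");

/// Sketch of the handle a hypothetical redb-tokio crate might expose:
/// a cheaply cloneable wrapper whose async methods delegate the real
/// work to tokio's blocking pool.
#[derive(Clone)]
pub struct AsyncDatabase {
    inner: Arc<Database>,
}

impl AsyncDatabase {
    pub fn new(db: Database) -> Self {
        Self { inner: Arc::new(db) }
    }

    /// Async point lookup layered over the sync redb API.
    pub async fn get(&self, key: u64) -> Result<Option<Vec<u8>>, redb::Error> {
        let db = Arc::clone(&self.inner);
        tokio::task::spawn_blocking(move || -> Result<Option<Vec<u8>>, redb::Error> {
            let txn = db.begin_read()?;
            let table = txn.open_table(TABLE)?;
            Ok(table.get(&key)?.map(|guard| guard.value().to_vec()))
        })
        .await
        .expect("blocking task panicked")
    }
}
```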