DataDog / glommio

Glommio is a thread-per-core crate that makes writing highly parallel asynchronous applications in a thread-per-core architecture easier for rustaceans.

why is glommio slower on benchmark against bytedance's monoio? #554

Open hiqsociety opened 2 years ago

hiqsociety commented 2 years ago

Why is glommio slower than ByteDance's monoio in benchmarks? https://github.com/bytedance/monoio

bryandmc commented 2 years ago

Probably because they are using fastpoll. At least I think that's why theirs was faster, if I remember correctly. I had an alpha fastpoll branch where we were faster, so maybe that's the difference.

We aren't sure when we will have support for faster sockets... it's something I want to do but haven't had time since the birth of my child. In time, we will get it done. Patches welcome!

nazar-pc commented 10 months ago

UPD: It depends on the read pattern, for random reads performance is comparable.

I just did random file read testing of glommio vs monoio. Not only does glommio's Semaphore appear to be significantly slower than the generic async_lock::Semaphore, but monoio was also 11.1x faster when reading random ~40 KiB chunks from a SATA SSD (Samsung SSD 860 EVO 1TB) with I/O concurrency set to 32.

In my code the abstraction looked like this (requires Rust Nightly):

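```rust
// Imports are inferred from usage; the original snippet omitted them.
use std::io;
use std::rc::Rc;

use async_lock::Semaphore; // `acquire()` without arguments matches async_lock
use glommio::io::DmaFile; // assumed file type; the generic parameter was lost
use monoio::fs::File;

// Trait name and signature are inferred from the impls below.
// `async fn` in traits is what required Rust Nightly at the time.
pub trait ReadAtAsync {
    /// Fill the buffer by reading bytes at a specific offset and return it
    async fn read_at(&self, buf: Vec<u8>, offset: usize) -> io::Result<Vec<u8>>;
}

pub struct GlommioFile<'a> {
    file: &'a Rc<DmaFile>,
    semaphore: Semaphore,
}

impl ReadAtAsync for GlommioFile<'_> {
    async fn read_at(&self, mut buf: Vec<u8>, offset: usize) -> io::Result<Vec<u8>> {
        // Cap the number of in-flight reads
        let _permit = self.semaphore.acquire().await;
        let read_result = self
            .file
            .read_at(offset as u64, buf.len())
            .await
            .map_err(|error| {
                io::Error::new(
                    io::ErrorKind::Other,
                    format!("Failed to read with glommio: {error}"),
                )
            })?;

        // glommio returns its own buffer, so the bytes are copied out;
        // this panics if fewer than `buf.len()` bytes were read
        buf.copy_from_slice(&read_result);

        Ok(buf)
    }
}

impl<'a> GlommioFile<'a> {
    pub fn new(file: &'a Rc<DmaFile>, io_concurrency: usize) -> Self {
        Self {
            file,
            semaphore: Semaphore::new(io_concurrency),
        }
    }
}

pub struct MonoioFile<'a> {
    file: &'a File,
    semaphore: Semaphore,
}

impl ReadAtAsync for MonoioFile<'_> {
    async fn read_at(&self, buf: Vec<u8>, offset: usize) -> io::Result<Vec<u8>> {
        // Cap the number of in-flight reads
        let _permit = self.semaphore.acquire().await;
        // monoio takes ownership of the buffer and hands it back with the result
        let (read_result, buf) = self.file.read_exact_at(buf, offset as u64).await;
        read_result.map(|()| buf)
    }
}

impl<'a> MonoioFile<'a> {
    pub fn new(file: &'a File, io_concurrency: usize) -> Self {
        Self {
            file,
            semaphore: Semaphore::new(io_concurrency),
        }
    }
}
```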

There must be some major low-hanging fruit here, given such a massive difference.
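Since the result apparently depends on the read pattern, here is a rough std-only sketch of how random and sequential chunk offsets might be generated for a test like this. The helper names, the xorshift generator, and the sizes are all hypothetical and are not taken from the actual benchmark.

```rust
/// Minimal xorshift64 generator; a stand-in for a real RNG crate so that
/// this sketch needs nothing beyond std.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must not start from zero, so fall back to a fixed non-zero state
        Self(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Hypothetical helper: offsets for `count` random reads of `chunk_len` bytes
/// that all fit inside a file of `file_len` bytes.
fn random_chunk_offsets(file_len: u64, chunk_len: u64, count: usize, seed: u64) -> Vec<u64> {
    assert!(chunk_len > 0 && chunk_len <= file_len, "chunk must fit in the file");
    let mut rng = XorShift64::new(seed);
    // Number of valid starting positions; the modulo bias is ignored here
    let span = file_len - chunk_len + 1;
    (0..count).map(|_| rng.next_u64() % span).collect()
}

/// Hypothetical sequential counterpart: back-to-back chunks that wrap around
/// to the start once the end of the file is reached.
fn sequential_chunk_offsets(file_len: u64, chunk_len: u64, count: usize) -> Vec<u64> {
    assert!(chunk_len > 0 && chunk_len <= file_len, "chunk must fit in the file");
    let chunks_in_file = file_len / chunk_len;
    (0..count as u64)
        .map(|i| (i % chunks_in_file) * chunk_len)
        .collect()
}
```

Feeding the same offset list to both `GlommioFile::read_at` and `MonoioFile::read_at` would keep the access pattern identical across runtimes, so any remaining difference comes from the runtime rather than from the workload.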

nyabinary commented 7 months ago

Yeah def needs some performance improvements