CarterLi / liburing4cpp

Modern C++ binding for liburing (io_uring) that features C++ coroutine support
MIT License

performance tuning help needed when issuing reads on a fast NVMe device with the example code #29

Open gaowayne opened 1 year ago

gaowayne commented 1 year ago

hello expert,

I changed the link_cp code a little to read from one fast NVMe drive whose read bandwidth is 6000 MB/s.

With the code below I can only reach 2400 MB/s. What is the bottleneck?

#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   // BLKGETSIZE64
#include <cstdio>       // printf
#include <stdexcept>    // std::runtime_error
#include <vector>

#include <liburing/io_service.hpp>

#define BS (4096)

static off_t get_file_size(int fd) {
    struct stat st;

    fstat(fd, &st) | uio::panic_on_err("fstat", true);

    if (__builtin_expect(S_ISREG(st.st_mode), true)) {
        return st.st_size;
    }

    if (S_ISBLK(st.st_mode)) {
        unsigned long long bytes;
        ioctl(fd, BLKGETSIZE64, &bytes) | uio::panic_on_err("ioctl", true);
        return bytes;
    }

    throw std::runtime_error("Unsupported file type");
}

uio::task<> readnvme(uio::io_service& service, off_t insize) {
    using uio::on_scope_exit;
    using uio::to_iov;
    using uio::panic_on_err;

    std::vector<char> buf(BS, '\0');
    service.register_buffers({ to_iov(buf.data(), buf.size()) });
    on_scope_exit unreg_bufs([&]() { service.unregister_buffers(); });

    off_t offset = 0;
    for (; offset < insize - BS; offset += BS) {
        service.read_fixed(0, buf.data(), buf.size(), offset, 0, IOSQE_FIXED_FILE) | panic_on_err("read_fixed(1)", false);
        //service.write_fixed(1, buf.data(), buf.size(), offset, 0, IOSQE_FIXED_FILE | IOSQE_IO_LINK) | panic_on_err("write_fixed(1)", false);
    }

    // int left = insize - offset;
    // service.read_fixed(0, buf.data(), left, offset, 0, IOSQE_FIXED_FILE | IOSQE_IO_LINK) | panic_on_err("read_fixed(2)", false);
    // service.write_fixed(1, buf.data(), left, offset, 0, IOSQE_FIXED_FILE) | panic_on_err("write_fixed(2)", false);
    // co_await service.fsync(1, 0, IOSQE_FIXED_FILE);

    co_return; // needed so this still compiles as a coroutine with every co_await commented out
}

int main(int argc, char *argv[]) {
    using uio::panic_on_err;
    using uio::on_scope_exit;
    using uio::io_service;

    if (argc < 2) {
        printf("%s: infile\n", argv[0]);
        return 1;
    }

    int infd = open(argv[1], O_RDONLY) | panic_on_err("open infile", true);
    on_scope_exit close_infd([=]() { close(infd); });

    off_t insize = get_file_size(infd);
    io_service service;
    service.register_files({ infd });
    on_scope_exit unreg_file([&]() { service.unregister_files(); });

    service.run(readnvme(service, insize));
}

Just run the above code as link_cp /dev/nvme0n1, then run iostat and you will see the bandwidth.

CarterLi commented 1 year ago

It seems you are corrupting memory. You must ensure all service.read_fixed operations have finished before std::vector<char> buf gets destroyed.

See https://github.com/CarterLi/liburing4cpp#taskhpp

gaowayne commented 1 year ago

> It seems you are corrupting memory. You must ensure all service.read_fixed operations have finished before std::vector<char> buf gets destroyed.
>
> See https://github.com/CarterLi/liburing4cpp#taskhpp

got it. :) do you have an email or WeChat, so that we can connect offline? :)

CarterLi commented 1 year ago

Just use GitHub, please.

gaowayne commented 1 year ago

> It seems you are corrupting memory. You must ensure all service.read_fixed operations have finished before std::vector<char> buf gets destroyed.
>
> See https://github.com/CarterLi/liburing4cpp#taskhpp

hello, I read your comment and the code again. buf is declared outside the for loop, so it will not be freed during the loop. I think you mean that multiple I/Os writing into the same memory will corrupt the data. Actually, that is fine for me: I just want to measure the performance of this framework, and for now I do not need to worry about data consistency.

I see around 2000 MB/s when I run it over an NVMe SSD that has 6 GB/s of bandwidth. Could you please shed some light on how to tune this? :)

CarterLi commented 1 year ago

> service.read_fixed(0, buf.data(), buf.size(), offset, 0, IOSQE_FIXED_FILE) | panic_on_err("read_fixed(1)", false);

This is an async operation, which returns immediately without waiting for the I/O to finish. That is to say, when readnvme returns and buf gets destroyed, there are still I/O operations running (or pending in the I/O queue) in the background. Thus a use-after-free will occur.

    off_t offset = 0;
    for (; offset < insize - BS; offset += BS) {
        service.read_fixed(0, buf.data(), buf.size(), offset, 0, IOSQE_FIXED_FILE | IOSQE_IO_LINK) | panic_on_err("read_fixed(1)", false);
        service.write_fixed(1, buf.data(), buf.size(), offset, 0, IOSQE_FIXED_FILE | IOSQE_IO_LINK) | panic_on_err("write_fixed(1)", false);
    }

    int left = insize - offset;
    if (left)
    {
        service.read_fixed(0, buf.data(), left, offset, 0, IOSQE_FIXED_FILE | IOSQE_IO_LINK) | panic_on_err("read_fixed(2)", false);
        service.write_fixed(1, buf.data(), left, offset, 0, IOSQE_FIXED_FILE | IOSQE_IO_LINK) | panic_on_err("write_fixed(2)", false);
    }
    co_await service.fsync(1, 0, IOSQE_FIXED_FILE);

link_cp queues every read/write operation with IOSQE_IO_LINK, which ensures all I/O operations run in sequence.

For example: READ (1) -> WRITE (2) -> READ (3) -> WRITE (4) -> FSYNC (5)

5 won't start before 4 finishes; 4 won't start before 3 finishes; ... 2 won't start before 1 finishes.

At the end, we wait for 5 to finish with co_await service.fsync(1, 0, IOSQE_FIXED_FILE);, so we can ensure that all queued I/O operations have correctly finished before the function returns.

CarterLi commented 1 year ago

Don't talk about performance before you get things correct.

gaowayne commented 1 year ago

> This is an async operation, which returns immediately without waiting for the I/O to finish. That is to say, when readnvme returns and buf gets destroyed, there are still I/O operations running (or pending in the I/O queue) in the background. Thus a use-after-free will occur.

But why does link_cp work? It queues its read_fixed / write_fixed calls in the same kind of loop, and only co_awaits the final fsync.

Actually, I already put buf into a global variable, so it will not get freed while the process is running. Double-free and use-after-free bugs would cause the process to crash, but I feel they will not impact the performance.

gaowayne commented 1 year ago

> Don't talk about performance before you get things correct.

OK, I will pre-allocate the memory buffer for this experiment, but I feel that one global buf does not impact the performance result.