CramBL / quick-file-transfer

Simple file transfer utility with the sole purpose of transferring files as quickly, safely, and painlessly as possible
https://crambl.github.io/quick-file-transfer/
MIT License
3 stars 1 forks source link

Performance investigation #41

Open CramBL opened 1 month ago

CramBL commented 1 month ago

Experienced while transferring from a Raspberry PI 3B to a Zynq-7000 (32-bit arm)

Especially bad on a slow network where scp's performance also degraded significantly.

CramBL commented 1 month ago

Test on Raspberry Pi 5 with Ubuntu-server 23.10

On Raspberry Pi 5: flamegraph --root -o listen_fgph.svg -- sudo /usr/sbin/sshd -D -p 54321 On Desktop Pop OS 22.04: scp 400mb.data pi@192.168.0.200:~/tmp/

Note that most of the late half of the daemon invocation is just idle time after the transfer completes

sshd_listen_fpgh

On Raspberry Pi 5: flamegraph -o listen_fgph.svg -- usr/bin/qft listen --port 54321 -o test.data On Desktop Pop OS 22.04: qft send ip 192.168.0.200 --port 54321 -f 400mb.data

listen_fgph

Once more

qftssh_listen_fpgh

with lz4: qft listen --port 54321 -o test.data -d lz4

qft_listen_fpgh3

Take aways

It seems scp exits before the transfer is complete while qft wait for the server to OK end of transfer, which explains some of the discrepancy.

The disk IO (SD card) is the bottleneck based on the flamegraphs showing most of the time being spend in generic_perform_write .

What now?

More profiling on transfers between MCUs instead of on a Desktop to optimize for disk IO both on the sender and receiver.

CramBL commented 1 month ago

Transfer from Raspberry pi 5 (ubuntu server 23.10) -> Raspberry Pi Zero W (Raspbian lite)

qft is significantly faster than scp for this 94mb json file.

pi@rpi5:~$ time scp 94mb.json alice@192.168.0.234:~/
94mb.json                                                                                                                                                                
real    0m25.565s
user    0m0.503s
sys     0m0.433s
pi@rpi5:~$ time scp 94mb.json alice@192.168.0.234:~/

real    0m23.280s
user    0m0.493s
sys     0m0.421s
pi@rpi5:~$ time qft ssh 94mb.json alice@192.168.0.234:~/

real    0m17.328s
user    0m4.995s
sys     0m11.831s

pi@rpi5:~$ time qft ssh 94mb.json alice@192.168.0.234:~/

real    0m18.261s
user    0m5.234s
sys     0m12.525s

qft is significantly faster than scp for this 394mb json file.

time scp 394mb.json alice@192.168.0.234:~/
394mb.json

real    1m49.440s
user    0m2.004s
sys     0m1.186s
time qft ssh 394mb.json alice@192.168.0.234:~/

real    1m16.037s
user    0m22.163s
sys     0m53.494s
time rsync 394mb.json alice@192.168.0.234:~/bla.json

real    1m49.678s
user    0m1.944s
sys     0m0.932s

Hyperfine from RPI5 -> RPI0W

hyperfine "qft ssh -q 94mb.json alice@192.168.0.234:~/" "scp -q 94mb.json alice@192.168.0.234:~/" --export-markdown bench_94mb.m

image

Command Mean [s] Min [s] Max [s] Relative
qft ssh -q 94mb.json alice@192.168.0.234:~/ 19.172 ± 3.453 15.593 28.252 1.00
scp -q 94mb.json alice@192.168.0.234:~/ 24.929 ± 2.170 22.629 28.436 1.30 ± 0.26

Conclusion

Not bad for qft.

CramBL commented 1 month ago

Transfer from Raspberry Pi Zero W (Raspbian lite) -> Raspberry pi 5 (ubuntu server 23.10)

scp is slightly faster for 94mb.json

time scp 94mb.json pi@192.168.0.200:~/

real    0m22.450s
user    0m7.481s
sys     0m13.186s
real    0m21.835s
user    0m7.870s
sys     0m13.151s
time qft ssh 94mb.json pi@192.168.0.200:~/

real    0m23.296s
user    0m6.028s
sys     0m15.434s

real    0m23.490s
user    0m6.976s
sys     0m14.965s

Flamegraph from RPI5 listening with qft listen

qftlisten_pi5pi0

Transfer 394mb.json

qft ssh 94mb.json pi@192.168.0.200:~/

real    1m33.412s
user    0m26.302s
sys     1m0.735s
real    1m34.077s
user    0m25.386s
sys     1m1.792s
real    1m33.451s
user    0m32.473s
sys     0m53.082s

qft ssh is slightly faster for 394mb.json but really the difference is insignificant

time scp -q 394mb.json pi@192.168.0.200:~/

real    1m37.472s
user    0m33.170s
sys     0m52.426s
real    1m34.407s
user    0m33.463s
sys     0m52.478s
real    1m34.007s
user    0m27.359s
sys     1m0.803s
CramBL commented 1 month ago

On a Raspberry PI 4B transferring 500 MB rauc bundle to a Zynq-7000 arm 32-bit with qft v0.10.1 the results are decent:

time qft ssh test.raucb root@hostname.local.:~/

real    0m58.592s
user    0m16.036s
sys     0m42.955s

time scp test.raucb root@hostname.local.:~/

real    1m7.451s
user    0m6.639s
sys     0m4.616s
CramBL commented 1 month ago

RPI5 -> RPI Zero W with varied TCP_STREAM_BUFSIZE

Varying the TCP_STREAM_BUFSIZE on both hosts, the suffix to qft is the factor 1024 was multiplied with to determine the TCP_STREAM_BUFSIZE. e.g. qft8 has a TCP_STREAM_BUFSIZE of 8*1024 bytes.

Benchmark 1: qft1 ssh cargolock.data alice@pi.hole:~/
  Time (mean ± σ):      9.895 s ±  1.213 s    [User: 2.743 s, System: 6.428 s]
  Range (min … max):    6.670 s … 10.814 s    10 runs

Benchmark 2: qft2 ssh cargolock.data alice@pi.hole:~/
  Time (mean ± σ):      9.454 s ±  0.194 s    [User: 2.612 s, System: 6.099 s]
  Range (min … max):    9.180 s …  9.710 s    10 runs

Benchmark 3: qft3 ssh cargolock.data alice@pi.hole:~/
  Time (mean ± σ):      9.765 s ±  0.531 s    [User: 2.597 s, System: 6.071 s]
  Range (min … max):    9.241 s … 10.808 s    10 runs

Benchmark 4: qft4 ssh cargolock.data alice@pi.hole:~/
  Time (mean ± σ):      9.726 s ±  0.503 s    [User: 2.664 s, System: 6.286 s]
  Range (min … max):    9.098 s … 10.717 s    10 runs

Benchmark 5: qft8 ssh cargolock.data alice@pi.hole:~/
  Time (mean ± σ):      9.333 s ±  0.277 s    [User: 2.618 s, System: 5.971 s]
  Range (min … max):    8.926 s …  9.859 s    10 runs

Benchmark 6: qft16 ssh cargolock.data alice@pi.hole:~/
  Time (mean ± σ):      9.361 s ±  0.245 s    [User: 2.583 s, System: 6.029 s]
  Range (min … max):    9.050 s …  9.725 s    10 runs

Summary
  qft8 ssh cargolock.data alice@pi.hole:~/ ran
    1.00 ± 0.04 times faster than qft16 ssh cargolock.data alice@pi.hole:~/
    1.01 ± 0.04 times faster than qft2 ssh cargolock.data alice@pi.hole:~/
    1.04 ± 0.06 times faster than qft4 ssh cargolock.data alice@pi.hole:~/
    1.05 ± 0.06 times faster than qft3 ssh cargolock.data alice@pi.hole:~/
    1.06 ± 0.13 times faster than qft1 ssh cargolock.data alice@pi.hole:~/
CramBL commented 1 month ago

First attempt at io_uring: RPI5 -> RPI0

Raucbundle of 458MB.

Not bad! This batches file reads in 10 batches of 8192 bytes each. The TCP writes are not batched but they do use io_uring.

The first transfer is completely cold and still outperforms the second transfer. Quite a change considering the bottleneck is definitely the RPI0's TCP and file IO. Unfortunately the RPI does not support io_uring.

pi@rpi5:~$ time ./qft ssh /tmp/test.raucb alice@pi.hole:/tmp --io-uring
INFO Sending 1 file(s)
INFO Using io_uring 🚀
INFO Sent /tmp/test.raucb 457.13 MiB [479335834 B]

real    1m20.959s
user    0m24.134s
sys     0m57.551s
pi@rpi5:~$ uname -r
6.5.0-1020-raspi

pi@rpi5:~$ time ./qft ssh /tmp/test.raucb alice@pi.hole:/tmp
INFO Sending 1 file(s)
INFO Sent /tmp/test.raucb 457.13 MiB [479335834 B]

real    1m27.113s
user    0m25.585s
sys     1m0.767s

pi@rpi5:~$ time scp /tmp/test.raucb alice@pi.hole:/tmp
test.raucb                                                                                                                                                                      100%  457MB   3.8MB/s   02:00

real    2m1.584s
user    0m2.364s
sys     0m1.320s

RPI5 -> Desktop to inspect impact of io_uring

perf stat qft send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb --io-uring
INFO Sending 1 file(s)
INFO Using io_uring 🚀
INFO Sent /tmp/test.raucb 457.13 MiB [479335834 B]

 Performance counter stats for 'qft send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb --io-uring':

            999.34 msec task-clock                       #    0.238 CPUs utilized
             25722      context-switches                 #   25.739 K/sec
                12      cpu-migrations                   #   12.008 /sec
               275      page-faults                      #  275.182 /sec
        1599109722      cycles                           #    1.600 GHz
        2383469888      instructions                     #    1.49  insn per cycle
         442937654      branches                         #  443.230 M/sec
           1886663      branch-misses                    #    0.43% of all branches

       4.201624551 seconds time elapsed

       0.396254000 seconds user
       0.650373000 seconds sys
perf stat qft send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb
INFO Sending 1 file(s)
INFO Sent /tmp/test.raucb 457.13 MiB [479335834 B]

 Performance counter stats for 'qft send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb':

            372.24 msec task-clock                       #    0.088 CPUs utilized
               585      context-switches                 #    1.572 K/sec
                 3      cpu-migrations                   #    8.059 /sec
               210      page-faults                      #  564.153 /sec
         594685363      cycles                           #    1.598 GHz
         789440136      instructions                     #    1.33  insn per cycle
         110214995      branches                         #  296.086 M/sec
            342408      branch-misses                    #    0.31% of all branches

       4.253325202 seconds time elapsed

       0.058846000 seconds user
       0.313847000 seconds sys

With perf stat --repeat=10

Regular

Performance counter stats for 'qft send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb' (10 runs):

            358.28 msec task-clock                       #    0.084 CPUs utilized               ( +-  1.55% )
               555      context-switches                 #    1.549 K/sec                       ( +-  0.95% )
                 1      cpu-migrations                   #    2.791 /sec                        ( +- 31.45% )
               210      page-faults                      #  586.130 /sec                        ( +-  0.05% )
         573778347      cycles                           #    1.601 GHz                         ( +-  1.60% )
         789707298      instructions                     #    1.38  insn per cycle              ( +-  0.72% )
         110284347      branches                         #  307.814 M/sec                       ( +-  1.06% )
            344538      branch-misses                    #    0.31% of all branches             ( +-  5.74% )

            4.2875 +- 0.0281 seconds time elapsed  ( +-  0.65% )

Io_uring

Performance counter stats for 'qft send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb --io-uring' (10 runs):

            993.93 msec task-clock                       #    0.232 CPUs utilized               ( +-  0.44% )
             25708      context-switches                 #   25.865 K/sec                       ( +-  0.33% )
                12      cpu-migrations                   #   12.073 /sec                        ( +- 13.96% )
               277      page-faults                      #  278.691 /sec                        ( +-  0.21% )
        1588412005      cycles                           #    1.598 GHz                         ( +-  0.84% )
        2393716992      instructions                     #    1.51  insn per cycle              ( +-  0.46% )
         445005094      branches                         #  447.722 M/sec                       ( +-  0.50% )
           1971949      branch-misses                    #    0.44% of all branches             ( +-  3.67% )

            4.2919 +- 0.0116 seconds time elapsed  ( +-  0.27% )
CramBL commented 1 month ago

More io-uring testing with RPI5 -> Host with 400+ MB rauc bundle

No io_uring

No cpu migrations and few context-switches, needs well-tuned non-blocking io to beat this.

 Performance counter stats for 'qft send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb -q' (10 runs):

            342.07 msec task-clock                       #    0.073 CPUs utilized               ( +-  0.97% )
               572      context-switches                 #    1.672 K/sec                       ( +-  1.32% )
                 0      cpu-migrations                   #    0.000 /sec
               229      page-faults                      #  669.452 /sec
         544408042      cycles                           #    1.592 GHz                         ( +-  1.00% )
         780091073      instructions                     #    1.43  insn per cycle              ( +-  0.56% )
         108305757      branches                         #  316.618 M/sec                       ( +-  0.83% )
            335093      branch-misses                    #    0.31% of all branches             ( +-  4.92% )

             4.655 +- 0.162 seconds time elapsed  ( +-  3.49% )

io_uring by submitting file read batches and submitting them all before writing them one by one to the TCP socket

Lots of CPU migrations and context switches

Performance counter stats for 'qft send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb -q --io-uring' (10 runs):

            624.08 msec task-clock                       #    0.143 CPUs utilized               ( +-  0.56% )
             12212      context-switches                 #   19.568 K/sec                       ( +-  0.24% )
                 7      cpu-migrations                   #   11.217 /sec                        ( +- 23.18% )
               294      page-faults                      #  471.096 /sec                        ( +-  0.24% )
         980788675      cycles                           #    1.572 GHz                         ( +-  0.77% )
        1372125800      instructions                     #    1.40  insn per cycle              ( +-  0.59% )
         234772016      branches                         #  376.191 M/sec                       ( +-  0.71% )
           1082223      branch-misses                    #    0.46% of all branches             ( +-  4.98% )

            4.3620 +- 0.0959 seconds time elapsed  ( +-  2.20% )

io_uring by submitting a batch and then getting one read at a time and writing it into the TCP socket before submitting one new read and repeat

The buffer size and batch count is indicated by the binary name in the pattern qft_<BUFSIZE>_<BATCH_COUNT>.

i.e. qft_4k_32 uses a uring depth of 32 with a slice/buffer size of 4096 bytes.

 Performance counter stats for 'qft_4k_16 send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb -q --io-uring' (10 runs):

           2920.23 msec task-clock                       #    0.687 CPUs utilized               ( +-  0.09% )
            234145      context-switches                 #   80.180 K/sec                       ( +-  0.06% )
               228      cpu-migrations                   #   78.076 /sec                        ( +-  2.27% )
               248      page-faults                      #   84.925 /sec                        ( +-  0.29% )
        4857468621      cycles                           #    1.663 GHz                         ( +-  0.23% )
        7379999588      instructions                     #    1.52  insn per cycle              ( +-  0.10% )
        1523850908      branches                         #  521.825 M/sec                       ( +-  0.10% )
           9703408      branch-misses                    #    0.64% of all branches             ( +-  1.03% )

            4.2487 +- 0.0210 seconds time elapsed  ( +-  0.49% )
 Performance counter stats for 'qft_4k_32 send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb -q --io-uring' (10 runs):

           2932.89 msec task-clock                       #    0.679 CPUs utilized               ( +-  0.10% )
            234405      context-switches                 #   79.923 K/sec                       ( +-  0.08% )
               230      cpu-migrations                   #   78.421 /sec                        ( +-  1.14% )
               268      page-faults                      #   91.377 /sec                        ( +-  0.18% )
        4861247035      cycles                           #    1.657 GHz                         ( +-  0.31% )
        7362958065      instructions                     #    1.51  insn per cycle              ( +-  0.16% )
        1520494682      branches                         #  518.429 M/sec                       ( +-  0.15% )
           9641872      branch-misses                    #    0.63% of all branches             ( +-  1.05% )

            4.3167 +- 0.0373 seconds time elapsed  ( +-  0.87%
 Performance counter stats for 'qft_4k_64 send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb -q --io-uring' (10 runs):

           2948.66 msec task-clock                       #    0.678 CPUs utilized               ( +-  0.08% )
            234115      context-switches                 #   79.397 K/sec                       ( +-  0.06% )
               236      cpu-migrations                   #   80.036 /sec                        ( +-  1.56% )
               304      page-faults                      #  103.098 /sec                        ( +-  0.23% )
        4922522759      cycles                           #    1.669 GHz                         ( +-  0.27% )
        7392280292      instructions                     #    1.50  insn per cycle              ( +-  0.12% )
        1526417610      branches                         #  517.664 M/sec                       ( +-  0.11% )
          10003319      branch-misses                    #    0.66% of all branches             ( +-  0.83% )

            4.3501 +- 0.0393 seconds time elapsed  ( +-  0.90% )
Performance counter stats for 'qft_4k_64 send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb -q --io-uring' (10 runs):

           2944.85 msec task-clock                       #    0.687 CPUs utilized               ( +-  0.11% )
            234216      context-switches                 #   79.534 K/sec                       ( +-  0.06% )
               232      cpu-migrations                   #   78.781 /sec                        ( +-  1.29% )
               304      page-faults                      #  103.231 /sec                        ( +-  0.11% )
        4909014114      cycles                           #    1.667 GHz                         ( +-  0.27% )
        7384111146      instructions                     #    1.50  insn per cycle              ( +-  0.10% )
        1524792489      branches                         #  517.782 M/sec                       ( +-  0.10% )
           9898644      branch-misses                    #    0.65% of all branches             ( +-  0.94% )

            4.2884 +- 0.0288 seconds time elapsed  ( +-  0.67% )
Performance counter stats for 'qft_4k_128 send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb -q --io-uring' (10 runs):

           2976.79 msec task-clock                       #    0.691 CPUs utilized               ( +-  0.08% )
            234439      context-switches                 #   78.756 K/sec                       ( +-  0.07% )
               244      cpu-migrations                   #   81.967 /sec                        ( +-  2.17% )
               379      page-faults                      #  127.318 /sec                        ( +-  0.08% )
        4969488126      cycles                           #    1.669 GHz                         ( +-  0.48% )
        7425788983      instructions                     #    1.49  insn per cycle              ( +-  0.33% )
        1533401150      branches                         #  515.119 M/sec                       ( +-  0.32% )
          10298570      branch-misses                    #    0.67% of all branches             ( +-  1.20% )

            4.3066 +- 0.0350 seconds time elapsed  ( +-  0.81% )
Performance counter stats for 'qft_4k_256 send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb -q --io-uring' (10 runs):

           3041.61 msec task-clock                       #    0.712 CPUs utilized               ( +-  0.30% )
            234150      context-switches                 #   76.982 K/sec                       ( +-  0.07% )
               230      cpu-migrations                   #   75.618 /sec                        ( +-  2.24% )
               524      page-faults                      #  172.277 /sec                        ( +-  0.11% )
        5174027620      cycles                           #    1.701 GHz                         ( +-  0.60% )
        7419969792      instructions                     #    1.43  insn per cycle              ( +-  0.14% )
        1532175650      branches                         #  503.739 M/sec                       ( +-  0.14% )
          10165183      branch-misses                    #    0.66% of all branches             ( +-  1.16% )

            4.2696 +- 0.0345 seconds time elapsed  ( +-  0.81% )
CramBL commented 1 month ago

metrics_comparison

CramBL commented 1 month ago

With this io_uring implementation and 458M rauc bundle

pub fn incremental_rw_io_uring_batch_alt2<const BUFSIZE: usize, const BATCH_COUNT: usize>(
    socket: &TcpStream,
    file_path: &Path,
) -> anyhow::Result<u64> {
    let file = File::open(file_path)?;
    let buffers = [[0u8; BUFSIZE]; BATCH_COUNT];
    let mut total_read: u64 = 0;
    let mut total_write: u64 = 0;

    let config = rio::Config {
        depth: BATCH_COUNT,
        ..Default::default()
    };
    let uring = config.start()?;
    let mut read_futures = VecDeque::with_capacity(BATCH_COUNT);

    // Initialize read futures
    for batch_idx in 0..BATCH_COUNT {
        let file_offset = batch_idx as u64 * BUFSIZE as u64;
        let read_future = uring.read_at(&file, &buffers[batch_idx], file_offset);
        read_futures.push_back((batch_idx, read_future));
    }

    while let Some((idx, read_future)) = read_futures.pop_front() {
        let bytes_read = read_future.wait()?;
        if bytes_read == 0 {
            break; // EOF
        }
        total_read += bytes_read as u64;

        let buf = &buffers[idx][..bytes_read];
        total_write += write_at_socket(&uring, socket, buf, bytes_read)? as u64;

        debug_assert_eq!(total_read, total_write);

        // Re-issue the read for the same buffer
        let file_offset = total_read + ((BUFSIZE as u64 * BATCH_COUNT as u64) - BUFSIZE as u64);
        read_futures.push_back((idx, uring.read_at(&file, &buffers[idx], file_offset)));
    }

    assert_eq!(
        total_read, total_write,
        "Total bytes read does not equal total bytes written"
    );

    Ok(total_read)
}

#[inline(always)]
fn write_at_socket(
    uring: &Rio,
    socket: &TcpStream,
    buf: &[u8],
    write_count: usize,
) -> std::io::Result<usize> {
    let mut total_written = 0;
    while total_written < write_count {
        let tmp_write_buf = &buf[total_written..];
        let written_bytes = uring.send(socket, &tmp_write_buf).wait()?;
        if written_bytes == 0 {
            return Err(io::Error::new(
                io::ErrorKind::WriteZero,
                "Failed writing any bytes to the socket",
            ));
        }
        total_written += written_bytes;
    }
    Ok(total_written)
}

metrics_comparison

CramBL commented 1 month ago

Same as above with 94MB json

metrics_comparison

CramBL commented 1 month ago

A couple of medium sized files

metrics_comparison

CramBL commented 1 month ago
Command Mean [s] Min [s] Max [s] Relative
qft send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb -q 4.215 ± 0.035 4.183 4.289 1.00 ± 0.01
qft_2k_16 send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb -q --io-uring 4.339 ± 0.138 4.191 4.634 1.03 ± 0.03
qft_2k_128 send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb -q --io-uring 4.410 ± 0.117 4.194 4.597 1.05 ± 0.03
qft_4k_16 send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb -q --io-uring 4.300 ± 0.152 4.189 4.657 1.02 ± 0.04
qft_4k_32 send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb -q --io-uring 4.228 ± 0.050 4.181 4.300 1.01 ± 0.01
qft_4k_64 send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb -q --io-uring 4.199 ± 0.015 4.186 4.240 1.00
qft_4k_128 send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb -q --io-uring 4.229 ± 0.085 4.176 4.451 1.01 ± 0.02
qft_4k_256 send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb -q --io-uring 4.224 ± 0.053 4.173 4.312 1.01 ± 0.01
qft_8k_16 send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb -q --io-uring 4.255 ± 0.065 4.190 4.385 1.01 ± 0.02
qft_8k_32 send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb -q --io-uring 4.215 ± 0.038 4.187 4.288 1.00 ± 0.01
qft_8k_64 send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb -q --io-uring 4.214 ± 0.046 4.172 4.288 1.00 ± 0.01
qft_8k_128 send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb -q --io-uring 4.288 ± 0.090 4.190 4.453 1.02 ± 0.02
qft_8k_256 send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb -q --io-uring 4.270 ± 0.139 4.181 4.549 1.02 ± 0.03
qft_16k_8 send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb -q --io-uring 4.231 ± 0.059 4.183 4.358 1.01 ± 0.01
qft_16k_16 send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb -q --io-uring 4.246 ± 0.131 4.171 4.606 1.01 ± 0.03
qft_16k_32 send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb -q --io-uring 4.221 ± 0.043 4.189 4.286 1.01 ± 0.01
qft_16k_64 send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb -q --io-uring 4.237 ± 0.103 4.187 4.523 1.01 ± 0.02
qft_16k_128 send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb -q --io-uring 4.277 ± 0.141 4.177 4.619 1.02 ± 0.03
qft_16k_256 send ip 192.168.0.181 --port 54321 -f /tmp/test.raucb -q --io-uring 4.242 ± 0.062 4.188 4.371 1.01 ± 0.02