MaRDI4NFDI / portal-compose

docker-compose repo for MaRDI
https://portal.mardi4nfdi.de
GNU General Public License v3.0

Check I/O performance #447

Closed physikerwelt closed 4 months ago

physikerwelt commented 6 months ago

https://github.com/tool-dockers/docker-iops


docker run --rm -v `pwd`/data:/iops/data tooldockers/iops:85f56cd \
    --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
    --name=test --filename=test --bs=4k --iodepth=64 --size=4G \
    --readwrite=randrw --rwmixread=75
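For context, the expected volume follows directly from the flags: `--size=4G` with `--bs=4k` yields about a million 4 KiB operations, and `--rwmixread=75` makes roughly three quarters of them reads (a sketch of the arithmetic; the exact read count fio reports differs slightly because the mix is randomized):

```shell
# fio issues size/bs operations in total; rwmixread sets the nominal read fraction.
TOTAL_OPS=$(( 4 * 1024 * 1024 * 1024 / 4096 ))  # 4 GiB in 4 KiB blocks = 1048576 ops
READ_OPS=$(( TOTAL_OPS * 75 / 100 ))            # nominal read share = 786432 ops
echo "total=$TOTAL_OPS reads~=$READ_OPS"
```

The runs below report 785,920 reads and 262,656 writes, which matches this within the randomization of the mix.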
physikerwelt commented 6 months ago

Reference result (my laptop):

test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.16
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)

test: (groupid=0, jobs=1): err= 0: pid=25: Thu Dec 21 19:00:01 2023
  read: IOPS=38.6k, BW=151MiB/s (158MB/s)(3070MiB/20337msec)
   bw (  KiB/s): min=118752, max=252744, per=100.00%, avg=154638.03, stdev=38214.33, samples=40
   iops        : min=29688, max=63186, avg=38659.50, stdev=9553.57, samples=40
  write: IOPS=12.9k, BW=50.4MiB/s (52.9MB/s)(1026MiB/20337msec); 0 zone resets
   bw (  KiB/s): min=39976, max=84064, per=100.00%, avg=51683.43, stdev=12800.98, samples=40
   iops        : min= 9994, max=21016, avg=12920.78, stdev=3200.26, samples=40
  cpu          : usr=17.81%, sys=38.07%, ctx=110275, majf=0, minf=8
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=151MiB/s (158MB/s), 151MiB/s-151MiB/s (158MB/s-158MB/s), io=3070MiB (3219MB), run=20337-20337msec
  WRITE: bw=50.4MiB/s (52.9MB/s), 50.4MiB/s-50.4MiB/s (52.9MB/s-52.9MB/s), io=1026MiB (1076MB), run=20337-20337msec
physikerwelt commented 6 months ago

Reference result 2 (server at a university in Germany):


docker run --rm -v `pwd`/data:/iops/data tooldockers/iops \
    --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
    --name=test --filename=test --bs=4k --iodepth=64 --size=4G \
    --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.16
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)

test: (groupid=0, jobs=1): err= 0: pid=73: Thu Dec 21 19:01:50 2023
  read: IOPS=17.6k, BW=68.8MiB/s (72.2MB/s)(3070MiB/44613msec)
   bw (  KiB/s): min=61688, max=78184, per=99.98%, avg=70453.39, stdev=2673.58, samples=89
   iops        : min=15422, max=19546, avg=17613.28, stdev=668.40, samples=89
  write: IOPS=5887, BW=22.0MiB/s (24.1MB/s)(1026MiB/44613msec); 0 zone resets
   bw (  KiB/s): min=20934, max=26216, per=99.99%, avg=23546.46, stdev=988.06, samples=89
   iops        : min= 5233, max= 6554, avg=5886.60, stdev=247.04, samples=89
  cpu          : usr=8.10%, sys=26.39%, ctx=1048606, majf=0, minf=341
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=68.8MiB/s (72.2MB/s), 68.8MiB/s-68.8MiB/s (72.2MB/s-72.2MB/s), io=3070MiB (3219MB), run=44613-44613msec
  WRITE: bw=22.0MiB/s (24.1MB/s), 22.0MiB/s-22.0MiB/s (24.1MB/s-24.1MB/s), io=1026MiB (1076MB), run=44613-44613msec
physikerwelt commented 6 months ago

On the MaRDI prod server the same test has now been running for more than 10 minutes...

physikerwelt commented 6 months ago

Interrupting the test


^C
fio: terminating on signal 2

test: (groupid=0, jobs=1): err= 0: pid=9: Thu Dec 21 19:09:03 2023
  read: IOPS=327, BW=1311KiB/s (1343kB/s)(837MiB/653955msec)
   bw (  KiB/s): min=  656, max= 1624, per=99.97%, avg=1310.60, stdev=127.51, samples=1307
   iops        : min=  164, max=  406, avg=327.63, stdev=31.88, samples=1307
  write: IOPS=109, BW=439KiB/s (449kB/s)(280MiB/653955msec); 0 zone resets
   bw (  KiB/s): min=  160, max=  568, per=100.00%, avg=438.40, stdev=48.57, samples=1307
   iops        : min=   40, max=  142, avg=109.53, stdev=12.15, samples=1307
  cpu          : usr=0.23%, sys=1.36%, ctx=299347, majf=0, minf=8
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=214365,71711,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=1311KiB/s (1343kB/s), 1311KiB/s-1311KiB/s (1343kB/s-1343kB/s), io=837MiB (878MB), run=653955-653955msec
  WRITE: bw=439KiB/s (449kB/s), 439KiB/s-439KiB/s (449kB/s-449kB/s), io=280MiB (294MB), run=653955-653955msec
physikerwelt commented 6 months ago

@timconrad mentioned that he has the feeling that the I/O performance of the server is quite slow. These tests indicate that this is in fact the case. I wonder if we can investigate that further ourselves, or if that has to be done by the admins of that server?

timconrad commented 6 months ago

Let's schedule a meeting with the ZIB admin people after 5.1.24. @physikerwelt : you can also already open a ticket at ZIB if you want to speed up things =)

physikerwelt commented 6 months ago

Adding another data point: from outside Docker, the performance seems reasonable.


dd if=/dev/zero of=test.file bs=64M count=16 oflag=dsync
16+0 records in
16+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.56292 s, 419 MB/s
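Note that `dd` with `bs=64M` and `oflag=dsync` measures large sequential synchronous writes, a much easier workload than fio's random 4 KiB mix, so the two numbers are not directly comparable. A closer (still rough) host-side approximation would use small synchronous blocks, e.g. (sketch; the file name is arbitrary):

```shell
# Small synchronous writes approximate the write half of the fio test far
# better than one huge sequential block (though dd still writes sequentially).
dd if=/dev/zero of=test.file bs=4k count=25600 oflag=dsync
```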
physikerwelt commented 6 months ago

The other test has now completed:


docker run --rm -v `pwd`/data:/iops/data tooldockers/iops \
    --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
    --name=test --filename=test --bs=4k --iodepth=64 --size=4G \
    --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.16
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)

test: (groupid=0, jobs=1): err= 0: pid=9: Fri Dec 22 09:31:48 2023
  read: IOPS=333, BW=1335KiB/s (1367kB/s)(3070MiB/2354357msec)
   bw (  KiB/s): min=  144, max= 1720, per=100.00%, avg=1335.09, stdev=139.09, samples=4707
   iops        : min=   36, max=  430, avg=333.75, stdev=34.78, samples=4707
  write: IOPS=111, BW=446KiB/s (457kB/s)(1026MiB/2354357msec); 0 zone resets
   bw (  KiB/s): min=   40, max=  608, per=100.00%, avg=446.18, stdev=53.23, samples=4707
   iops        : min=   10, max=  152, avg=111.47, stdev=13.32, samples=4707
  cpu          : usr=0.27%, sys=1.64%, ctx=1060313, majf=0, minf=8
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=1335KiB/s (1367kB/s), 1335KiB/s-1335KiB/s (1367kB/s-1367kB/s), io=3070MiB (3219MB), run=2354357-2354357msec
  WRITE: bw=446KiB/s (457kB/s), 446KiB/s-446KiB/s (457kB/s-457kB/s), io=1026MiB (1076MB), run=2354357-2354357msec
physikerwelt commented 6 months ago

With iotop I found that most writes (5 TB in the last 6 days) were done by Redis. I therefore disabled Redis snapshot writes:

root@e7ff4d4cf582:/data# ls -lah
total 19G
drwxr-xr-x 2 redis redis 4.0K Dec 27 21:24 .
drwxr-xr-x 1 root  root  4.0K Dec 21 11:47 ..
-rw------- 1 redis redis 9.4G Dec 27 21:23 dump.rdb
-rw------- 1 redis redis 9.3G Dec 27 21:26 temp-1468.rdb    
root@e7ff4d4cf582:/data# redis-cli 
127.0.0.1:6379> config set save ""
OK
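`config set save ""` only affects the running instance and is lost on restart. To make the change permanent, the same setting would have to go into the Redis configuration or the container command (a sketch; the compose service name is an assumption, and disabling RDB snapshots means no point-in-time persistence unless AOF is enabled):

```shell
# Disable RDB snapshots at runtime and write the change back to redis.conf
# (CONFIG REWRITE only works if the server was started with a config file):
redis-cli config set save ""
redis-cli config rewrite

# Alternatively, in docker-compose.yml (hypothetical service name):
#   redis:
#     command: redis-server --save ""
```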
physikerwelt commented 4 months ago

Running the read/write test again:


mardi-test-user@mardi02:~/portal-compose$ docker run --rm -v `pwd`/data:/iops/data tooldockers/iops \
    --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
    --name=test --filename=test --bs=4k --iodepth=64 --size=4G \
    --readwrite=randrw --rwmixread=75
Unable to find image 'tooldockers/iops:latest' locally
latest: Pulling from tooldockers/iops
cbdbe7a5bc2a: Pull complete 
48df062611ac: Pull complete 
fd6ed5092ed7: Pull complete 
07b295890afb: Pull complete 
3d53a396acba: Pull complete 
Digest: sha256:8658db3c3ce3e93ee4ac250261add127d5c73cae24fb7363a30ab7cf75ba7510
Status: Downloaded newer image for tooldockers/iops:latest
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.16
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)

test: (groupid=0, jobs=1): err= 0: pid=9: Sun Mar  3 19:50:59 2024
  read: IOPS=21.1k, BW=82.5MiB/s (86.5MB/s)(3070MiB/37197msec)
   bw (  KiB/s): min=52816, max=90568, per=100.00%, avg=84830.91, stdev=6391.95, samples=74
   iops        : min=13204, max=22642, avg=21207.70, stdev=1597.99, samples=74
  write: IOPS=7061, BW=27.6MiB/s (28.9MB/s)(1026MiB/37197msec); 0 zone resets
   bw (  KiB/s): min=17928, max=30336, per=100.00%, avg=28349.41, stdev=2077.97, samples=74
   iops        : min= 4482, max= 7584, avg=7087.32, stdev=519.50, samples=74
  cpu          : usr=4.43%, sys=24.76%, ctx=585583, majf=0, minf=8
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=82.5MiB/s (86.5MB/s), 82.5MiB/s-82.5MiB/s (86.5MB/s-86.5MB/s), io=3070MiB (3219MB), run=37197-37197msec
  WRITE: bw=27.6MiB/s (28.9MB/s), 27.6MiB/s-27.6MiB/s (28.9MB/s-28.9MB/s), io=1026MiB (1076MB), run=37197-37197msec
physikerwelt commented 4 months ago

Now the speed is similar to the other server's. So writing per se is not the problem, as long as not too much data is written.
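To keep an eye on which process generates the writes without a full iotop session, the kernel's per-process I/O accounting can be read directly (Linux only; a minimal sketch reading our own counters, sampling a Redis PID twice would give its write rate):

```shell
# /proc/<pid>/io reports cumulative bytes a process has caused to be
# written to the storage layer; here we read the current shell's counters.
grep '^write_bytes' /proc/self/io
```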