1 MiB sample result:
4KiB_write: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
4KiB_read: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.16
Starting 2 processes
4KiB_write: Laying out IO file (1 file / 1MiB)
4KiB_read: Laying out IO file (1 file / 1MiB)
4KiB_write: (groupid=0, jobs=2): err= 0: pid=1621: Sat Apr 9 17:06:41 2022
read: IOPS=1753, BW=7014KiB/s (7182kB/s)(1024KiB/146msec)
clat (nsec): min=1800, max=33275k, avg=552421.08, stdev=2065224.15
lat (nsec): min=1869, max=33276k, avg=552658.18, stdev=2065271.34
clat percentiles (usec):
| 1.00th=[ 46], 5.00th=[ 227], 10.00th=[ 273], 20.00th=[ 306],
| 30.00th=[ 343], 40.00th=[ 371], 50.00th=[ 392], 60.00th=[ 420],
| 70.00th=[ 453], 80.00th=[ 490], 90.00th=[ 570], 95.00th=[ 693],
| 99.00th=[ 1647], 99.50th=[ 2606], 99.90th=[33162], 99.95th=[33162],
| 99.99th=[33162]
write: IOPS=522, BW=2090KiB/s (2140kB/s)(1024KiB/490msec); 0 zone resets
clat (usec): min=215, max=105894, avg=1387.33, stdev=9403.43
lat (usec): min=215, max=105895, avg=1387.69, stdev=9403.47
clat percentiles (usec):
| 1.00th=[ 231], 5.00th=[ 281], 10.00th=[ 289], 20.00th=[ 322],
| 30.00th=[ 338], 40.00th=[ 367], 50.00th=[ 388], 60.00th=[ 416],
| 70.00th=[ 437], 80.00th=[ 482], 90.00th=[ 578], 95.00th=[ 840],
| 99.00th=[ 33817], 99.50th=[103285], 99.90th=[105382], 99.95th=[105382],
| 99.99th=[105382]
lat (usec) : 2=0.20%, 10=0.20%, 50=0.20%, 250=3.91%, 500=77.73%
lat (usec) : 750=12.89%, 1000=1.56%
lat (msec) : 2=1.76%, 4=0.78%, 50=0.39%, 250=0.39%
cpu : usr=0.00%, sys=4.42%, ctx=1112, majf=0, minf=24
IO depths : 1=100.2%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=256,256,0,1 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=7014KiB/s (7182kB/s), 7014KiB/s-7014KiB/s (7182kB/s-7182kB/s), io=1024KiB (1049kB), run=146-146msec
WRITE: bw=2090KiB/s (2140kB/s), 2090KiB/s-2090KiB/s (2140kB/s-2140kB/s), io=1024KiB (1049kB), run=490-490msec
10MiB sample result:
4KiB_write: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
4KiB_read: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.16
Starting 2 processes
4KiB_write: Laying out IO file (1 file / 10MiB)
4KiB_read: Laying out IO file (1 file / 10MiB)
^C^C
fio: terminating on signal 2
Run status group 0 (all jobs):
Nice detail in the issue, SadeghTb. I believe in the published fio.conf file you specify a size of 100Mi, rather than the 1Mi or 10Mi shown here, correct?
I ran with each of the size= settings of 1Mi, 10Mi, and 100Mi... and it worked just fine in my container setup. I will say that, running in the containerized environment (currently), the lowly 10MiB/sec does mean the test pauses for a bit while it lays out the 100MiB file... and then takes roughly 10 seconds to complete:
=====
4KiB_write: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
4KiB_read: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.28
Starting 2 processes
4KiB_write: Laying out IO file (1 file / 100MiB)
4KiB_read: Laying out IO file (1 file / 100MiB)
Jobs: 2 (f=2): [W(1),R(1)][15.0%][r=11.7MiB/s,w=6956KiB/s][r=2996,w=1739 IOPS][e
Jobs: 2 (f=2): [W(1),R(1)][23.5%][r=9722KiB/s,w=8247KiB/s][r=2430,w=2061 IOPS][e
Jobs: 2 (f=2): [W(1),R(1)][31.2%][r=13.0MiB/s,w=7492KiB/s][r=3323,w=1873 IOPS][e
Jobs: 2 (f=2): [W(1),R(1)][40.0%][r=10.3MiB/s,w=7808KiB/s][r=2641,w=1952 IOPS][e
Jobs: 2 (f=2): [W(1),R(1)][46.7%][r=9100KiB/s,w=7480KiB/s][r=2275,w=1870 IOPS][e
Jobs: 2 (f=2): [W(1),R(1)][53.3%][r=9502KiB/s,w=7272KiB/s][r=2375,w=1818 IOPS][e
Jobs: 2 (f=2): [W(1),R(1)][60.0%][r=12.9MiB/s,w=7480KiB/s][r=3291,w=1870 IOPS][e
Jobs: 1 (f=1): [W(1),_(1)][66.7%][r=12.4MiB/s,w=7444KiB/s][r=3173,w=1861 IOPS][e
Jobs: 1 (f=1): [W(1),_(1)][78.6%][w=9285KiB/s][w=2321 IOPS][eta 00m:03s]
Jobs: 1 (f=1): [W(1),_(1)][100.0%][w=8960KiB/s][w=2240 IOPS][eta 00m:00s]
4KiB_write: (groupid=0, jobs=2): err= 0: pid=3191: Sun Apr 10 06:03:07 2022
read: IOPS=2874, BW=11.2MiB/s (11.8MB/s)(100MiB/8906msec)
clat (nsec): min=474, max=83337k, avg=346269.03, stdev=1326771.80
lat (nsec): min=497, max=83338k, avg=346427.62, stdev=1326782.20
clat percentiles (nsec):
| 1.00th=[ 498], 5.00th=[ 516], 10.00th=[ 540],
| 20.00th=[ 612], 30.00th=[ 31616], 40.00th=[ 220160],
| 50.00th=[ 309248], 60.00th=[ 378880], 70.00th=[ 436224],
| 80.00th=[ 493568], 90.00th=[ 561152], 95.00th=[ 643072],
| 99.00th=[ 1011712], 99.50th=[ 1843200], 99.90th=[11599872],
| 99.95th=[12255232], 99.99th=[71827456]
bw ( KiB/s): min= 8495, max=15032, per=98.49%, avg=11324.24, stdev=1948.10, samples=17
iops : min= 2123, max= 3758, avg=2831.06, stdev=487.02, samples=17
write: IOPS=1989, BW=7957KiB/s (8148kB/s)(100MiB/12869msec); 0 zone resets
clat (usec): min=64, max=83442, avg=496.97, stdev=1562.39
lat (usec): min=64, max=83442, avg=497.19, stdev=1562.39
clat percentiles (usec):
| 1.00th=[ 137], 5.00th=[ 223], 10.00th=[ 269], 20.00th=[ 326],
| 30.00th=[ 367], 40.00th=[ 396], 50.00th=[ 416], 60.00th=[ 445],
| 70.00th=[ 474], 80.00th=[ 515], 90.00th=[ 578], 95.00th=[ 652],
| 99.00th=[ 1090], 99.50th=[ 2073], 99.90th=[11863], 99.95th=[12911],
| 99.99th=[79168]
bw ( KiB/s): min= 6432, max=10416, per=100.00%, avg=7987.68, stdev=1022.27, samples=25
iops : min= 1608, max= 2604, avg=1996.84, stdev=255.54, samples=25
lat (nsec) : 500=0.83%, 750=11.23%, 1000=0.15%
lat (usec) : 2=1.09%, 4=0.05%, 10=0.75%, 20=0.34%, 50=0.80%
lat (usec) : 100=0.60%, 250=9.54%, 500=53.92%, 750=18.02%, 1000=1.57%
lat (msec) : 2=0.60%, 4=0.07%, 10=0.02%, 20=0.38%, 100=0.03%
cpu : usr=0.94%, sys=9.84%, ctx=84140, majf=0, minf=26
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=25600,25600,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=11.2MiB/s (11.8MB/s), 11.2MiB/s-11.2MiB/s (11.8MB/s-11.8MB/s), io=100MiB (105MB), run=8906-8906msec
WRITE: bw=7957KiB/s (8148kB/s), 7957KiB/s-7957KiB/s (8148kB/s-8148kB/s), io=100MiB (105MB), run=12869-12869msec
=====
When I've seen hangs like this in the past, it's always boiled down to one of two things:
1) I've got some "lock" I've neglected to release whereupon a subsequent attempt to re-acquire the lock hangs...
2) I've run out of disk space in my Swift instance...
To get to the bottom of #1, just dumping the stack trace for all Go Routines is sufficient to nail the culprit. #2 is exposed with df
commands on the Swift Object Server(s) of course.
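For example, something along these lines - the process name and the Swift device mount point below are just placeholders for whatever your deployment actually uses:

# Sending SIGQUIT makes the Go runtime dump every goroutine's stack trace
# and then exit, so only do this to a process you're prepared to restart.
kill -QUIT $(pidof imgr)

# Check for a full backing store on each Swift Object Server
# (/srv/node is the conventional device mount point; yours may differ).
df -h /srv/node/*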
As you've probably seen, ProxyFS is composed of two or three components:
ickpt - this guy works around the "eventual consistency" of Swift Object PUT behavior: the CheckPoint Record is PUT to Object "0000000000000000" each time, so a subsequent GET (upon restarting) is not guaranteed to return the last CheckPoint Record that was PUT. You can certainly run w/out ickpt - I do this all the time in test environments where the Eventual Consistency of Swift is very unlikely to bite us (i.e. since there are no failures of Swift nodes during the test sequence). (There's a quick way to inspect that Object sketched after this list.)
imgr - this is the "centralized" metadata server. Well, really, it is just the entity recording updates to the Inode Table... a simple mapping from Inode Number to an Object Number in the Swift Container. It is also responsible for tracking "leases" (the distributed lock manager) and open/close references. The only Objects it PUTs are the CheckPoint Record Object and the B+Tree "pages" that get modified when the Inode Table is updated.
iclient - this is the FUSE file system driver that really does all the heavy lifting of modifying the Objects making up the file system elements (i.e. directories, files, and symlinks).
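To make the ickpt/imgr point concrete: the CheckPoint Record is just an ordinary Swift Object, so you can inspect it directly via the Swift API. A rough sketch - the proxy URL, auth token, account, and Container name here are all placeholders that depend entirely on your configuration:

# HEAD the fixed-name CheckPoint Record Object that gets re-PUT on every checkpoint.
curl -I -H "X-Auth-Token: $TOKEN" \
  http://<swift-proxy>:8080/v1/AUTH_<account>/<container>/0000000000000000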
BTW - cool that you enabled tracing. You can do that on the imgr side as well... it kind of paints both ends of the RPC exchanges between iclient & imgr that way.
I'll keep trying to reproduce your issue over the next couple of days... but any further info you might provide would be extremely helpful. One critical piece of info would be the commit tag you are running. Here's mine (this is currently development/HEAD):
$ git describe
2.01.0-10-gb00f4d7f
I suppose you might be using the HEAD of stable:
$ git describe
2.01.0
I haven't tested that one but will endeavor to do so at my end. There have certainly been improvements in the 10 commits on the development branch. Frankly, we are nearly done getting the portion of the standard PJD test suite (https://github.com/pjd/pjdfstest) that is pertinent to the feature set of ProxyFS (i.e. only dirs, files, and symlinks... no atime, no sticky/setuid/setgid, etc.) to pass. The intention was to do a 2.02.00 tagged release once we get to that point... probably this upcoming week.
I did just verify that, in my containerized environment at least, all three size= values succeed on stable/HEAD (2.01.00).
Thank you so much for your complete answer, edmc-ss. Yeah, there was not an issue with ProxyFS - the problem was my Swift: its backend storage got full, as it had a capacity of only a few megabytes.
After running the fio benchmark with a 1MiB file size it works fine, but when I change the global size to e.g. 10Mi it gets stuck. benchmark.fio
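For reference, a job file consistent with the output above would look roughly like the sketch below. This is a reconstruction from the job headers fio printed, not the exact published benchmark.fio, and the directory= path is a placeholder for the iclient FUSE mount point:

; Hypothetical reconstruction of benchmark.fio - the real file may differ.
[global]
; Placeholder for wherever iclient has the ProxyFS volume mounted.
directory=/mnt/proxyfs
; The value being varied: 1Mi worked, 10Mi (and larger) appeared to hang.
size=10Mi
bs=4k
ioengine=psync
iodepth=1

[4KiB_write]
rw=write

[4KiB_read]
rw=read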
This is the trace log I captured when it got stuck: iclient log