Closed: JosephNew closed this issue 4 years ago
File "ezfio.py", line 975, in <lambda>
o['runtime'])})
File "ezfio.py", line 894, in RunTest
syscpu = float(client['sys_cpu'])
UnboundLocalError: local variable 'client' referenced before assignment
Sustained Multi-Threaded Sequential Read Tests by Block Size, BS=512    ERROR    ERROR    ERROR
ERROR DETECTED, ABORTING TEST RUN.
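For what it's worth, that `UnboundLocalError` is the classic symptom of a variable that is only assigned on one branch and then read unconditionally. A minimal reproduction (hypothetical names, not ezfio's actual code):

```python
def run_test(results):
    # 'client' is only bound when the key is present (hypothetical
    # structure, for illustration only).
    if "client_stats" in results:
        client = results["client_stats"]
    # If the branch above was not taken, 'client' was never bound,
    # so this line raises UnboundLocalError, just like ezfio.py:894.
    return float(client["sys_cpu"])
```

In ezfio's case this suggests the per-client stats were never collected, e.g. because the remote fio server was unreachable.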
I'm sorry, but I don't quite understand your issue.
I extended ezfio at my last employer, and we used the cluster mode extensively (it was an NVMe-oF array company, and we were testing the performance of tens of clients) without incident. Make sure you have started the fio server on the remotes, or it won't connect (the script can't start them by itself).
Also, your last message says your SSDs don't support 512-byte accesses. The script checks the Linux /sys filesystem for the reported minimum I/O size, and if the test block size is smaller than that, it will skip the test. So I'd check that your product reports the proper minimum I/O size and not the default of 512 bytes.
python ezfio.py --cluster -d node31:/dev/nvme2n1,node32:/dev/nvme3n1,node33:/dev/nvme2n1
python ezfio.py -d node31:/dev/nvme2n1,dev/nvme3n1,dev/nvme4n1
python ezfio.py --cluster -d node31:/dev/nvme2n1,/dev/nvme3n1,node32:/dev/nvme2n1,/dev/nvme3n1,node33:/dev/nvme2n1,/dev/nvme3n1
[root@node31 ezfio]# python3 ezfio.py --cluster --drive node31:/dev/nvme1n1,node32:/dev/nvme1n1,node33:/dev/nvme0n1
---------------------------------------------------------------------------
WARNING! WARNING! WARNING! WARNING! WARNING! WARNING! WARNING! WARNING! WARNING!
THIS TEST WILL DESTROY ANY DATA AND FILESYSTEMS ON /dev/nvme1n1
Please type the word "yes" and hit return to continue, or anything else to abort.yes
---------------------------------------------------------------------------
ezFio test parameters:
Drive: node31:/dev/nvme1n1,node32:/dev/nvme1n1,node33:/dev/nvme0n1
Model: SUZAKU
Serial: DIR0103000
AvailCapacity: 1024 GiB
TestedCapacity: 1024 GiB
TestedOffset: 0 GiB
CPU: Intel Xeon CPU E5-2650 v2 @ 2.60GHz
Cores: 16
Frequency: 2600
FIO Version: fio-3.20-38-g14060-dirty
Test Description BW(MB/s) IOPS Lat(us)
---Sequential Preconditioning---
Sequential Preconditioning Pass 1 DONE DONE DONE
Sequential Preconditioning Pass 2 DONE DONE DONE
---Sustained Multi-Threaded Sequential Read Tests by Block Size---
Sustained Multi-Threaded Sequential Read Tests by Block Size, BS=512 72.40 148,268 5153.2
Sustained Multi-Threaded Sequential Read Tests by Block Size, BS=1024 200.06 204,860 3746.7
Sustained Multi-Threaded Sequential Read Tests by Block Size, BS=2048 558.85 286,130 2680.1
Sustained Multi-Threaded Sequential Read Tests by Block Size, BS=4096 1,502.53 384,647 1922.3
Sustained Multi-Threaded Sequential Read Tests by Block Size, BS=8192 3,107.85 397,805 1902.8
Sustained Multi-Threaded Sequential Read Tests by Block Size, BS=16384 5,882.61 376,487 1885.9
Sustained Multi-Threaded Sequential Read Tests by Block Size, BS=32768 6,203.94 198,526 3868.6
Sustained Multi-Threaded Sequential Read Tests by Block Size, BS=65536 6,229.46 99,671 6438.2
Sustained Multi-Threaded Sequential Read Tests by Block Size, BS=131072 6,246.78 49,974 12840.8
---Sustained Multi-Threaded Random Read Tests by Block Size---
Sustained Multi-Threaded Random Read Tests by Block Size, BS=512 950.00 1,945,607 433.4
Sustained Multi-Threaded Random Read Tests by Block Size, BS=1024 2,032.06 2,080,834 371.0
Sustained Multi-Threaded Random Read Tests by Block Size, BS=2048 4,029.06 2,062,880 373.4
Sustained Multi-Threaded Random Read Tests by Block Size, BS=4096 5,981.07 1,531,155 498.4
Sustained Multi-Threaded Random Read Tests by Block Size, BS=8192 6,109.33 781,994 991.7
Sustained Multi-Threaded Random Read Tests by Block Size, BS=16384 6,190.74 396,208 1941.7
Sustained Multi-Threaded Random Read Tests by Block Size, BS=32768 6,226.39 199,245 3863.3
Sustained Multi-Threaded Random Read Tests by Block Size, BS=65536 00:01:18
- I'll do a multi-disk and multi-node test later and update the results
"disk_util" : [
{
"name" : "nvme1n1",
"read_ios" : 7764453,
"write_ios" : 0,
"read_merges" : 0,
"write_merges" : 0,
"read_ticks" : 873398,
"write_ticks" : 0,
"in_queue" : 916872,
"util" : 100.000000,
"hostname" : "node32",
"port" : 8765
},
{
"name" : "nvme0n1",
"read_ios" : 7861419,
"write_ios" : 0,
"read_merges" : 0,
"write_merges" : 0,
"read_ticks" : 871531,
"write_ticks" : 0,
"in_queue" : 884591,
"util" : 100.000000,
"hostname" : "node33",
"port" : 8765
}
]
}
STDERR:
Exception in thread Thread-35:
Traceback (most recent call last):
File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib64/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "ezfio.py", line 1120, in JobWrapper
val = o['cmdline']
File "ezfio.py", line 974, in
Sustained 4KB Random Read Tests by Number of Threads, Threads=8 ERROR ERROR ERROR ERROR DETECTED, ABORTING TEST RUN.
FIO crashed for some reason. If the drive dropped offline during the run, or there was a network/HW error, it can do that. Check the FIO logs to get the exact message from FIO.
python3 ezfio.py --cluster -d node31:/dev/nvme2n1,/dev/nvme3n1,node32:/dev/nvme2n1,/dev/nvme3n1,node33:/dev/nvme2n1,/dev/nvme3n1
Traceback (most recent call last):
File "ezfio.py", line 1446, in <module>
ParseArgs()
File "ezfio.py", line 216, in ParseArgs
physDriveDict[node.split(":")[0]] = node.split(":")[1]
IndexError: list index out of range
python3 ezfio.py --cluster -d node31:/dev/nvme2n1,/dev/nvme3n1,node32:/dev/nvme2n1,/dev/nvme3n1,node33:/dev/nvme2n1,/dev/nvme3n1
You need nodenames before each devnode. node31:/dev/a,node31:/dev/b,node32:/dev/a,node32:/dev/b...
python3 ezfio.py --cluster -d node31:/dev/nvme2n1,node31:/dev/nvme3n1,node32:/dev/nvme2n1
python3 ezfio.py --cluster -d node31:/dev/nvme2n1,node31:/dev/nvme3n1,node31:/dev/nvme4n1,node32:/dev/nvme2n1
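The `IndexError` earlier comes from `node.split(":")[1]` on an entry with no `host:` prefix. A defensive version of that parse (a hypothetical helper, not ezfio's actual code) would reject the malformed list with a clear message instead:

```python
def parse_cluster_drives(arg):
    """Parse a cluster-mode --drive argument: every comma-separated
    entry must be host:/dev/node. Returns {host: [devices, ...]}."""
    drives = {}
    for entry in arg.split(","):
        host, sep, dev = entry.partition(":")
        if not sep or not dev.startswith("/dev/"):
            raise ValueError(
                f"bad entry {entry!r}: you need a nodename before each "
                "devnode, e.g. node31:/dev/a,node31:/dev/b")
        drives.setdefault(host, []).append(dev)
    return drives
```

With this, `node31:/dev/nvme2n1,/dev/nvme3n1` fails fast with a usage hint rather than a traceback.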
Running `iostat -x 1` lets you watch the real-time I/O per second:
avg-cpu: %user %nice %system %iowait %steal %idle
3.60 0.00 4.80 0.00 0.00 91.60
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.00 0.00 1.00 0.00 4.00 8.00 0.00 0.00 0.00 0.00 0.00 0.00
nvme0n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nvme1n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nvme2n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nvme3n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nvme4n1 0.00 0.00 0.00 22259.00 0.00 2849152.00 256.00 62.11 2.79 0.00 2.79 0.04 100.00
nvme5n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
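If iostat isn't handy, the same counters can be read from /proc/diskstats. A minimal parser (a sketch; field order follows the kernel's documented iostats format):

```python
def parse_diskstats(text):
    """Parse /proc/diskstats-formatted text into
    {device: (reads_completed, sectors_read, writes_completed, sectors_written)}.

    Fields after major/minor/name are: reads, rd_merges, rd_sectors,
    rd_ms, writes, wr_merges, wr_sectors, wr_ms, ...
    """
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 11:
            continue
        stats[f[2]] = (int(f[3]), int(f[5]), int(f[7]), int(f[9]))
    return stats

# Sampling twice and subtracting the tuples gives per-interval I/O,
# which is essentially what iostat does each second.
```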
- For another comparison, `python3 ezfio.py --drive /dev/nvme1n1,/dev/nvme2n1` can push I/O to all the disks
- But only on one node:
```bash
avg-cpu: %user %nice %system %iowait %steal %idle
3.76 0.00 4.71 0.00 0.00 91.53
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nvme0n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nvme1n1 0.00 0.00 0.00 11552.00 0.00 1478656.00 256.00 63.54 5.49 0.00 5.49 0.09 100.00
nvme2n1 0.00 0.00 0.00 11652.00 0.00 1491456.00 256.00 63.25 5.43 0.00 5.43 0.09 100.00
nvme3n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nvme4n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nvme5n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nvme6n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nvme7n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nvme8n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nvme9n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nvme10n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nvme11n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nvme12n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
```
FIO crashed for some reason. If the drive dropped offline during the run, or there was a network/HW error, it can do that. Check the FIO logs to get the exact message from FIO.
diff --git a/filelock.c b/filelock.c
index 7e92f63..9cb2c4d 100644
--- a/filelock.c
+++ b/filelock.c
@@ -22,7 +22,7 @@ struct fio_filelock {
 	unsigned int references;
 };
 
-#define MAX_FILELOCKS	1024
+#define MAX_FILELOCKS	8192
 
 static struct filelock_data {
 	struct flist_head list;
diff --git a/fio.h b/fio.h
index 8045c32..4ad19ba 100644
--- a/fio.h
+++ b/fio.h
@@ -556,7 +556,7 @@ static inline void fio_ro_check(const struct thre
 	       !(io_u->ddir == DDIR_TRIM && !td_trim(td)));
 }
 
-#define REAL_MAX_JOBS	4096
+#define REAL_MAX_JOBS	8192
 
 static inline bool should_fsync(struct thread_data *td)
diff --git a/os/os.h b/os/os.h
index 9a280e5..e31b30c 100644
--- a/os/os.h
+++ b/os/os.h
@@ -173,7 +173,7 @@ extern int fio_cpus_split(os_cpu_mask_t *mask, un
 
-#define FIO_MAX_JOBS	4096
+#define FIO_MAX_JOBS	8192
Hey, that's very good debugging! Maybe you can make a pull request to the FIO repository with the change? I also once found an issue (not a bug, just a limit too small for large storage systems) and submitted a PR that the author quickly accepted.