earlephilhower / ezfio

Simple NVME/SAS/SATA SSD test framework for Linux and Windows
GNU General Public License v2.0

show as zero iops. #32

Closed · linuxbest closed this issue 5 years ago

linuxbest commented 5 years ago

(https://github.com/earlephilhower/ezfio/files/2908590/ezfio_results_28GB_4cores_3900MHz_nvme0n1_nvmeof_2019-02-26_17-31-20.ods.gz)

Why does the long-term performance stability test show 0 IOPS?

Thanks.


ezFio test parameters:

           Drive: /dev/nvme0n1
           Model: SAMSUNG MZPLL1T6HEHP-00003
          Serial: S3HBNA0K303021
   AvailCapacity: 26 GiB
  TestedCapacity: 13 GiB
             CPU: Intel Core i3-7100 CPU @ 3.90GHz
           Cores: 4
       Frequency: 3900
     FIO Version: fio-3.8

Test Description BW(MB/s) IOPS Lat(us)


---Sequential Preconditioning---
Sequential Preconditioning Pass 1 DONE DONE DONE
Sequential Preconditioning Pass 2 DONE DONE DONE

---Sustained Multi-Threaded Sequential Read Tests by Block Size---
Sustained Multi-Threaded Sequential Read Tests by Block Size, BS=512 0.00 0 0.0
Sustained Multi-Threaded Sequential Read Tests by Block Size, BS=1024 0.00 0 0.0
Sustained Multi-Threaded Sequential Read Tests by Block Size, BS=2048 0.00 0 0.0
Sustained Multi-Threaded Sequential Read Tests by Block Size, BS=4096 1,049.27 268,613 952.7
Sustained Multi-Threaded Sequential Read Tests by Block Size, BS=8192 2,080.76 266,337 960.9
Sustained Multi-Threaded Sequential Read Tests by Block Size, BS=16384 3,757.09 240,453 1064.3
Sustained Multi-Threaded Sequential Read Tests by Block Size, BS=32768 5,852.74 187,288 1366.6
Sustained Multi-Threaded Sequential Read Tests by Block Size, BS=65536 5,845.64 93,530 2736.8
Sustained Multi-Threaded Sequential Read Tests by Block Size, BS=131072 5,835.88 46,687 5482.7

---Sustained Multi-Threaded Random Read Tests by Block Size---
Sustained Multi-Threaded Random Read Tests by Block Size, BS=512 0.00 0 0.0
Sustained Multi-Threaded Random Read Tests by Block Size, BS=1024 0.00 0 0.0
Sustained Multi-Threaded Random Read Tests by Block Size, BS=2048 0.00 0 0.0
Sustained Multi-Threaded Random Read Tests by Block Size, BS=4096 2,932.19 750,640 338.7
Sustained Multi-Threaded Random Read Tests by Block Size, BS=8192 5,331.03 682,372 373.1
Sustained Multi-Threaded Random Read Tests by Block Size, BS=16384 3,697.88 236,664 1080.0
Sustained Multi-Threaded Random Read Tests by Block Size, BS=32768 5,493.61 175,796 1455.6
Sustained Multi-Threaded Random Read Tests by Block Size, BS=65536 5,455.25 87,284 2931.9
Sustained Multi-Threaded Random Read Tests by Block Size, BS=131072 5,574.52 44,596 5739.5

---Sequential Write Tests with Queue Depth=1 by Block Size---
Sequential Write Tests with Queue Depth=1 by Block Size, BS=512 0.00 0 0.0
Sequential Write Tests with Queue Depth=1 by Block Size, BS=1024 0.00 0 0.0
Sequential Write Tests with Queue Depth=1 by Block Size, BS=2048 0.00 0 0.0
Sequential Write Tests with Queue Depth=1 by Block Size, BS=4096 237.95 60,915 16.0
Sequential Write Tests with Queue Depth=1 by Block Size, BS=8192 424.28 54,307 17.9
Sequential Write Tests with Queue Depth=1 by Block Size, BS=16384 684.19 43,788 22.3
Sequential Write Tests with Queue Depth=1 by Block Size, BS=32768 1,021.94 32,702 29.9
Sequential Write Tests with Queue Depth=1 by Block Size, BS=65536 1,416.31 22,661 43.1
Sequential Write Tests with Queue Depth=1 by Block Size, BS=131072 1,678.60 13,429 73.2

---Random Preconditioning---
Random Preconditioning DONE DONE DONE
Random Preconditioning DONE DONE DONE

---Sustained 4KB Random Read Tests by Number of Threads---
Sustained 4KB Random Read Tests by Number of Threads, Threads=1 43.63 11,169 88.4
Sustained 4KB Random Read Tests by Number of Threads, Threads=2 90.98 23,290 85.0
Sustained 4KB Random Read Tests by Number of Threads, Threads=4 169.53 43,398 90.9
Sustained 4KB Random Read Tests by Number of Threads, Threads=8 381.15 97,575 81.5
Sustained 4KB Random Read Tests by Number of Threads, Threads=16 730.29 186,955 85.1
Sustained 4KB Random Read Tests by Number of Threads, Threads=32 1,317.28 337,225 94.3
Sustained 4KB Random Read Tests by Number of Threads, Threads=64 2,010.83 514,773 122.5
Sustained 4KB Random Read Tests by Number of Threads, Threads=128 1,992.69 510,128 241.8
Sustained 4KB Random Read Tests by Number of Threads, Threads=256 1,553.91 397,800 607.0

---Sustained 4KB Random mixed 30% Write Tests by Threads---
Sustained 4KB Random mixed 30% Write Tests by Threads, Threads=1 60.32 15,443 85.1
Sustained 4KB Random mixed 30% Write Tests by Threads, Threads=2 112.98 28,923 90.3
Sustained 4KB Random mixed 30% Write Tests by Threads, Threads=4 173.27 44,358 112.6
Sustained 4KB Random mixed 30% Write Tests by Threads, Threads=8 298.74 76,478 129.5
Sustained 4KB Random mixed 30% Write Tests by Threads, Threads=16 536.49 137,342 147.2
Sustained 4KB Random mixed 30% Write Tests by Threads, Threads=32 825.57 211,345 192.3
Sustained 4KB Random mixed 30% Write Tests by Threads, Threads=64 1,143.85 292,825 283.4
Sustained 4KB Random mixed 30% Write Tests by Threads, Threads=128 1,458.41 373,354 447.0
Sustained 4KB Random mixed 30% Write Tests by Threads, Threads=256 1,496.10 383,002 781.4

---Sustained Perf Stability Test - 4KB Random 30% Write---
Sustained Perf Stability Test - 4KB Random 30% Write 1,490.16 381,480 783.5

---Sustained 4KB Random Write Tests by Number of Threads---
Sustained 4KB Random Write Tests by Number of Threads, Threads=1 231.21 59,191 16.4
Sustained 4KB Random Write Tests by Number of Threads, Threads=2 521.60 133,529 14.6
Sustained 4KB Random Write Tests by Number of Threads, Threads=4 897.79 229,833 17.1
Sustained 4KB Random Write Tests by Number of Threads, Threads=8 1,292.87 330,975 23.7
Sustained 4KB Random Write Tests by Number of Threads, Threads=16 1,602.42 410,219 38.5
Sustained 4KB Random Write Tests by Number of Threads, Threads=32 1,766.20 452,147 70.3
Sustained 4KB Random Write Tests by Number of Threads, Threads=64 1,730.65 443,047 143.8
Sustained 4KB Random Write Tests by Number of Threads, Threads=128 1,757.94 450,033 283.5
Sustained 4KB Random Write Tests by Number of Threads, Threads=256 1,574.54 403,082 633.4

---Sustained Multi-Threaded Random Write Tests by Block Size---
Sustained Multi-Threaded Random Write Tests by Block Size, BS=512 0.00 0 0.0
Sustained Multi-Threaded Random Write Tests by Block Size, BS=1024 0.00 0 0.0
Sustained Multi-Threaded Random Write Tests by Block Size, BS=2048 0.00 0 0.0
Sustained Multi-Threaded Random Write Tests by Block Size, BS=4096 1,877.82 480,723 531.8
Sustained Multi-Threaded Random Write Tests by Block Size, BS=8192 2,092.03 267,780 954.9
Sustained Multi-Threaded Random Write Tests by Block Size, BS=16384 2,084.45 133,405 1916.7
Sustained Multi-Threaded Random Write Tests by Block Size, BS=32768 2,094.95 67,039 3816.3
Sustained Multi-Threaded Random Write Tests by Block Size, BS=65536 2,089.59 33,433 7653.8
Sustained Multi-Threaded Random Write Tests by Block Size, BS=131072 2,098.95 16,792 15241.2

COMPLETED! Spreadsheet file: /home/shu/src/ezfio/ezfio_results_28GB_4cores_3900MHz_nvme0n1_nvmeof_2019-02-26_17-31-20.ods

earlephilhower commented 5 years ago

The code looks for the IOPS using a /sys file. If that file can't be opened, or is in a different format than expected, you can end up with 0s being read out.

statpath = "/sys/block/"+physDriveBase+"/stat"

I'd watch the contents of /sys/block/nvme0n1/stat during a run and see how it looks, for starters. I've seen runs work fine with kernels around the one you've got listed, but not specifically with the Ubuntu releases.
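For reference, here's a minimal watcher along those lines. It's just a sketch, not the actual ezfio code: it samples /sys/block/<dev>/stat once per interval and prints an IOPS estimate, assuming the classic stat layout (field 1 = reads completed, field 5 = writes completed; newer kernels append extra discard/flush fields, which this ignores).

#!/usr/bin/env python3
# Minimal sketch (not the actual ezfio code): sample completed I/Os from
# /sys/block/<dev>/stat once per interval and print an IOPS estimate.
# Assumes the classic stat layout: field 0 = reads completed, field 4 =
# writes completed (newer kernels append discard/flush fields, ignored here).
import sys
import time

def completed_ios(statpath):
    with open(statpath) as f:
        fields = f.read().split()
    return int(fields[0]) + int(fields[4])

def watch(dev, interval=1.0):
    statpath = "/sys/block/" + dev + "/stat"
    prev = completed_ios(statpath)
    while True:
        time.sleep(interval)
        cur = completed_ios(statpath)
        print("%s: %.0f IOPS" % (dev, (cur - prev) / interval))
        prev = cur

if __name__ == "__main__":
    watch(sys.argv[1] if len(sys.argv) > 1 else "nvme0n1")

If the counters never move during a run, ezfio's long-term stability graph will come out as all zeros even though fio itself is doing real I/O.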

Also, just so you're aware, running only 13GB on a multi-TB SSD is going to give you peak rather than sustained results. It's faster, but a less accurate reflection of what'll happen after the card has been in service a while and is full of data. For debugging something like this, though, it's fine.

linuxbest commented 5 years ago

Thanks. I assumed the IOPS were reported by fio.

I checked my system, and it's weird: all the traffic is counted against nvme0c33n1. Maybe I did something wrong when creating the namespace or during the namespace attach procedure.

Thanks.

lrwxrwxrwx 1 root root 0 Feb 27 09:51 nvme0c33n1 -> ../devices/pci0000:00/0000:00:01.0/0000:01:00.0/nvme/nvme0/nvme0c33n1
lrwxrwxrwx 1 root root 0 Feb 27 09:51 nvme0n1 -> ../devices/virtual/nvme-subsystem/nvme-subsys0/nvme0n1
root@nvmeof:/sys/block# cat nvme0n1/stat
       0        0        0        0        0        0        0        0        0        0        0
root@nvmeof:/sys/block# cat nvme0c33n1/stat
      46        0     2096        0        0        0        0        0        0        0        0
root@nvmeof:/sys/block# fdisk -l /dev/nvme0n1 
Disk /dev/nvme0n1: 26.7 GiB, 28672000000 bytes, 7000000 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
root@nvmeof:/sys/block# cat nvme0c33n1/stat
      48        0     2112        0        0        0        0        0        0        0        0
earlephilhower commented 5 years ago

Hmmm, I've never seen that kind of naming convention. There should be a 1:1 mapping between /dev/xxxx and those files. Is there any other software running on top of the drive (e.g. a caching driver, LVM, etc.)?

Does /dev/nvme0c33n1 exist? If so, you could try running against that.

FIO will report IOPS etc. since it measures things using its own internal timers and code. ezFio just parses FIO output and puts it into a spreadsheet for all tests except the 20-minute IO stability test (because that test is really trying to see whether there are periods of exceptionally low or high performance that "average out" over a long test but hurt over small timeframes).
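If you want to sanity-check what FIO itself is reporting, independent of the /sys counters, something along these lines works. It's only a sketch (not ezfio's parser), and the randrw.fio job file name is a placeholder for whatever job you run:

#!/usr/bin/env python3
# Sketch only (not ezfio's parser): run fio with JSON output and print the
# per-job IOPS that fio measures with its own internal timers.
# "randrw.fio" below is a placeholder job file name.
import json
import subprocess
import sys

def run_fio_json(jobfile):
    out = subprocess.run(
        ["fio", "--output-format=json", jobfile],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)

def summarize(result):
    for job in result["jobs"]:
        print("%s: read %.0f IOPS, write %.0f IOPS" % (
            job["jobname"], job["read"]["iops"], job["write"]["iops"]))

if __name__ == "__main__":
    summarize(run_fio_json(sys.argv[1] if len(sys.argv) > 1 else "randrw.fio"))

If fio shows real IOPS while /sys/block/nvme0n1/stat stays at zero, the problem is the counter location, not the drive.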

linuxbest commented 5 years ago
root@nvmeof:/sys/block# lspci |grep Non
01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller 172Xa/172Xb (rev 01)
root@nvmeof:/sys/block# grep nvme /proc/partitions 
 259        7   28000000 nvme0n1
root@nvmeof:/sys/block# ls -la /dev/nvme*
crw------- 1 root root 241, 0 Feb 27 09:51 /dev/nvme0
brw-rw---- 1 root disk 259, 7 Feb 27 09:51 /dev/nvme0n1

I checked the kernel code; it seems related to multipath support:

 /*
  * If multipathing is enabled we need to always use the subsystem instance
  * number for numbering our devices to avoid conflicts between subsystems that
  * have multiple controllers and thus use the multipath-aware subsystem node
  * and those that have a single controller and use the controller node
  * directly.
  */
 void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
                         struct nvme_ctrl *ctrl, int *flags)
 {
         if (!multipath) {
                 sprintf(disk_name, "nvme%dn%d", ctrl->instance, ns->head->instance);
         } else if (ns->head->disk) {
                 sprintf(disk_name, "nvme%dc%dn%d", ctrl->subsys->instance,
                                 ctrl->cntlid, ns->head->instance);
                 *flags = GENHD_FL_HIDDEN;
         } else {
                 sprintf(disk_name, "nvme%dn%d", ctrl->subsys->instance,
                                 ns->head->instance);
         }
 }
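
As a quick check, this sketch reports whether native NVMe multipath is enabled and lists the per-controller nodes. It assumes the usual nvme_core module-parameter path and that the hidden per-controller nodes show up under /sys/block like they do above:

#!/usr/bin/env python3
# Sketch: report whether native NVMe multipath is enabled and list the
# hidden per-controller (nvmeXcYnZ) nodes. Assumes the standard
# /sys/module/nvme_core/parameters/multipath file exists when the kernel
# is built with CONFIG_NVME_MULTIPATH.
import glob
import os

def nvme_multipath_enabled():
    param = "/sys/module/nvme_core/parameters/multipath"
    try:
        with open(param) as f:
            return f.read().strip() == "Y"
    except FileNotFoundError:
        return False  # parameter missing: kernel built without multipath

def hidden_controller_nodes():
    # Per-controller nodes carry the extra "c<cntlid>" in their name.
    return sorted(os.path.basename(p) for p in glob.glob("/sys/block/nvme*c*n*"))

if __name__ == "__main__":
    print("native NVMe multipath:", nvme_multipath_enabled())
    print("per-controller nodes:", hidden_controller_nodes())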
earlephilhower commented 5 years ago

It's strange that multipath would get involved with a local NVMe device and its naming, but I guess there's some reason for it.

Are you aware of any mapping from /dev/blah-naming to the actual /sys/block/blah? Tools like "iostat" will have the same issue (i.e. iostat /dev/nvme0n1 reporting all 0s).

I suppose it is technically possible to check for `/dev/nvme\d+c\d+n\d+`, where the first and last numbers match. But that would be horribly ugly, and I'd hate to try to verify it.
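
Roughly, that ugly fallback would look like the sketch below. It's purely an illustration, not something in ezfio; stat_path_for is a made-up helper, and it only handles the exact nvme<subsys>c<cntlid>n<ns> shape from your listing.

#!/usr/bin/env python3
# Illustration only (not in ezfio): for /dev/nvmeXnY, prefer a hidden
# per-controller node nvmeXc<cntlid>nY when one exists, since that is
# where the counters land on this setup. stat_path_for is a hypothetical
# helper name.
import glob
import os
import re

def stat_path_for(physdrive):
    base = os.path.basename(physdrive)          # e.g. "nvme0n1"
    m = re.match(r"nvme(\d+)n(\d+)$", base)
    if m:
        # Look for nvme<subsys>c<cntlid>n<ns> where subsys and ns agree.
        pattern = "/sys/block/nvme%sc*n%s/stat" % (m.group(1), m.group(2))
        candidates = sorted(glob.glob(pattern))
        if candidates:
            return candidates[0]
    return "/sys/block/%s/stat" % base

if __name__ == "__main__":
    print(stat_path_for("/dev/nvme0n1"))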

earlephilhower commented 5 years ago

Actually, if this is part of a multipath configuration, have you run against the actual mpath devnode? /dev/mapper/...?

earlephilhower commented 5 years ago

Check PR #34 and use --cluster --drive localhost:/dev/nvmedevice, @linuxbest. The new cluster mode needs to use FIO's built-in instantaneous IOPS measurements, which are independent of any iostat/proc filesystem weirdness. It should clear up the 0 reports on multipath setups.

earlephilhower commented 5 years ago

Closing, as even the default build now uses FIO-reported IOPS rather than Linux counters for graph production.