geerlingguy / raspberry-pi-pcie-devices

Raspberry Pi PCI Express device compatibility database
http://pipci.jeffgeerling.com
GNU General Public License v3.0

Testing Samsung 980 M.2 NVMe SSD via Sintech mPCIe to M.2 Adapter #355

Closed danmanners closed 2 years ago

danmanners commented 2 years ago

I'm verifying the functionality of the Turing Pi v2 mPCIe slots for nodes 1 and 2 on the pre-production unit. To connect an NVMe drive, I needed an mPCIe to M.2 adapter board.

Samsung 980 M.2 NVMe SSD

Amazon Link: Purchase here

Sintech mPCIe to M.2 Adapter (with 20cm adapter)

Amazon Link: Purchase here

danmanners commented 2 years ago

ubuntu@tpiv2-node-1:~$ sudo lspci -vvvv
01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a809 (prog-if 02 [NVM Express])
        Subsystem: Samsung Electronics Co Ltd Device a801
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 42
        Region 0: Memory at 600000000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable+ Count=1/32 Maskable- 64bit+
                Address: 00000000fffffffc  Data: 6540
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s (downgraded), Width x1 (downgraded)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR+
                         10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS-, TPHComp-, ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
                         AtomicOpsCtl: ReqEn-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [b0] MSI-X: Enable- Count=13 Masked+
                Vector table: BAR=0 offset=00003000
                PBA: BAR=0 offset=00002000
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap+ MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [148 v1] Device Serial Number 00-00-00-00-00-00-00-00
        Capabilities: [158 v1] Power Budgeting <?>
        Capabilities: [168 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
                LaneErrStat: 0
        Capabilities: [188 v1] Latency Tolerance Reporting
                Max snoop latency: 0ns
                Max no snoop latency: 0ns
        Capabilities: [190 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                          PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
                L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
                           T_CommonMode=0us LTR1.2_Threshold=0ns
                L1SubCtl2: T_PwrOn=10us
        Kernel driver in use: nvme
        Kernel modules: nvme
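
The `LnkCap` and `LnkSta` lines in the output above show that the drive supports 8GT/s x4 but the link trained at 5GT/s x1. As a rough sketch of what that means for throughput, the snippet below parses those two lines and estimates the raw per-direction ceiling from the line encoding (8b/10b at 2.5 and 5 GT/s, 128b/130b at 8 GT/s). The helper names `parse_link` and `link_ceiling_mb_s` are made up for this illustration, and the estimate ignores packet and protocol overhead, so real transfers will come in lower.

```python
import re

# The two link lines, copied from the lspci output above.
LSPCI_EXCERPT = """\
LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L1 <64us
LnkSta: Speed 5GT/s (downgraded), Width x1 (downgraded)
"""

# Fraction of raw bits that carry data for each transfer rate (GT/s).
ENCODING_EFFICIENCY = {2.5: 8 / 10, 5.0: 8 / 10, 8.0: 128 / 130}


def parse_link(text, field):
    """Return (speed in GT/s, lane count) for a 'LnkCap' or 'LnkSta' line."""
    match = re.search(rf"{field}:.*?Speed ([\d.]+)GT/s.*?Width x(\d+)", text)
    if match is None:
        raise ValueError(f"no {field} line found")
    return float(match.group(1)), int(match.group(2))


def link_ceiling_mb_s(speed_gt_s, lanes):
    """Raw one-direction ceiling in MB/s, before protocol overhead."""
    gbit_s = speed_gt_s * ENCODING_EFFICIENCY[speed_gt_s] * lanes
    return gbit_s * 1000 / 8


if __name__ == "__main__":
    for field in ("LnkCap", "LnkSta"):
        speed, lanes = parse_link(LSPCI_EXCERPT, field)
        ceiling = link_ceiling_mb_s(speed, lanes)
        print(f"{field}: {speed} GT/s x{lanes}, about {ceiling:.0f} MB/s raw")
```

Under these assumptions the negotiated 5GT/s x1 link tops out at about 500 MB/s of raw bandwidth, which is the number to keep in mind when reading any sequential throughput results for this setup.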
danmanners commented 2 years ago

Running Jeff Geerling's Disk Benchmark:

ubuntu@tpiv2-node-1:~$ wget -q https://raw.githubusercontent.com/geerlingguy/raspberry-pi-dramble/master/setup/benchmarks/disk-benchmark.sh
ubuntu@tpiv2-node-1:~$ chmod +x disk-benchmark.sh
ubuntu@tpiv2-node-1:~$ sudo DEVICE_UNDER_TEST=/dev/nvme0n1p1 DEVICE_MOUNT_PATH=/mnt/nvme ./disk-benchmark.sh

Results:

| Benchmark | Result |
| --- | --- |
| fio 1M sequential read | 416 MB/s |
| iozone 1M random read | 210.97 MB/s |
| iozone 1M random write | 188.70 MB/s |
| iozone 4K random read | 14.77 MB/s |
| iozone 4K random write | 25.38 MB/s |
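
For anyone cross-checking these figures against the raw output: the iozone values appear to be the reported kBytes/sec divided by 1024 and cut off at two decimals, and the fio value is the MiB/s figure expressed in decimal MB/s. That is an inference from the numbers in this comment, not something confirmed from the benchmark script, and the function names below are hypothetical. A minimal sketch:

```python
import math

# Random read / random write columns from the iozone raw output (kBytes/sec).
IOZONE_KB_S = {
    "1M random read": 216043,
    "1M random write": 193229,
    "4K random read": 15131,
    "4K random write": 25991,
}

# Bandwidth reported by fio in MiB/s.
FIO_MIB_S = 397


def kb_s_to_mb_s(kb_per_s):
    """Divide by 1024 and truncate (not round) to two decimals."""
    return math.floor(kb_per_s / 1024 * 100) / 100


def mib_s_to_mb_s(mib_per_s):
    """Convert binary MiB/s to decimal MB/s."""
    return mib_per_s * 1_048_576 / 1_000_000


if __name__ == "__main__":
    for name, value in IOZONE_KB_S.items():
        print(f"iozone {name}: {kb_s_to_mb_s(value):.2f} MB/s")
    print(f"fio 1M sequential read: {mib_s_to_mb_s(FIO_MIB_S):.0f} MB/s")
```

Note that because the iozone figures are divided by 1024, they are strictly closer to MiB/s than to decimal MB/s, while the fio figure is decimal MB/s.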
Raw Output

```bash
Running fio sequential read test...
fio-rand-read-sequential: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=64
...
fio-3.16
Starting 4 processes
Jobs: 4 (f=4): [R(4)][36.4%][r=395MiB/s][r=394 IOPS][eta 00m:07s]
Jobs: 4 (f=4): [R(4)][50.0%][r=395MiB/s][r=395 IOPS][eta 00m:05s]
Jobs: 4 (f=4): [R(4)][63.6%][r=402MiB/s][r=401 IOPS][eta 00m:04s]
Jobs: 4 (f=4): [R(4)][80.0%][r=396MiB/s][r=396 IOPS][eta 00m:02s]
Jobs: 4 (f=4): [R(4)][100.0%][r=399MiB/s][r=399 IOPS][eta 00m:00s]
fio-rand-read-sequential: (groupid=0, jobs=4): err= 0: pid=8843: Sat Jan 8 17:06:40 2022
  read: IOPS=397, BW=397MiB/s (416MB/s)(4039MiB/10172msec)
    slat (usec): min=148, max=50238, avg=9887.74, stdev=11251.53
    clat (msec): min=150, max=831, avg=621.12, stdev=74.51
     lat (msec): min=170, max=843, avg=631.01, stdev=74.80
    clat percentiles (msec):
     | 1.00th=[ 262], 5.00th=[ 542], 10.00th=[ 567], 20.00th=[ 592],
     | 30.00th=[ 600], 40.00th=[ 617], 50.00th=[ 625], 60.00th=[ 642],
     | 70.00th=[ 651], 80.00th=[ 667], 90.00th=[ 693], 95.00th=[ 709],
     | 99.00th=[ 743], 99.50th=[ 760], 99.90th=[ 810], 99.95th=[ 835],
     | 99.99th=[ 835]
   bw ( KiB/s): min=286720, max=462619, per=98.94%, avg=402280.67, stdev=11273.37, samples=77
   iops : min= 280, max= 451, avg=392.52, stdev=11.00, samples=77
  lat (msec) : 250=0.94%, 500=2.87%, 750=95.54%, 1000=0.64%
  cpu : usr=0.09%, sys=3.88%, ctx=4939, majf=0, minf=65623
  IO depths : 1=0.1%, 2=0.2%, 4=0.4%, 8=0.8%, 16=1.6%, 32=3.2%, >=64=93.8%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=99.9%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=4039,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=397MiB/s (416MB/s), 397MiB/s-397MiB/s (416MB/s-416MB/s), io=4039MiB (4235MB), run=10172-10172msec

Disk stats (read/write):
  nvme0n1: ios=16044/4, merge=0/0, ticks=2527898/31, in_queue=2495852, util=99.01%

Running iozone 1024K random read and write tests...
    Iozone: Performance Test of File I/O
            Version $Revision: 3.492 $
            Compiled for 64 bit mode.
            Build: linux-arm

    Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
                 Al Slater, Scott Rhine, Mike Wisner, Ken Goss
                 Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
                 Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
                 Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
                 Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
                 Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
                 Vangel Bojaxhi, Ben England, Vikentsi Lapa,
                 Alexey Skidanov, Sudhir Kumar.

    Run began: Sat Jan 8 17:06:40 2022

    Include fsync in write timing
    O_DIRECT feature enabled
    Auto Mode
    File size set to 102400 kB
    Record Size 1024 kB
    Command line used: ./iozone -e -I -a -s 100M -r 1024k -i 0 -i 2 -f /mnt/nvme/iozone
    Output is in kBytes/sec
    Time Resolution = 0.000001 seconds.
    Processor cache size set to 1024 kBytes.
    Processor cache line size set to 32 bytes.
    File stride size set to 17 * record size.
                                                          random    random      bkwd    record    stride
        kB  reclen    write  rewrite     read   reread      read     write      read   rewrite      read   fwrite frewrite    fread  freread
    102400    1024   217107   219792                      216043    193229

iozone test complete.

Running iozone 4K random read and write tests...
    Iozone: Performance Test of File I/O
            Version $Revision: 3.492 $
            Compiled for 64 bit mode.
            Build: linux-arm

    Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
                 Al Slater, Scott Rhine, Mike Wisner, Ken Goss
                 Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
                 Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
                 Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
                 Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
                 Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
                 Vangel Bojaxhi, Ben England, Vikentsi Lapa,
                 Alexey Skidanov, Sudhir Kumar.

    Run began: Sat Jan 8 17:07:11 2022

    Include fsync in write timing
    O_DIRECT feature enabled
    Auto Mode
    File size set to 102400 kB
    Record Size 4 kB
    Command line used: ./iozone -e -I -a -s 100M -r 4k -i 0 -i 2 -f /mnt/nvme/iozone
    Output is in kBytes/sec
    Time Resolution = 0.000001 seconds.
    Processor cache size set to 1024 kBytes.
    Processor cache line size set to 32 bytes.
    File stride size set to 17 * record size.
                                                          random    random      bkwd    record    stride
        kB  reclen    write  rewrite     read   reread      read     write      read   rewrite      read   fwrite frewrite    fread  freread
    102400       4    20223    29942                       15131     25991

iozone test complete.

Disk benchmark complete!
```
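
The fio completion latencies in the raw output (about 620 ms on average) can look alarming, but they follow from the queue depth rather than from the drive being slow per request. With 4 jobs at `iodepth=64`, up to 256 requests are in flight, and Little's law (in-flight requests = throughput x latency) ties that to the reported IOPS. The sketch below is only a sanity check under that assumption; the constants are copied from the fio summary and the function names are made up for illustration.

```python
# Figures copied from the fio summary in the raw output.
JOBS = 4
IODEPTH = 64
IOPS = 397
AVG_LAT_MS = 631.01


def expected_latency_ms(outstanding, iops):
    """Little's law: latency = requests in flight / throughput."""
    return outstanding / iops * 1000


def implied_queue_depth(iops, latency_ms):
    """Little's law rearranged: requests in flight = throughput * latency."""
    return iops * latency_ms / 1000


if __name__ == "__main__":
    max_outstanding = JOBS * IODEPTH
    print(f"max requests in flight: {max_outstanding}")
    print(f"latency if the queue were always full: "
          f"{expected_latency_ms(max_outstanding, IOPS):.0f} ms")
    print(f"average queue depth implied by reported latency: "
          f"{implied_queue_depth(IOPS, AVG_LAT_MS):.1f}")
```

The implied average depth lands slightly below 256, which fits the `IO depths` line showing the queue at 64 or more for about 94% of the run.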

danmanners commented 2 years ago

All of this was validated on Ubuntu 20.04.3 LTS running on a Compute Module 4 with 4 GiB of memory.

ubuntu@tpiv2-node-1:~$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.3 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.3 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

ubuntu@tpiv2-node-1:~$ uname -a
Linux tpiv2-node-1 5.4.0-1048-raspi #53-Ubuntu SMP PREEMPT Wed Dec 8 13:06:23 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux

geerlingguy commented 2 years ago

Interesting seeing the 1M block size random IO being quite a bit slower than other high-end NVMe drives like Kioxia's XG6.

geerlingguy commented 2 years ago

Also, thanks so much for submitting the info for this drive and card. I know a few people have asked about the Samsung 980 (I think I've only ever tried the 970), so it's good to know it works at least!

geerlingguy commented 2 years ago

Closing as the pages are up in the database. Feel free to post any more info to the issue though!

danmanners commented 2 years ago

> Interesting seeing the 1M block size random IO being quite a bit slower than other high-end NVMe drives like Kioxia's XG6.

Definitely curious about what's going on there. I may try the same test on 32-bit and 64-bit Raspberry Pi OS and see whether the results are any different or better.