geerlingguy / raspberry-pi-pcie-devices

Raspberry Pi PCI Express device compatibility database
http://pipci.jeffgeerling.com
GNU General Public License v3.0

Test KIOXIA BG4 M.2 NVMe SSD #326

Closed: geerlingguy closed this issue 2 years ago

geerlingguy commented 2 years ago

I received a KIOXIA BG4 M.2 NVMe SSD drive for testing with the Pi.


I should note that this drive is typically an OEM part and is not commonly found in retail channels like Amazon here in the US. But you can find them used pretty easily.

geerlingguy commented 2 years ago
$ sudo lspci -vvvv
...
15:00.0 Non-Volatile memory controller: KIOXIA Corporation Device 0001 (prog-if 02 [NVM Express])
    Subsystem: KIOXIA Corporation Device 0001
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 0
    Region 0: Memory at 600e00000 (64-bit, non-prefetchable) [size=16K]
    Capabilities: [40] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W
        DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
        LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <4us, L1 <32us
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s (downgraded), Width x1 (downgraded)
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range AB, TimeoutDis+ NROPrPrP- LTR+
             10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix-
             EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
             FRS- TPHComp- ExtTPHComp-
             AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
             AtomicOpsCtl: ReqEn-
        LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
        LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
             EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
             Retimer- 2Retimers- CrosslinkRes: unsupported
    Capabilities: [80] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [90] MSI: Enable- Count=1/32 Maskable+ 64bit+
        Address: 0000000000000000  Data: 0000
        Masking: 00000000  Pending: 00000000
    Capabilities: [b0] MSI-X: Enable+ Count=32 Masked-
        Vector table: BAR=0 offset=00002000
        PBA: BAR=0 offset=00003000
    Capabilities: [100 v2] Advanced Error Reporting
        UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
        CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
    Capabilities: [150 v1] Virtual Channel
        Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb:    Fixed- WRR32- WRR64- WRR128-
        Ctrl:   ArbSelect=Fixed
        Status: InProgress-
        VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
            Status: NegoPending- InProgress-
    Capabilities: [260 v1] Latency Tolerance Reporting
        Max snoop latency: 0ns
        Max no snoop latency: 0ns
    Capabilities: [300 v1] Secondary PCI Express
        LnkCtl3: LnkEquIntrruptEn- PerformEqu-
        LaneErrStat: 0
    Capabilities: [400 v1] L1 PM Substates
        L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
              PortCommonModeRestoreTime=60us PortTPowerOnTime=10us
        L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
               T_CommonMode=0us LTR1.2_Threshold=0ns
        L1SubCtl2: T_PwrOn=10us
    Kernel driver in use: nvme
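`LnkCap` says the drive can do 8 GT/s x4, but `LnkSta` shows the link trained at only 2.5 GT/s x1. A rough Python sketch of the theoretical ceiling for each of those links follows. It assumes only line-encoding overhead (8b/10b at 2.5 and 5 GT/s, 128b/130b at 8 GT/s) and ignores packet and flow-control overhead, so real throughput will be lower. The function and table names are hypothetical.

```python
from fractions import Fraction

# Fraction of raw bits that carry data after line encoding, keyed by signalling rate in GT/s.
# 2.5 GT/s (Gen 1) and 5 GT/s (Gen 2) use 8b/10b; 8 GT/s (Gen 3) uses 128b/130b.
ENCODING_EFFICIENCY = {
    2.5: Fraction(8, 10),
    5.0: Fraction(8, 10),
    8.0: Fraction(128, 130),
}


def link_ceiling_mb_s(gt_per_s: float, lanes: int) -> float:
    """Theoretical one-direction data rate of a PCIe link in MB/s (10**6 bytes/s).

    Only line-encoding overhead is modelled; TLP headers, flow control and the
    device itself all lower what you actually measure.
    """
    bits_per_s = Fraction(gt_per_s) * 10**9 * ENCODING_EFFICIENCY[gt_per_s] * lanes
    return float(bits_per_s / 8 / 10**6)


if __name__ == "__main__":
    # LnkSta above (2.5 GT/s x1), a Gen 2 x1 link (5 GT/s x1), and LnkCap above (8 GT/s x4).
    for rate, lanes in [(2.5, 1), (5.0, 1), (8.0, 4)]:
        print(f"{rate} GT/s x{lanes}: ~{link_ceiling_mb_s(rate, lanes):.0f} MB/s")
```

By this estimate, a 2.5 GT/s x1 link tops out around 250 MB/s, so the 192 MB/s fio sequential read in the benchmark results comes to roughly 77% of that ceiling.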
geerlingguy commented 2 years ago

I partitioned and formatted the NVMe drive, then ran my disk-benchmark.sh:

pi@taco:~ $ wget https://raw.githubusercontent.com/geerlingguy/raspberry-pi-dramble/master/setup/benchmarks/disk-benchmark.sh
pi@taco:~ $ chmod +x disk-benchmark.sh 
pi@taco:~ $ nano disk-benchmark.sh
pi@taco:~ $ sudo DEVICE_UNDER_TEST=/dev/nvme1n1p1 DEVICE_MOUNT_PATH=/mnt/nvme_bg4 ./disk-benchmark.sh

Results:

| Benchmark | Result |
| --- | --- |
| fio 1M sequential read | 192 MB/s |
| iozone 1M random read | 170 MB/s |
| iozone 1M random write | 157 MB/s |
| iozone 4K random read | 29.71 MB/s |
| iozone 4K random write | 54.50 MB/s |
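SSD spec sheets usually quote 4K performance in IOPS rather than MB/s. The Python sketch below gives a rough conversion. It assumes the script's MB/s means 10^6 bytes per second and that every operation moves exactly 4 KiB (4096 bytes). If the figures are actually MiB/s, the IOPS come out about 5% higher. The helper name is hypothetical.

```python
def iops_from_throughput(mb_per_s: float, block_bytes: int = 4096) -> float:
    """Approximate operations per second for a fixed block size.

    ASSUMPTION: 1 MB = 1,000,000 bytes (decimal units).
    """
    return mb_per_s * 1_000_000 / block_bytes


if __name__ == "__main__":
    # 4K figures from the disk-benchmark.sh results table.
    for label, mb_s in [("4K random read", 29.71), ("4K random write", 54.50)]:
        print(f"{label}: ~{iops_from_throughput(mb_s):,.0f} IOPS")
```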
geerlingguy commented 2 years ago

See also: https://www.anandtech.com/show/14962/the-toshiba-kioxia-bg4-1tb-ssd-review/10

"To cut costs and save space, the BG4 is a DRAMless SSD that relies on the NVMe Host Memory Buffer (HMB) feature to help with the performance problems a DRAMless design usually brings. HMB doesn't completely eliminate the downsides of a DRAMless SSD, but it means that the worst-case performance only shows up in corner cases that are not relevant to typical client usage patterns."

My question is, is HMB supported on an old PCIe Gen 2 bus?

geerlingguy commented 2 years ago
pi@seaberry:~ $ sudo apt install -y nvme-cli
...
pi@seaberry:~ $ nvme help
nvme-1.12
usage: nvme <command> [<device>] [<args>]

Then all the details:

pi@seaberry:~ $ sudo nvme id-ctrl /dev/nvme2n1
NVME Identify Controller:
vid       : 0x1e0f
ssvid     : 0x1e0f
sn        : 717PC16AQH91        
mn        : KBG40ZNS1T02 TOSHIBA MEMORY             
fr        : AEGA0102
rab       : 3
ieee      : 8ce38e
cmic      : 0
mdts      : 9
cntlid    : 0
ver       : 0x10300
rtd3r     : 0x7a120
rtd3e     : 0x33450
oaes      : 0x200
ctratt    : 0x2
rrls      : 0
cntrltype : 0
fguid     : 
crdt1     : 0
crdt2     : 0
crdt3     : 0
oacs      : 0x1f
acl       : 3
aerl      : 7
frmw      : 0x14
lpa       : 0xe
elpe      : 255
npss      : 4
avscc     : 0
apsta     : 0x1
wctemp    : 355
cctemp    : 359
mtfa      : 0
hmpre     : 15616
hmmin     : 5888
tnvmcap   : 1024209543168
unvmcap   : 0
rpmbs     : 0
edstt     : 49
dsto      : 1
fwug      : 1
kas       : 0
hctma     : 0x1
mntmt     : 0
mxtmt     : 355
sanicap   : 0x2
hmminds   : 0
hmmaxd    : 0
nsetidmax : 0
endgidmax : 0
anatt     : 0
anacap    : 0
anagrpmax : 0
nanagrpid : 0
pels      : 0
sqes      : 0x66
cqes      : 0x44
maxcmd    : 0
nn        : 1
oncs      : 0x5f
fuses     : 0
fna       : 0
vwc       : 0x1
awun      : 65535
awupf     : 0
nvscc     : 1
nwpc      : 0
acwu      : 0
sgls      : 0
mnan      : 0
subnqn    : nqn.2018-06.com.toshiba-memory:KBG40ZNS1T02 TOSHIBA MEMORY:717PC16AQH91
ioccsz    : 0
iorcsz    : 0
icdoff    : 0
ctrattr   : 0
msdbd     : 0
ps    0 : mp:3.70W operational enlat:1 exlat:1 rrt:0 rrl:0
          rwt:0 rwl:0 idle_power:- active_power:-
ps    1 : mp:2.60W operational enlat:1 exlat:1 rrt:1 rrl:1
          rwt:1 rwl:1 idle_power:- active_power:-
ps    2 : mp:2.20W operational enlat:1 exlat:1 rrt:2 rrl:2
          rwt:2 rwl:2 idle_power:- active_power:-
ps    3 : mp:0.0500W non-operational enlat:800 exlat:1200 rrt:4 rrl:4
          rwt:4 rwl:4 idle_power:- active_power:-
ps    4 : mp:0.0050W non-operational enlat:3000 exlat:32000 rrt:4 rrl:4
          rwt:4 rwl:4 idle_power:- active_power:-
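Several of these fields are encoded values rather than plain numbers. The Python sketch below decodes the ones relevant to the HMB question. It assumes these NVMe encodings: HMPRE and HMMIN in 4 KiB units, WCTEMP and CCTEMP in kelvin, and MDTS as a power of two of the controller's minimum page size. That page size comes from the CAP.MPSMIN register, which `id-ctrl` doesn't print, so 4 KiB is an assumption here. The helper names are hypothetical.

```python
from typing import Optional

HMB_UNIT_BYTES = 4096  # HMPRE and HMMIN are expressed in 4 KiB units


def hmb_mib(units: int) -> float:
    """Convert an HMPRE/HMMIN value from `nvme id-ctrl` to MiB."""
    return units * HMB_UNIT_BYTES / (1024 * 1024)


def kelvin_to_celsius(kelvin: int) -> float:
    """WCTEMP/CCTEMP thresholds are reported in kelvin."""
    return kelvin - 273.15


def max_transfer_bytes(mdts: int, mpsmin_bytes: int = 4096) -> Optional[int]:
    """MDTS is a power of two in units of the controller's minimum page size.

    ASSUMPTION: mpsmin_bytes defaults to 4 KiB; the real value is in CAP.MPSMIN.
    An MDTS of 0 means the controller reports no limit.
    """
    if mdts == 0:
        return None
    return (2 ** mdts) * mpsmin_bytes


if __name__ == "__main__":
    # Values copied from the id-ctrl output above.
    print(f"HMB preferred size: {hmb_mib(15616):.0f} MiB")
    print(f"HMB minimum size:   {hmb_mib(5888):.0f} MiB")
    print(f"Warning temp:       {kelvin_to_celsius(355):.1f} C")
    print(f"Critical temp:      {kelvin_to_celsius(359):.1f} C")
    print(f"Max transfer size:  {max_transfer_bytes(9) // 1024} KiB (assumes 4 KiB MPSMIN)")
```

Decoded this way, the BG4 asks for about 61 MiB of host memory and needs at least 23 MiB. The Linux driver caps the buffer it allocates with the `max_host_mem_size_mb` parameter, which shows up in the `modinfo` output later in this thread.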
geerlingguy commented 2 years ago
pi@seaberry:~ $ sudo nvme smart-log /dev/nvme2n1
Smart Log for NVME device:nvme2n1 namespace-id:ffffffff
critical_warning            : 0
temperature             : 37 C
available_spare             : 100%
available_spare_threshold       : 10%
percentage_used             : 0%
endurance group critical warning summary: 0
data_units_read             : 12,995
data_units_written          : 3,721
host_read_commands          : 101,557
host_write_commands         : 234,724
controller_busy_time            : 1
power_cycles                : 2
power_on_hours              : 0
unsafe_shutdowns            : 0
media_errors                : 0
num_err_log_entries         : 0
Warning Temperature Time        : 0
Critical Composite Temperature Time : 0
Temperature Sensor 1           : 37 C
Thermal Management T1 Trans Count   : 0
Thermal Management T2 Trans Count   : 0
Thermal Management T1 Total Time    : 0
Thermal Management T2 Total Time    : 0
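The `data_units_read` and `data_units_written` counters are not bytes. The NVMe spec defines one data unit as 1,000 blocks of 512 bytes, with the drive rounding up. The Python sketch below converts the values above using hypothetical helper names. It puts this drive at roughly 6.65 GB read and 1.91 GB written so far, which fits the 0 power-on hours shown.

```python
def parse_count(text: str) -> int:
    """nvme-cli prints counters with thousands separators, e.g. '12,995'."""
    return int(text.replace(",", ""))


def data_units_to_bytes(units: int) -> int:
    """One NVMe SMART data unit = 1,000 x 512-byte blocks."""
    return units * 1000 * 512


if __name__ == "__main__":
    # Raw strings copied from the smart-log output above.
    for label, raw in [("read", "12,995"), ("written", "3,721")]:
        gb = data_units_to_bytes(parse_count(raw)) / 1e9
        print(f"data {label}: ~{gb:.2f} GB")
```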
geerlingguy commented 2 years ago

It seems the Pi might be stuck on NVMe 1.0...?

pi@seaberry:~ $ sudo modinfo nvme
name:           nvme
filename:       (builtin)
version:        1.0
license:        GPL
file:           drivers/nvme/host/nvme
author:         Matthew Wilcox <willy@linux.intel.com>
parm:           use_threaded_interrupts:int
parm:           use_cmb_sqes:use controller's memory buffer for I/O SQes (bool)
parm:           max_host_mem_size_mb:Maximum Host Memory Buffer (HMB) size per controller (in MiB) (uint)
parm:           sgl_threshold:Use SGLs when average request segment size is larger or equal to this size. Use 0 to disable SGLs. (uint)
parm:           io_queue_depth:set io queue depth, should >= 2
parm:           write_queues:Number of queues to use for writes. If not set, reads and writes will share a queue set.
parm:           poll_queues:Number of queues to use for polled IO.
parm:           noacpi:disable acpi bios quirks (bool)
geerlingguy commented 2 years ago

Eh... the MODULE_VERSION is still 1.0 in kernel 5.15.x...
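The `1.0` in `modinfo` is the driver's `MODULE_VERSION` string, so it doesn't say which NVMe spec revision the driver understands. The same driver exposes `max_host_mem_size_mb`, and HMB is a post-1.0 NVMe feature. The controller reports its own spec version in the `ver` field of `nvme id-ctrl` (`0x10300` above). The Python sketch below decodes that field. It assumes the NVMe VER layout: major version in bits 31:16, minor in bits 15:8, tertiary in bits 7:0. The function name is hypothetical.

```python
def decode_nvme_version(ver: int) -> str:
    """Decode an NVMe VER value (as printed by `nvme id-ctrl`) into 'major.minor.tertiary'."""
    major = (ver >> 16) & 0xFFFF
    minor = (ver >> 8) & 0xFF
    tertiary = ver & 0xFF
    return f"{major}.{minor}.{tertiary}"


if __name__ == "__main__":
    # 'ver' field from the KIOXIA BG4 id-ctrl output.
    print(decode_nvme_version(0x10300))
```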

will127534 commented 2 years ago

Quoting @geerlingguy: "My question is, is HMB supported on an old PCIe Gen 2 bus?"

I have another data point from the PM991 (Samsung's equivalent of a BG4-like SSD). I can see it using HMB on the 5.10.17-v8+ kernel:

Here is the dmesg output:

[   43.826568] nvme nvme3: Shutdown timeout set to 8 seconds
[   43.828740] nvme nvme1: 4/0/0 default/read/poll queues
[   43.829812] nvme nvme0: 4/0/0 default/read/poll queues
[   43.831276]  nvme0n1: p1
[   43.832231]  nvme1n1: p1
[   43.849065] nvme nvme2: 4/0/0 default/read/poll queues
[   43.853839]  nvme2n1: p1 p2
[   43.903598] nvme nvme3: allocated 64 MiB host memory buffer.      <-------This device
[   44.098563] nvme nvme3: 4/0/0 default/read/poll queues
[   44.107160]  nvme3n1: p1
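To check whether the BG4 gets the same treatment, you can search the kernel log for this allocation message. The Python sketch below extracts the controller name and buffer size from lines shaped like the one marked above. The regex is built only from this sample, so other kernel versions or failed allocations may produce different wording. The names are hypothetical.

```python
import re

# Matches lines like:
# [   43.903598] nvme nvme3: allocated 64 MiB host memory buffer.
HMB_RE = re.compile(r"nvme (nvme\d+): allocated (\d+) MiB host memory buffer")


def find_hmb_allocations(lines):
    """Return {controller: MiB} for every HMB allocation message in the given lines."""
    found = {}
    for line in lines:
        match = HMB_RE.search(line)
        if match:
            found[match.group(1)] = int(match.group(2))
    return found


if __name__ == "__main__":
    # Sample lines from the dmesg output above; replace with real `sudo dmesg` output.
    sample = [
        "[   43.903598] nvme nvme3: allocated 64 MiB host memory buffer.",
        "[   44.098563] nvme nvme3: 4/0/0 default/read/poll queues",
    ]
    for ctrl, mib in find_hmb_allocations(sample).items():
        print(f"{ctrl}: {mib} MiB host memory buffer")
```

To use it on a live system, pipe `sudo dmesg` into a script that passes `sys.stdin` to `find_hmb_allocations`. An empty result means the log contains no matching line.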

And here is the lspci output:

06:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a809 (prog-if 02 [NVM Express])
        Subsystem: Samsung Electronics Co Ltd Device a801
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 0
        Region 0: Memory at 600300000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/32 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 26.000W
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L0s unlimited, L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [b0] MSI-X: Enable+ Count=13 Masked-
                Vector table: BAR=0 offset=00003000
                PBA: BAR=0 offset=00002000
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [148 v1] Device Serial Number 00-00-00-00-00-00-00-00
        Capabilities: [158 v1] Power Budgeting <?>
        Capabilities: [168 v1] #19
        Capabilities: [188 v1] Latency Tolerance Reporting
                Max snoop latency: 0ns
                Max no snoop latency: 0ns
        Capabilities: [190 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                          PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
                L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
                           T_CommonMode=0us LTR1.2_Threshold=0ns
                L1SubCtl2: T_PwrOn=10us
        Capabilities: [1a0 v1] #16
        Capabilities: [1d0 v1] #22
        Capabilities: [1dc v1] Vendor Specific Information: ID=0002 Rev=3 Len=100 <?>
        Capabilities: [2dc v1] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
        Capabilities: [314 v1] Precision Time Measurement
                PTMCap: Requester:+ Responder:- Root:-
                PTMClockGranularity: Unimplemented
                PTMControl: Enabled:- RootSelected:-
                PTMEffectiveGranularity: Unknown
        Capabilities: [320 v1] Vendor Specific Information: ID=0003 Rev=1 Len=054 <?>
        Kernel driver in use: nvme
        Kernel modules: nvme
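Both drives advertise 8 GT/s x4 in `LnkCap`, but the negotiated `LnkSta` differs: 2.5 GT/s x1 for the BG4 and 5 GT/s x1 for the PM991. The PM991 dump doesn't carry the `(downgraded)` annotations, so comparing `LnkCap` against `LnkSta` directly is a more reliable check. The Python sketch below does that comparison. Its regex is based only on these two `lspci -vv` samples, and the helper names are hypothetical.

```python
import re

# "LnkCap:"/"LnkSta:" only; the colon right after the name skips LnkCap2/LnkSta2.
LINK_RE = re.compile(r"(LnkCap|LnkSta):.*?Speed ([\d.]+)GT/s[^,]*, Width x(\d+)")

BG4_SAMPLE = """
    LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <4us, L1 <32us
    LnkSta: Speed 2.5GT/s (downgraded), Width x1 (downgraded)
"""

PM991_SAMPLE = """
    LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L0s unlimited, L1 <64us
    LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
"""


def link_summary(lspci_text: str) -> dict:
    """Map 'LnkCap'/'LnkSta' to (speed in GT/s, lane count) for one device's lspci text."""
    result = {}
    for line in lspci_text.splitlines():
        match = LINK_RE.search(line)
        if match:
            kind, speed, width = match.groups()
            result[kind] = (float(speed), int(width))
    return result


def is_downgraded(summary: dict) -> bool:
    """True if the negotiated link is slower or narrower than the device's capability."""
    cap_speed, cap_width = summary["LnkCap"]
    sta_speed, sta_width = summary["LnkSta"]
    return sta_speed < cap_speed or sta_width < cap_width


if __name__ == "__main__":
    for name, text in [("KIOXIA BG4", BG4_SAMPLE), ("Samsung PM991", PM991_SAMPLE)]:
        s = link_summary(text)
        print(f"{name}: capable {s['LnkCap']}, negotiated {s['LnkSta']}, "
              f"downgraded={is_downgraded(s)}")
```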
geerlingguy commented 2 years ago

@will127534 - Oh shoot, didn't even think to check the dmesg output for it. I'll check next time I boot it and see if it mentions the allocation.

The Pi's memory isn't insanely fast, though, so while HMB may help, it seems obvious from testing larger/faster NVMe drives with onboard DRAM that the DRAM makes a huge performance difference on the Pi.

elfranne commented 2 years ago

I just bought one of these for my Waveshare carrier board, and I'm getting a kernel panic on boot (only when the BG4 is plugged in). Any ideas? (screenshot: bg4-panic)

geerlingguy commented 2 years ago

@elfranne - Not sure why it would be doing that. Seeing the references to IPI makes it seem like some sort of x86 feature is trying to load and failing? Not sure what that would be, though.

Added this card to the site though.

theiotidiot commented 11 months ago

Hey there Jeff,

Any tips on getting this drive to work on a Turing Pi 2, on nodes 1 and 2 connected to the Mini PCIe slots? I've started a few threads on the Discord, but I haven't found anything very helpful. I originally thought that the Mini PCIe slots on the Turing Pi 2 don't interact with the BMC, so there shouldn't be anything software-wise stopping this SSD from working with the CM4. I don't have another carrier board, so I can't test whether the problem is the Pi or the carrier board. If you have any insight, I'm all ears! Thanks,