fabianishere / pve-edge-kernel

Newer Linux kernels for Proxmox VE 7

NVMe err 0x13 on 5.18 and 5.19 #300

Closed lydaston closed 2 years ago

lydaston commented 2 years ago

After updating the PVE kernel to 5.19 with "apt install pve-kernel-5.19-edge", I get errors in "Syslog", as shown in the screenshot below. [Screenshot: pve1]

The same errors occur on 5.18, but 5.17 works fine.

Information about my NVMe drive is below.

root@pve:~# smartctl -a /dev/nvme0n1
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.39-3-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Lexar 512GB SSD
Serial Number:                      MFM8182000030
Firmware Version:                   V1.19.B3
PCI Vendor/Subsystem ID:            0x1d97
IEEE OUI Identifier:                0xcaf25b
Total NVM Capacity:                 512,110,190,592 [512 GB]
Unallocated NVM Capacity:           0
Controller ID:                      0
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          512,110,190,592 [512 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            caf25b 020000a827
Local Time is:                      Fri Aug 5 13:43:13 2022 CST
Firmware Updates (0x02):            1 Slot
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x004c):     DS_Mngmt Wr_Zero Timestmp
Log Page Attributes (0x0e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size:         32 Pages
Warning Comp. Temp. Threshold:      81 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     5.00W       -        -    0  0  0  0        5     700

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         3

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        43 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    47,430 [24.2 GB]
Data Units Written:                 226,106 [115 GB]
Host Read Commands:                 435,435
Host Write Commands:                2,705,916
Controller Busy Time:               6
Power Cycles:                       19
Power On Hours:                     72
Unsafe Shutdowns:                   6
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning Comp. Temperature Time:     0
Critical Comp. Temperature Time:    0

Read 16 entries from Error Information Log failed: NVMe Status 0x13

fabianishere commented 2 years ago

Might be related:

  1. https://bugzilla.kernel.org/show_bug.cgi?id=215763
  2. https://lore.kernel.org/lkml/20220610120554.ry7w37jbf3g6w3p3@quentin/T/

fabianishere commented 2 years ago

Can you reproduce this on a 6.x kernel?