ymartin-ovh closed this issue 8 months ago.
Got this on an NVMe device with -i auto:
/usr/lib/nagios/ovh/check_smart -i auto -g /dev/nvme0 --debug
Found /dev/nvme0
###########################################################
(debug) CHECK 1: getting overall SMART health status for
###########################################################
(debug) executing:
sudo /usr/sbin/smartctl -d auto -Hi /dev/nvme0
(debug) output:
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.15.41-ovh-vps-grsec-zfs-classid] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: SAMSUNG MZVL2512HCJQ-00B07
Serial Number: S63CNF0R415493
Firmware Version: GXA7302Q
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Total NVM Capacity: 512,110,190,592 [512 GB]
Unallocated NVM Capacity: 0
Controller ID: 6
Number of Namespaces: 1
Namespace 1 Size/Capacity: 512,110,190,592 [512 GB]
Namespace 1 Utilization: 462,648,926,208 [462 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 002538 b411b778d4
Local Time is: Wed Mar 6 17:27:20 2024 UTC
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
(debug) parsing line:
Model Number: SAMSUNG MZVL2512HCJQ-00B07
(debug) found model: SAMSUNG MZVL2512HCJQ-00B07
(debug) parsing line:
Serial Number: S63CNF0R415493
(debug) found serial number S63CNF0R415493
(debug) parsing line:
SMART overall-health self-assessment test result: PASSED
(debug) found string 'PASSED'; status OK
###########################################################
(debug) CHECK 2: getting silent SMART health check
###########################################################
(debug) executing:
sudo /usr/sbin/smartctl -d auto -q silent -A /dev/nvme0
(debug) exit code:
0
(debug) zero exit code, status OK
###########################################################
(debug) CHECK 3: getting detailed statistics from attributes
(debug) information contains a few more potential trouble spots
(debug) plus, we can also use the information for perfdata/graphing
###########################################################
(debug) executing:
sudo /usr/sbin/smartctl -d auto -A /dev/nvme0
(debug) output:
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.15.41-ovh-vps-grsec-zfs-classid] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF SMART DATA SECTION ===
SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x00
Temperature: 38 Celsius
Available Spare: 83%
Available Spare Threshold: 10%
Percentage Used: 19%
Data Units Read: 83,833,423 [42.9 TB]
Data Units Written: 69,316,785 [35.4 TB]
Host Read Commands: 1,241,781,735
Host Write Commands: 1,632,519,014
Controller Busy Time: 36,946
Power Cycles: 40
Power On Hours: 48,708
Unsafe Shutdowns: 26
Media and Data Integrity Errors: 114
Error Information Log Entries: 114
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 38 Celsius
Temperature Sensor 2: 48 Celsius
(debug) Raw Check List ATA: Current_Pending_Sector,Reallocated_Sector_Ct,Program_Fail_Cnt_Total,Uncorrectable_Error_Cnt,Offline_Uncorrectable,Runtime_Bad_Block,Reported_Uncorrect,Reallocated_Event_Count,Erase_Fail_Count_Total
(debug) Raw Check List NVMe: Media_and_Data_Integrity_Errors
(debug) Exclude List for Checks:
(debug) Exclude List for Perfdata:
(debug) Warning Thresholds:
(debug) gathered perfdata:
###########################################################
(debug) LOCAL STATUS: OK, FINAL STATUS: OK
###########################################################
(debug) final status/output: OK
(debug) drives ok: [/dev/nvme0] - Device is clean
(debug) drives nok:
(debug) msg_list: [/dev/nvme0] - Device is clean
OK: [/dev/nvme0] - Device is clean|
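In the run above, CHECK 3 gets the NVMe health log back from `smartctl -A`, but no attribute is compared against the NVMe raw check list and no perfdata is gathered. The plugin therefore seems not to treat the device as NVMe under -i auto. The following is a minimal sketch, assuming the fix is to infer the interface from the device name or from the smartctl output. `detect_interface` is a hypothetical helper and is not code from check_smart:

```python
import re

# Hypothetical helper (not part of check_smart): pick the attribute parser
# when the plugin is run with "-i auto", i.e. smartctl is called with "-d auto".
NVME_DEVICE = re.compile(r"^/dev/nvme\d+(n\d+)?$")


def detect_interface(device: str, smartctl_output: str) -> str:
    """Return "nvme" or "ata" for a device probed with "smartctl -d auto"."""
    # Linux names NVMe controllers /dev/nvmeX and namespaces /dev/nvmeXnY.
    if NVME_DEVICE.match(device):
        return "nvme"
    # Otherwise look at what smartctl returned: "-A" on an NVMe device
    # prints "SMART/Health Information (NVMe Log 0x02, ...)".
    if "NVMe Log" in smartctl_output:
        return "nvme"
    # Assumption: everything else is handled as ATA; SCSI/SAS is ignored here.
    return "ata"


if __name__ == "__main__":
    print(detect_interface("/dev/nvme0", ""))  # prints "nvme"
```

For the run above, `detect_interface("/dev/nvme0", ...)` returns `"nvme"`, which would route CHECK 3 to the NVMe raw check list. Checking the smartctl output as well as the path also covers NVMe drives that are reached under a different device name.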
I expect NVMe attributes to be checked when the device is NVMe and -i auto is given:
Found /dev/nvme0
###########################################################
(debug) CHECK 1: getting overall SMART health status for
###########################################################
(debug) executing:
sudo /usr/sbin/smartctl -d auto -Hi /dev/nvme0
(debug) output:
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.15.41-ovh-vps-grsec-zfs-classid] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: SAMSUNG MZVL2512HCJQ-00B07
Serial Number: S63CNF0R415493
Firmware Version: GXA7302Q
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Total NVM Capacity: 512,110,190,592 [512 GB]
Unallocated NVM Capacity: 0
Controller ID: 6
Number of Namespaces: 1
Namespace 1 Size/Capacity: 512,110,190,592 [512 GB]
Namespace 1 Utilization: 462,648,926,208 [462 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 002538 b411b778d4
Local Time is: Wed Mar 6 17:36:56 2024 UTC
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
(debug) parsing line:
Model Number: SAMSUNG MZVL2512HCJQ-00B07
(debug) found model: SAMSUNG MZVL2512HCJQ-00B07
(debug) parsing line:
Serial Number: S63CNF0R415493
(debug) found serial number S63CNF0R415493
(debug) parsing line:
SMART overall-health self-assessment test result: PASSED
(debug) found string 'PASSED'; status OK
###########################################################
(debug) CHECK 2: getting silent SMART health check
###########################################################
(debug) executing:
sudo /usr/sbin/smartctl -d auto -q silent -A /dev/nvme0
(debug) exit code:
0
(debug) zero exit code, status OK
###########################################################
(debug) CHECK 3: getting detailed statistics from attributes
(debug) information contains a few more potential trouble spots
(debug) plus, we can also use the information for perfdata/graphing
###########################################################
(debug) executing:
sudo /usr/sbin/smartctl -d auto -A /dev/nvme0
(debug) output:
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.15.41-ovh-vps-grsec-zfs-classid] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF SMART DATA SECTION ===
SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x00
Temperature: 38 Celsius
Available Spare: 83%
Available Spare Threshold: 10%
Percentage Used: 19%
Data Units Read: 83,833,423 [42.9 TB]
Data Units Written: 69,317,103 [35.4 TB]
Host Read Commands: 1,241,781,735
Host Write Commands: 1,632,532,652
Controller Busy Time: 36,946
Power Cycles: 40
Power On Hours: 48,708
Unsafe Shutdowns: 26
Media and Data Integrity Errors: 114
Error Information Log Entries: 114
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 38 Celsius
Temperature Sensor 2: 47 Celsius
(debug) Raw Check List ATA: Current_Pending_Sector Reallocated_Sector_Ct Program_Fail_Cnt_Total Uncorrectable_Error_Cnt Offline_Uncorrectable Runtime_Bad_Block Reported_Uncorrect Reallocated_Event_Count Erase_Fail_Count_Total
(debug) Raw Check List NVMe: Media_and_Data_Integrity_Errors
(debug) Exclude List for Checks:
(debug) Exclude List for Perfdata:
(debug) Warning Thresholds:
(debug) Critical_Warning not in raw check list (raw value: 0x00)
(debug) Temperature not in raw check list (raw value: 38)
(debug) Available_Spare not in raw check list (raw value: 83)
(debug) Available_Spare_Threshold not in raw check list (raw value: 10)
(debug) Percentage_Used not in raw check list (raw value: 19)
(debug) Data_Units_Read not in raw check list (raw value: 83833423)
(debug) Data_Units_Written not in raw check list (raw value: 69317103)
(debug) Host_Read_Commands not in raw check list (raw value: 1241781735)
(debug) Host_Write_Commands not in raw check list (raw value: 1632532652)
(debug) Controller_Busy_Time not in raw check list (raw value: 36946)
(debug) Power_Cycles not in raw check list (raw value: 40)
(debug) Power_On_Hours not in raw check list (raw value: 48708)
(debug) Unsafe_Shutdowns not in raw check list (raw value: 26)
(debug) Media_and_Data_Integrity_Errors is non-zero (114)
(debug) Error_Information_Log_Entries not in raw check list (raw value: 114)
(debug) Warning__Comp_Temperature_Time not in raw check list (raw value: 0)
(debug) Critical_Comp_Temperature_Time not in raw check list (raw value: 0)
(debug) Temperature_Sensor_1 not in raw check list (raw value: 38)
(debug) Temperature_Sensor_2 not in raw check list (raw value: 47)
(debug) gathered perfdata:
###########################################################
(debug) LOCAL STATUS: WARNING, FINAL STATUS: WARNING
###########################################################
(debug) final status/output: WARNING
(debug) drives ok:
(debug) drives nok: [/dev/nvme0] - [/dev/nvme0] - Media_and_Data_Integrity_Errors is non-zero (114)[/dev/nvme0] -
(debug) msg_list: [/dev/nvme0] - [/dev/nvme0] - Media_and_Data_Integrity_Errors is non-zero (114)[/dev/nvme0] -
WARNING: [/dev/nvme0] - [/dev/nvme0] - Media_and_Data_Integrity_Errors is non-zero (114)[/dev/nvme0] - |
Awesome find, thanks! Successfully tested on a server with NVMe (and ATA) drives.
… is nvme