Open dotjaz opened 1 year ago
Hello, I've got the same problem on a Synology DS920+:
Maybe for me it's because the temperature isn't available... (I don't know why...)
Edit: here are the results of the command smartctl --xall --device sat /dev/sda > ./config/sda-sata1.log,
run inside the container:
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-4.4.302+] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD40EFRX-68WT0N0
Serial Number: WD-WCC4E0938783
LU WWN Device Id: 5 0014ee 2b46adcba
Firmware Version: 80.00A80
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Oct 1 11:08:31 2023 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Unavailable
Rd look-ahead is: Enabled
Write cache is: Enabled
DSN feature is: Unavailable
ATA Security is: Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (52980) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 530) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x703d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 200 200 051 - 0
3 Spin_Up_Time POS--K 180 174 021 - 8000
4 Start_Stop_Count -O--CK 100 100 000 - 753
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
7 Seek_Error_Rate -OSR-K 200 200 000 - 0
9 Power_On_Hours -O--CK 010 010 000 - 66336
10 Spin_Retry_Count -O--CK 100 100 000 - 0
11 Calibration_Retry_Count -O--CK 100 100 000 - 0
12 Power_Cycle_Count -O--CK 100 100 000 - 553
192 Power-Off_Retract_Count -O--CK 200 200 000 - 230
193 Load_Cycle_Count -O--CK 200 200 000 - 847
194 Temperature_Celsius -O---K 120 101 000 - 32
196 Reallocated_Event_Count -O--CK 200 200 000 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 0
198 Offline_Uncorrectable ----CK 100 253 000 - 0
199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0
200 Multi_Zone_Error_Rate ---R-- 200 200 000 - 0
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x03 GPL R/O 6 Ext. Comprehensive SMART error log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x10 GPL R/O 1 NCQ Command Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x21 GPL R/O 1 Write stream error log
0x22 GPL R/O 1 Read stream error log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa0-0xa7 GPL,SL VS 16 Device vendor specific log
0xa8-0xb7 GPL,SL VS 1 Device vendor specific log
0xbd GPL,SL VS 1 Device vendor specific log
0xc0 GPL,SL VS 1 Device vendor specific log
0xc1 GPL VS 93 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer
SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 650 -
# 2 Short offline Completed without error 00% 482 -
# 3 Short offline Completed without error 00% 314 -
# 4 Short offline Completed without error 00% 146 -
# 5 Short offline Completed without error 00% 65515 -
# 6 Short offline Completed without error 00% 65347 -
# 7 Short offline Completed without error 00% 65179 -
# 8 Short offline Completed without error 00% 65011 -
# 9 Short offline Completed without error 00% 64843 -
#10 Short offline Completed without error 00% 64675 -
#11 Short offline Completed without error 00% 64508 -
#12 Short offline Completed without error 00% 64340 -
#13 Short offline Completed without error 00% 64172 -
#14 Short offline Completed without error 00% 64004 -
#15 Short offline Completed without error 00% 63836 -
#16 Short offline Completed without error 00% 63669 -
#17 Short offline Completed without error 00% 63501 -
#18 Short offline Completed without error 00% 63333 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 258 (0x0102)
Device State: Active (0)
Current Temperature: 32 Celsius
Power Cycle Min/Max Temperature: 30/37 Celsius
Lifetime Min/Max Temperature: 3/51 Celsius
Under/Over Temperature Limit Count: 0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/60 Celsius
Min/Max Temperature Limit: -41/85 Celsius
Temperature History Size (Index): 478 (83)
Index Estimated Time Temperature Celsius
84 2023-10-01 03:11 32 *************
... ..(130 skipped). .. *************
215 2023-10-01 05:22 32 *************
216 2023-10-01 05:23 33 **************
... ..(193 skipped). .. **************
410 2023-10-01 08:37 33 **************
411 2023-10-01 08:38 32 *************
... ..( 17 skipped). .. *************
429 2023-10-01 08:56 32 *************
430 2023-10-01 08:57 31 ************
... ..( 70 skipped). .. ************
23 2023-10-01 10:08 31 ************
24 2023-10-01 10:09 32 *************
... ..( 58 skipped). .. *************
83 2023-10-01 11:08 32 *************
SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
Device Statistics (GP/SMART Log 0x04) not supported
Pending Defects log (GP Log 0x0c) not supported
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 0 Command failed due to ICRC error
0x0002 2 0 R_ERR response for data FIS
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 0 R_ERR response for non-data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 2 1 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 1 Device-to-host register FISes sent due to a COMRESET
0x000b 2 0 CRC errors within host-to-device FIS
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x8000 4 437407 Vendor specific
And directly from the DS920+:
smartctl 6.5 (build date Sep 26 2022) [x86_64-linux-4.4.302+] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD40EFRX-68WT0N0
Serial Number: WD-WCC4E0938783
LU WWN Device Id: 5 0014ee 2b46adcba
Firmware Version: 80.00A80
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Oct 1 13:12:03 2023 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Unavailable
Rd look-ahead is: Enabled
Write cache is: Enabled
ATA Security is: Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (52980) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 530) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x703d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 200 200 051 - 0
3 Spin_Up_Time POS--K 180 174 021 - 8000
4 Start_Stop_Count -O--CK 100 100 000 - 753
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
7 Seek_Error_Rate -OSR-K 200 200 000 - 0
9 Power_On_Hours -O--CK 010 010 000 - 66336
10 Spin_Retry_Count -O--CK 100 100 000 - 0
11 Calibration_Retry_Count -O--CK 100 100 000 - 0
12 Power_Cycle_Count -O--CK 100 100 000 - 553
192 Power-Off_Retract_Count -O--CK 200 200 000 - 230
193 Load_Cycle_Count -O--CK 200 200 000 - 847
194 Temperature_Celsius -O---K 119 101 000 - 33
196 Reallocated_Event_Count -O--CK 200 200 000 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 0
198 Offline_Uncorrectable ----CK 100 253 000 - 0
199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0
200 Multi_Zone_Error_Rate ---R-- 200 200 000 - 0
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x03 GPL R/O 6 Ext. Comprehensive SMART error log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x10 GPL R/O 1 SATA NCQ Queued Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x21 GPL R/O 1 Write stream error log
0x22 GPL R/O 1 Read stream error log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa0-0xa7 GPL,SL VS 16 Device vendor specific log
0xa8-0xb7 GPL,SL VS 1 Device vendor specific log
0xbd GPL,SL VS 1 Device vendor specific log
0xc0 GPL,SL VS 1 Device vendor specific log
0xc1 GPL VS 93 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer
SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 650 -
# 2 Short offline Completed without error 00% 482 -
# 3 Short offline Completed without error 00% 314 -
# 4 Short offline Completed without error 00% 146 -
# 5 Short offline Completed without error 00% 65515 -
# 6 Short offline Completed without error 00% 65347 -
# 7 Short offline Completed without error 00% 65179 -
# 8 Short offline Completed without error 00% 65011 -
# 9 Short offline Completed without error 00% 64843 -
#10 Short offline Completed without error 00% 64675 -
#11 Short offline Completed without error 00% 64508 -
#12 Short offline Completed without error 00% 64340 -
#13 Short offline Completed without error 00% 64172 -
#14 Short offline Completed without error 00% 64004 -
#15 Short offline Completed without error 00% 63836 -
#16 Short offline Completed without error 00% 63669 -
#17 Short offline Completed without error 00% 63501 -
#18 Short offline Completed without error 00% 63333 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 258 (0x0102)
SCT Support Level: 1
Device State: Active (0)
Current Temperature: 33 Celsius
Power Cycle Min/Max Temperature: 30/37 Celsius
Lifetime Min/Max Temperature: 3/51 Celsius
Under/Over Temperature Limit Count: 0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/60 Celsius
Min/Max Temperature Limit: -41/85 Celsius
Temperature History Size (Index): 478 (86)
Index Estimated Time Temperature Celsius
87 2023-10-01 05:15 32 *************
... ..(129 skipped). .. *************
217 2023-10-01 07:25 32 *************
218 2023-10-01 07:26 33 **************
... ..(191 skipped). .. **************
410 2023-10-01 10:38 33 **************
411 2023-10-01 10:39 32 *************
... ..( 17 skipped). .. *************
429 2023-10-01 10:57 32 *************
430 2023-10-01 10:58 31 ************
... ..( 70 skipped). .. ************
23 2023-10-01 12:09 31 ************
24 2023-10-01 12:10 32 *************
... ..( 61 skipped). .. *************
86 2023-10-01 13:12 32 *************
SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
Device Statistics (GP/SMART Log 0x04) not supported
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 0 Command failed due to ICRC error
0x0002 2 0 R_ERR response for data FIS
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 0 R_ERR response for non-data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 2 1 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 1 Device-to-host register FISes sent due to a COMRESET
0x000b 2 0 CRC errors within host-to-device FIS
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x8000 4 437619 Vendor specific
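Both dumps above report PASSED and show "-" in the FAIL column for every attribute. For anyone who wants to check that mechanically, here is a minimal sketch, assuming the plain-text attribute table layout shown above (ID#, ATTRIBUTE_NAME, FLAGS, VALUE, WORST, THRESH, FAIL, RAW_VALUE). It applies only the manufacturer rule that smartctl documents (an attribute has failed when its normalized value is less than or equal to a non-zero threshold) and does not reproduce whatever additional logic Scrutiny applies. The names `parse_attributes` and `failing` are made up for this example.

```python
import re

# One row of the attribute table printed by `smartctl --xall`:
# ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+([A-Z-]{6})\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(.+)$"
)


def parse_attributes(text):
    """Return one dict per attribute row found in smartctl text output."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "id": int(m.group(1)),
                "name": m.group(2),
                "value": int(m.group(4)),
                "worst": int(m.group(5)),
                "thresh": int(m.group(6)),
                "raw": m.group(8).strip(),
            })
    return rows


def failing(rows):
    """Attributes whose normalized value is at or below a non-zero threshold."""
    return [r for r in rows if r["thresh"] > 0 and r["value"] <= r["thresh"]]


# Three rows copied from the output above.
SAMPLE = """\
1 Raw_Read_Error_Rate POSR-K 200 200 051 - 0
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
194 Temperature_Celsius -O---K 120 101 000 - 32
"""

rows = parse_attributes(SAMPLE)
print([r["name"] for r in failing(rows)])  # prints []
```

If this prints an empty list for a drive that Scrutiny still marks as failed, the failure is coming from something other than the manufacturer thresholds in this table.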
I'm having a similar problem, where a drive previously reported as failed shows as passed now.
EDIT: the problem seems to be caused by #512. Still, the drive shouldn't show as failed.
Same issue for me - using a collector on Unraid, reporting into scrutiny frontend/influxdb hosted on a different system.
Temperature monitoring seems to work fine though...
I had the same situation but running the container with privileged: true solved the issue.
I have a similar issue. Two of my disks both PASS SMART, yet Scrutiny shows one as FAILED and the other as PASSED. Comparing their attributes, I cannot tell the difference between the two. What's the reason? <img width="686" alt="Snipaste_2024-01-27_12-08-11" src="https://github.com/AnalogJ/scrutiny/assets/28671430/21cfbebf-521c-43f0-b9b0-65984f627c0b">
Same problem: it doesn't work on Synology, even with privileged: true. I can't get SMART data even as root on the NAS itself. I think Synology has disabled SMART support at the system level.
smartctl --xall /dev/sda
smartctl 6.5 (build date Sep 26 2022) [x86_64-linux-4.4.302+] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: TOSHIBA
Product: HDWE140
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Local Time is: Wed Apr 3 18:18:10 2024 CEST
SMART support is: Unavailable - device lacks SMART capability.
Read Cache is: Enabled
Writeback Cache is: Disabled
I fixed it thanks to https://github.com/AnalogJ/scrutiny/issues/48
- device: /dev/sda
  type: 'sat'
- device: /dev/sdb
  type: 'sat'
- device: /dev/sdc
  type: 'sat'
- device: /dev/sdd
  type: 'sat'
- device: /dev/sde
  type: 'sat'
- device: /dev/sdf
  type: 'sat'
It would be great if the system could auto-detect Synology and apply that config automatically
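Until something like that exists, the per-device entries could be generated rather than typed by hand. This is a minimal sketch, not part of Scrutiny: `device_entries` is a hypothetical helper, the `/dev/sd[a-z]` glob is an assumption about how the disks are named on the NAS, and where the entries belong inside collector.yaml is not shown in this thread.

```python
import glob


def device_entries(paths, dev_type="sat"):
    """Render one '- device / type' pair per path, in the style used above."""
    lines = []
    for path in paths:
        lines.append(f"- device: {path}")
        lines.append(f"  type: '{dev_type}'")
    return "\n".join(lines)


if __name__ == "__main__":
    # Assumption: whole-disk nodes are named /dev/sda, /dev/sdb, ...
    disks = sorted(glob.glob("/dev/sd[a-z]"))
    print(device_entries(disks))
```

The output has to be regenerated whenever a disk is added or removed, so this does not solve the auto-detection request; it only saves some typing.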
Just ran into the same problem on several of my systems, not only Synology but Proxmox too.
I'd love to have a CLI (and Docker) variable that forces all devices to be detected as ATA; that way new devices would also be picked up dynamically, which (to my understanding) wouldn't be the case when using the collector.yaml.
EDIT: I think I found how to do this now. The collector.yaml has a section for global smartctl args, I'm giving this a try now :)
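The comment does not say which keys that section uses, so the following only illustrates the idea of forcing a device type on every smartctl call instead of listing devices one by one. It is a sketch under that assumption: `with_forced_type` is a hypothetical helper, and the only smartctl options it relies on are `--device` / `-d`, `--xall` and `--json`.

```python
import shlex


def with_forced_type(args, dev_type="sat"):
    """Append '--device <type>' to a smartctl argument string unless one is set."""
    parts = shlex.split(args)
    already_set = any(p == "-d" or p.startswith("--device") for p in parts)
    if already_set:
        return args
    return shlex.join(parts + ["--device", dev_type])


print(with_forced_type("--xall --json"))  # prints: --xall --json --device sat
```

Because the type is appended to the arguments rather than tied to a device path, newly added disks would get the same treatment without any per-device entry.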
Describe the bug: Drive shows as failed for no reason.
Expected behavior: The drive's SMART status is passed, so the GUI should display it as passed.
smartctl --xall --device nvme /dev/nvme
Screenshots
Log Files collector.log web.log