joe-eklund opened this issue 2 years ago
Hey @joe-eklund
Yeah, Seagate drives seem to report some of their SMART data in a non-standard way, which definitely causes issues for some users.
In your case it looks like:
- High Fly Writes is actually unusually high (at least when compared to Backblaze's data).
- Hardware ECC Recovered is reported in a non-standard format. If it's not failing SMART, you can safely ignore it for now. Click the attribute row in the table to get extended details for the attribute.
Can I have Scrutiny only mark disks as failed if they fail the "critical" attribute list? I don't care about any of the other ones for the dashboard status.
This is definitely a common request, and something I'm working on (as I find time). It's currently tracked in #275.
While you cannot yet configure the failure status in the dashboard, you can configure how and when you get notified, limiting notifications to critical attributes only: https://github.com/AnalogJ/scrutiny/issues/300#issuecomment-1155984708
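To make the "critical only" notification idea concrete, here is a minimal sketch of the filtering logic. It is not Scrutiny's code or configuration: the `AttributeResult` type and `attributes_to_notify` function are hypothetical, and the critical ID set assumes the five attributes Backblaze commonly cites (5, 187, 188, 197, 198), which may not match Scrutiny's own critical list.

```python
from dataclasses import dataclass
from typing import List

# Assumption: Backblaze's commonly cited critical ATA attribute IDs.
# Scrutiny's own critical list may differ.
CRITICAL_ATTRIBUTE_IDS = {5, 187, 188, 197, 198}


@dataclass
class AttributeResult:
    attribute_id: int
    name: str
    status: str  # "passed", "warn" or "failed"


def attributes_to_notify(results: List[AttributeResult], critical_only: bool = True) -> List[AttributeResult]:
    """Return the failed attributes that should trigger a notification."""
    failed = [r for r in results if r.status == "failed"]
    if critical_only:
        # Drop failures on non-critical attributes such as 189 or 195.
        failed = [r for r in failed if r.attribute_id in CRITICAL_ATTRIBUTE_IDS]
    return failed


drive = [
    AttributeResult(5, "Reallocated Sectors Count", "passed"),
    AttributeResult(189, "High Fly Writes", "failed"),
    AttributeResult(195, "Hardware ECC Recovered", "failed"),
]
print(attributes_to_notify(drive))                            # []
print(len(attributes_to_notify(drive, critical_only=False)))  # 2
```

With the critical-only filter, a drive like the ones in this issue (failing only High Fly Writes and Hardware ECC Recovered) would produce no notification.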
High Fly Writes is actually unusually high (at least when compared to Backblaze's data).
- Is this value unusually high for all your Seagate drives? I wonder if this may be another non-standard attribute.
I went and looked, and a handful of drives don't report that attribute at all (I guess they must be a different model or have different firmware, even though they are all 10 TB Exos drives). Others report 0 for it, and some are marked WARN with a smaller value, around 24. The rest are marked as failed, as I already discussed: 8 of them have that attribute marked as failed, and Scrutiny marks those drives as failed. So it seems like this is a legitimate failure that Scrutiny is flagging, unlike the problematic Seagate values?
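The mix of outcomes described above (attribute missing, 0, WARN around 24, failed) is what a per-attribute threshold check would produce. A rough sketch, assuming simple fixed cutoffs: `classify` and both threshold values are hypothetical (the maintainer's comment suggests Scrutiny judges these values against Backblaze data, not fixed numbers like these), and the drive-d reading is made up.

```python
from typing import Optional

# Hypothetical cutoffs for High Fly Writes (attribute 189); not Scrutiny's values.
HYPOTHETICAL_WARN_AT = 1
HYPOTHETICAL_FAIL_AT = 50


def classify(value: Optional[int], warn_at: int, fail_at: int) -> str:
    """Map one raw attribute value to a status; None means the drive doesn't report it."""
    if value is None:
        return "not reported"
    if value >= fail_at:
        return "failed"
    if value >= warn_at:
        return "warn"
    return "passed"


# Made-up readings shaped like the four groups described above.
observed = {"drive-a": None, "drive-b": 0, "drive-c": 24, "drive-d": 300}
for drive, value in observed.items():
    print(drive, classify(value, HYPOTHETICAL_WARN_AT, HYPOTHETICAL_FAIL_AT))
```

A missing attribute is treated as its own state rather than as "passed", which matches the drives that simply don't expose High Fly Writes.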
Hardware ECC Recovered is reported in a non-standard format. If it's not failing SMART, you can safely ignore it for now. Click the attribute row in the table to get extended details for the attribute.
Looks like I have three drives marked as failed in Scrutiny that have Hardware ECC Recovered marked as failed and High Fly Writes marked as warn. All the others that have Hardware ECC Recovered marked as failed also have High Fly Writes marked as failed. So I guess I can just ignore it then? I will say that none of them are marked as failed by SMART itself for this attribute.
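The reasoning above can be checked with a small worked example. This sketch assumes a device's status is the worst status among its attributes (an assumption about how the dashboard rolls things up, not something confirmed in this thread); `device_status` is a hypothetical helper. Using the group sizes from this comment, ignoring Hardware ECC Recovered would leave 8 drives failed and 3 at warn.

```python
from typing import Dict, Iterable, List

SEVERITY = {"passed": 0, "warn": 1, "failed": 2}


def device_status(attribute_statuses: Dict[str, str], ignored: Iterable[str] = ()) -> str:
    """Return the worst attribute status, skipping attributes listed in `ignored`."""
    skip = set(ignored)
    considered = [s for name, s in attribute_statuses.items() if name not in skip]
    return max(considered, key=SEVERITY.__getitem__, default="passed")


# Group sizes from the thread: 3 drives with ECC failed + HFW warn, 8 with HFW failed.
# The ECC status of the 8 is assumed "failed"; it doesn't change the result below.
drives: List[Dict[str, str]] = (
    [{"Hardware ECC Recovered": "failed", "High Fly Writes": "warn"}] * 3
    + [{"Hardware ECC Recovered": "failed", "High Fly Writes": "failed"}] * 8
)

before = [device_status(d) for d in drives]
after = [device_status(d, ignored={"Hardware ECC Recovered"}) for d in drives]
print(before.count("failed"))                      # 11
print(after.count("failed"), after.count("warn"))  # 8 3
```

The 11 failed drives in the original report drop to 8 once the non-standard attribute is excluded; the remaining 8 are driven entirely by High Fly Writes.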
Can I have Scrutiny only mark disks as failed if they fail the "critical" attribute list? I don't care about any of the other ones for the dashboard status.
This is definitely a common request, and something I'm working on (as I find time). It's currently tracked in #275.
While you cannot yet configure the failure status in the dashboard, you can configure how/when you get notified, limiting to only critical attributes: #300 (comment)
I see. I will at least go turn on failure notifications for critical attributes only. That is definitely an improvement. I will keep an eye on #275 for disabling Scrutiny analysis of non-critical attributes.
I've also noticed that some of my disks are reported as failed, and they are all Seagate drives.
Would it be possible to implement a warning status that would be raised for non-critical metrics that are above their thresholds?
I really only want to see disks marked as failed when they have data integrity issues or have stopped working.
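The proposed behaviour can be sketched as a status policy. This is a hypothetical illustration of the request, not an existing Scrutiny setting: `proposed_device_status` is made up, and the critical ID set assumes Backblaze's commonly cited attributes (5, 187, 188, 197, 198).

```python
from typing import Dict

# Assumption: Backblaze's commonly cited critical ATA attribute IDs.
CRITICAL_ATTRIBUTE_IDS = {5, 187, 188, 197, 198}


def proposed_device_status(attribute_statuses: Dict[int, str]) -> str:
    """Fail only on critical attributes; anything else above threshold becomes a warning."""
    status = "passed"
    for attribute_id, attribute_status in attribute_statuses.items():
        if attribute_status == "failed" and attribute_id in CRITICAL_ATTRIBUTE_IDS:
            return "failed"
        if attribute_status in ("warn", "failed"):
            status = "warn"
    return status


print(proposed_device_status({189: "failed", 195: "failed"}))  # warn
print(proposed_device_status({5: "failed", 189: "passed"}))    # failed
```

Under this policy, High Fly Writes (189) and Hardware ECC Recovered (195) failures would surface as warnings, and only a critical attribute could mark the disk as failed.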
Describe the bug
I have 24 Seagate 10 TB Exos drives. 11 of the 24 are marked as "failed" in the Scrutiny dashboard. When inspected, none of the 11 have any critical attributes marked as failed. They all have Hardware ECC Recovered, High Fly Writes, or both marked as failed.
I have extensively read through https://github.com/AnalogJ/scrutiny/issues/255, https://github.com/AnalogJ/scrutiny/blob/master/docs/TROUBLESHOOTING_DEVICE_COLLECTOR.md#seagate-drives-failing, and some other issues that referenced similar problems. It looks like Seagate has been a problem child.
This makes me wonder whether these are "incorrectly" marked as failed. I did follow the troubleshooting instructions: I started out with 12 disks marked as failed, and the count dropped to 11 after I followed the recommendations at https://github.com/AnalogJ/scrutiny/blob/master/docs/TROUBLESHOOTING_DEVICE_COLLECTOR.md#seagate-drives-failing.
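The maintainer notes that some Seagate attributes, including Hardware ECC Recovered, are reported in a non-standard format. A common explanation for Seagate's error-rate style attributes is that the 48-bit raw value packs two counters together, so reading it as a single number makes it look enormous. A minimal sketch, assuming that layout (upper 16 bits = error count, lower 32 bits = operation count); `split_raw48` is a hypothetical helper, and the real layout may differ by model, firmware, and attribute.

```python
from typing import Tuple


def split_raw48(raw: int) -> Tuple[int, int]:
    """Split a 48-bit SMART raw value into (upper 16 bits, lower 32 bits).

    Assumed Seagate-style layout: upper 16 bits = error count,
    lower 32 bits = number of operations.
    """
    if not 0 <= raw < (1 << 48):
        raise ValueError("raw value must fit in 48 bits")
    error_count = raw >> 32             # most significant 16 bits
    operation_count = raw & 0xFFFFFFFF  # least significant 32 bits
    return error_count, operation_count


raw = (5 << 32) | 123_456_789  # made-up raw value
print(raw)               # 21598293269
print(split_raw48(raw))  # (5, 123456789)
```

Read naively, that raw value is over 21 billion; under the assumed layout it is only 5 errors out of roughly 123 million operations.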
So my two questions are:
Screenshots:
My collector YAML looks like:
I can provide a log file(s) if needed. Thanks!