pabsi closed this 4 days ago
It could probably just be a matter of adding a sleep
of some sort, based on that ENV var I suggested, in this for
loop?
https://github.com/AnalogJ/scrutiny/blob/master/collector/pkg/collector/metrics.go#L87
Revisiting the code, I just realised there's a TODO
in the code about this very same topic :sweat_smile: :
https://github.com/AnalogJ/scrutiny/blob/master/collector/pkg/collector/metrics.go#L93-L94
Hey @pabsi I'd be happy to consider a change like this, if it's optional and configurable via the collector config yaml file.
Can you open a PR?
I can try :)
As I said on the original post:
I would do it myself, but unfortunately I am not savvy enough on Go :(
But I'll give it a go ;)
@AnalogJ I can't raise a PR. GitHub threw me an error about not being a contributor.
You can see what I did here: https://github.com/AnalogJ/scrutiny/compare/AnalogJ:master...pabsi:706-add-wait-time-between-checks?expand=1
The test for the collector (go run collector/cmd/collector-metrics/collector-metrics.go run --debug)
worked fine.
Regards.
Really sorry to bug you @AnalogJ but I had to submit a small fix for this PR: https://github.com/AnalogJ/scrutiny/pull/725
Thank you :pray:
Is your feature request related to a problem? Please describe. The issue arises when running SMART checks over multiple disks connected through USB-to-SATA adapters. In my specific case, I have the Quad SATA Hat for the Pi 4, meaning 4 SATA disks are connected via 2 USB 3.0 ports. Sometimes, when the SMART checks run against all 4 drives at once, the USB connection gets reset. In my case, this makes mdadm mark the devices as failed and remove them from the RAID array. Not a real issue, since I can --re-add them later, but it's very inconvenient, all the more so if the SMART checks run daily. See example of dmesg logs:
I say "sometimes" because there are times when checking all 4 drives at once doesn't disconnect them. But I have experienced more stability when running the SMART checks one by one, disk by disk, with a certain delay in between (a few seconds normally does the job).
Describe the solution you'd like A possible option would be to have some environment variable (e.g. DELAY_BETWEEN_DISK_CHECKS or whatever, naming is hard) that sets a delay between disk checks. Another option would be to offer a schedule per drive, but I think this would be way more engineering for perhaps a very specific problem not everyone has.
I would do it myself, but unfortunately I am not savvy enough on Go :(
Additional context N/A
Other notes Thank you so much for your work. Really appreciate it :1st_place_medal: