Closed andreasn closed 3 years ago
All of the links here are 404, but I suppose it should be easy to find their current URLs. However, we don't currently look for any global error of that sort. Adding SMART alerts for hard drives to the System front page does sound like a good idea. Crashing services or "suddenly" absent network connections are a different beast though, as these are not well-defined and we don't have state in between cockpit sessions.
@andreasn, WDYT of repurposing this to show SMART alerts? That would make it a concrete and tangible issue.
Updated links to the current notification pages, now that these patterns are not drafts any more: https://www.patternfly.org/pattern-library/communication/notification-drawer/ https://www.patternfly.org/pattern-library/communication/toast-notifications/
Thanks. Retitling accordingly, so that this becomes actionable.
Whoops, I made a new feature request after I found this, thinking this was just a way of alerting users and not for showing SMART data in general.
I'll re-drop my two cents here though. While Cockpit shows some kind of status, adding SMART data like the GNOME Disks utility does would make the disk information page much more useful. (The drive I mentioned in my original report has since failed and Cockpit still shows it's OK.)
For reference, this is how the GNOME Disks overview looks:
> thinking this was just a way of alerting users and not for showing SMART data in general.
Yeah, this issue is worded as if it is only about showing warnings, but showing warnings without also showing an overview would make little sense, so I consider this issue to cover that as well.
(editing a bit, thought I was commenting on a different issue)
Looking at the screenshot above, the actually interesting bits would be if anything in the Assessment column is not marked as OK. That needs to be shown without having to dive into every single disk. One place we could show that would be in the Health card on the overview page. But maybe also on the Storage page next to each disk? The details could go onto the pages of individual disks.
@andreasn GNOME Disks shows a one-line summary in the "Overall assessment" field. Maybe this can be shown in the Health card if it is not simply "OK" like shown in the screenshot?
For instance, if a disk has bad sectors, the overall assessment would show "Disk is OK, X bad sectors" where X is a number. While the disk might be fine enough now, this usually does indicate some imminent future failure which should be especially important in the use case of a server, and thus Cockpit.
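To make the one-line summary idea a bit more concrete, here is a rough Python sketch of how such a line could be derived from a parsed `smartctl --json` report. It is only an illustration under some assumptions: the function name `overall_assessment` is made up, counting the raw values of attributes 5 (Reallocated_Sector_Ct) and 197 (Current_Pending_Sector) as "bad sectors" is my simplification and not necessarily how GNOME Disks computes it, and the sample report is invented.

```python
def overall_assessment(report: dict) -> str:
    """Build a one-line summary for one disk from a parsed `smartctl --json` report."""
    passed = report.get("smart_status", {}).get("passed")
    if passed is None:
        return "SMART status unavailable"
    if not passed:
        return "Disk is failing"

    # Assumption: treat the raw values of attributes 5 and 197 as the
    # number of bad sectors. Vendors may report these differently.
    bad_sector_ids = {5, 197}
    table = report.get("ata_smart_attributes", {}).get("table", [])
    bad = sum(
        attr.get("raw", {}).get("value", 0)
        for attr in table
        if attr.get("id") in bad_sector_ids
    )
    if bad > 0:
        noun = "bad sector" if bad == 1 else "bad sectors"
        return f"Disk is OK, {bad} {noun}"
    return "Disk is OK"


# Invented sample data, for illustration only.
sample = {
    "smart_status": {"passed": True},
    "ata_smart_attributes": {
        "table": [
            {"id": 5, "name": "Reallocated_Sector_Ct", "raw": {"value": 8}},
            {"id": 197, "name": "Current_Pending_Sector", "raw": {"value": 0}},
        ]
    },
}
print(overall_assessment(sample))
```

Anything other than a plain "Disk is OK" would then be a candidate for the Health card.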
pkg/lib/notifications.js is already doing this: we use page_status to present failed status in the overview page or the shell.
There is another issue discussing the same feature request, closing this as a duplicate. https://github.com/cockpit-project/cockpit/issues/11437
Hey, I'm wondering if this ticket should be kept open to track the state of SMART monitoring in cockpit?
Looking through the issues, I found #15010 and that was closed, stating it was a duplicate of this ticket. https://github.com/cockpit-project/cockpit/issues/15010
I think at least a simple overview of SMART data would be great, even if I had to trigger it manually, for example in the storage panel after selecting a particular disk. I'd even take verbatim output from a smartctl run somewhere behind the scenes...
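As a sketch of what the "verbatim output" fallback could look like, the following Python helper just runs `smartctl --all` on a device and returns whatever it printed. This is not how Cockpit does or would do it; the helper name `smartctl_verbatim` and the injectable `run` parameter are hypothetical, and the call typically needs root.

```python
import subprocess


def smartctl_verbatim(device: str, run=subprocess.run) -> str:
    """Return the unmodified text output of `smartctl --all` for one device.

    `run` is injectable (hypothetical design choice) so the helper can be
    exercised without a real disk or root privileges.
    """
    # No check=True here: smartctl's exit status is a bit mask and can be
    # non-zero even when a useful report was printed.
    result = run(["smartctl", "--all", device], capture_output=True, text=True)
    return result.stdout


# Example call, commented out because it needs smartctl and usually root:
# print(smartctl_verbatim("/dev/sda"))
```

The text would be shown as-is on the disk's page, with no attempt to interpret it.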
It is difficult to cover everything, as there is no real standard for what is reported or how it is reported. For example, several vendors (and even several drives from the same vendor) might have different ways of stating how much data has been written in total, so it might not be possible to have sensible "alerts" for that...
SMART info should be:
It's definitely part of #11437, but requires special SMART-specific work to be implemented.
There's an issue, #15460, that tracks disk errors being reported in Cockpit, including SMART, at both locations. That would cover both 2 and 3, but not item 1 for just displaying current SMART information.
@markwort: I agree with you and have re-opened #15010, as displaying the SMART status and information (even just the current high-level status that SMART reports) is distinct from displaying SMART warnings/errors.
As far as errors are concerned, SMART does have an overall status, even if different vendors have different details and thresholds. We can't reliably parse the extended information, but we can show it. However, with some vendors especially, sometimes it may lead someone to think a disk is failing when it isn't.
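A small Python sketch of that split, assuming the data comes from a parsed `smartctl --json` report: only the overall status is interpreted, and the attribute table is passed through for display without judging any thresholds. The function name `health_summary`, the returned dictionary shape, and the sample values are all made up for illustration.

```python
def health_summary(report: dict) -> dict:
    """Interpret only the overall SMART status; pass attributes through as text."""
    passed = report.get("smart_status", {}).get("passed")
    status = {True: "passed", False: "failed"}.get(passed, "unknown")

    # Display-only rows: vendor-specific values are shown, not evaluated.
    rows = []
    for attr in report.get("ata_smart_attributes", {}).get("table", []):
        raw_text = attr.get("raw", {}).get("string", "")
        rows.append(f"{attr.get('id')} {attr.get('name')}: {raw_text}")

    return {"status": status, "details": rows}


# Invented sample data, for illustration only.
sample = {
    "smart_status": {"passed": False},
    "ata_smart_attributes": {
        "table": [
            {"id": 9, "name": "Power_On_Hours", "raw": {"value": 12345, "string": "12345"}},
        ]
    },
}
print(health_summary(sample))
```

Only the `status` field would drive a warning; the `details` rows are there for people who want to look closer.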
Maybe a duplicate of another issue I've since forgotten :) Filing because it came up in a conversation.
We currently have no good way in Cockpit to notify of errors globally. If really severe stuff is happening to the system, it needs to be messaged correctly. This could be things like services crashing, hard drives failing or network connections suddenly missing. In short: things that will impact the passengers on the bus and that need to be acted upon ASAP, to go with that metaphor.
PatternFly recommends this pattern for notifications: https://www.patternfly.org/wikis/patterns/pattern-development/draft-patterns/global-notifications/
If this sounds good, I'll go ahead and create a wiki page for it.