cockpit-project / cockpit

Cockpit is a web-based graphical interface for servers.
http://www.cockpit-project.org/
GNU Lesser General Public License v2.1

metrics: History doesn't include reboots, but should #15983

Open garrett opened 3 years ago

garrett commented 3 years ago

Page: Metrics

When looking at the system history, it's not clear when a reboot happened.

Story: I had another system crash and would like to see the context of when spikes happened. Did the computer spike up in memory and CPU usage before the crash? Or was that the result of booting up? It's unclear in the UI at the moment. Having a boot marker like the one in the system logs would make it more obvious.

martinpitt commented 3 years ago

Agreed -- it could even appear as an "event", like we have CPU/memory spikes. Chances are very high that a reboot triggers a CPU spike anyway.

martinpitt commented 2 years ago

We can certainly correlate this with reboots from the last command, and if we have it, we can certainly feed it in as an event.

There are other causes of large data gaps, like suspends, rescue mode, or the admin just stopping PCP. Whenever we encounter a nontrivial data gap, should we visually set it apart somehow? I.e., start a new block instead of putting long contiguous empty graphs in between? That might make the page a bit easier to comprehend.
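
The "start a new block at each nontrivial gap" idea could be sketched roughly as follows. This is only an illustration, not code from metrics.jsx: the helper name `splitAtGaps`, the `{ timestamp }` sample shape, and the `maxGapMs` threshold are all assumptions.

```javascript
// Hypothetical helper: split time-ordered samples into blocks wherever two
// neighbouring samples are further apart than maxGapMs (suspend, reboot,
// stopped PCP, ...). Each sample is assumed to look like { timestamp: <ms> }.
function splitAtGaps(samples, maxGapMs) {
    const blocks = [];
    let current = [];
    for (const sample of samples) {
        const previous = current[current.length - 1];
        // A gap larger than the threshold closes the current block.
        if (previous !== undefined && sample.timestamp - previous.timestamp > maxGapMs) {
            blocks.push(current);
            current = [];
        }
        current.push(sample);
    }
    if (current.length > 0)
        blocks.push(current);
    return blocks;
}
```

What counts as "nontrivial" is a design choice; the threshold would presumably be some multiple of the sampling interval.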

jelly commented 2 years ago

Related: on my laptop (which I suspend), the metrics page seems to show empty blocks when my laptop suspends. [image]

garrett commented 2 years ago

Yeah, I'm getting that too. :disappointed:

dev-DTECH commented 1 year ago

Hey @garrett, I would like to work on this issue.

KKoukiou commented 1 year ago

> Hey @garrett, I would like to work on this issue.

@dev-DTECH This is actually not a very easy "good first issue". You will probably need to parse information about reboots from the journal and insert it at the right timestamp in the metrics graph events. The whole code for this is here: https://github.com/cockpit-project/cockpit/blob/main/pkg/metrics/metrics.jsx but as said, it's not just a 10-line PR.
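
As a rough illustration of "insert at the right timestamp", boot times could be bucketed by hour and minute before being attached to the graph. This sketch does not reflect the actual data structures in metrics.jsx; the helper name `bootMinutesByHour` and the idea of keying by hour start are assumptions.

```javascript
const HOUR_MS = 60 * 60 * 1000;
const MINUTE_MS = 60 * 1000;

// Hypothetical helper: map each hour (epoch milliseconds of the hour's start,
// aligned to UTC) to the minutes within that hour at which a boot happened.
// Note: UTC-aligned hours do not line up with local hours in time zones that
// have a half-hour offset.
function bootMinutesByHour(bootTimes) {
    const byHour = new Map();
    for (const boot of bootTimes) {
        const ms = boot.getTime();
        const hourStart = Math.floor(ms / HOUR_MS) * HOUR_MS;
        const minute = Math.floor((ms - hourStart) / MINUTE_MS);
        if (!byHour.has(hourStart))
            byHour.set(hourStart, []);
        byHour.get(hourStart).push(minute);
    }
    return byHour;
}
```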

dev-DTECH commented 1 year ago

> > Hey @garrett, I would like to work on this issue.
>
> @dev-DTECH This is actually not a very easy "good first issue". You will probably need to parse information about reboots from the journal and insert it at the right timestamp in the metrics graph events. The whole code for this is here: https://github.com/cockpit-project/cockpit/blob/main/pkg/metrics/metrics.jsx but as said, it's not just a 10-line PR.

Yeah, I understand that, but I am eager to learn and I am also well acquainted with the journal. If no one else is working on it, I can try to resolve this issue.

dev-DTECH commented 1 year ago

Hey @KKoukiou, I searched a bit and figured out that the command 'last -x' shows the times of crashes, reboots, and shutdowns. So I am trying to use the output of this command to indicate the reboots in the metrics history.

Is it the correct way or should I consider another way?

KKoukiou commented 1 year ago

@dev-DTECH it looks fine to start with that.

dev-DTECH commented 1 year ago

Hey @KKoukiou, so I got the reboot times using cockpit.spawn("last -x | grep reboot".split(" "))

Every reboot has a start time and an end time: [image]

So should I show the whole range of time as the reboot, or just the start/end?

martinpitt commented 1 year ago

Note that the reboot range seems to include the whole time between booting and shutting down. E.g. I usually apply OS updates on Saturday mornings, then reboot, and they look like this:

reboot   system boot  6.1.14-200.fc37. Sat Mar  4 07:25 - 09:12 (7+01:46)

I.e. it spans over a week -- I suppose the "7+" means "7 days, one hour, and 46 minutes". TBH I find that output rather hard to interpret. It gets easier to read with --fulltimes:

reboot   system boot  6.1.14-200.fc37. Sat Mar  4 07:25:49 2023 - Sat Mar 11 09:12:46 2023 (7+01:46)

Plus, there's also shutdowns. But it seems to me that we can only show the time when the computer started, which I believe is the first timestamp. With that, we can also ignore the shutdowns.

Please don't run cockpit.script() with grep; run cockpit.spawn(["last", "--time-format=iso", "reboot"]) instead. That time format is easier to parse, and you can then use date-fns' parseISO() to convert it to a useful datetime object.
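
A minimal sketch of the parsing step, taking only the first timestamp of each line (the boot time) as described above. To stay self-contained it uses the built-in `Date` rather than date-fns' parseISO(). The function name `parseBootTimes` is hypothetical, and the exact column layout and UTC offset style of the ISO output are assumptions, so the regex accepts offsets both with and without a colon.

```javascript
// Matches the first ISO 8601 timestamp on a line; the UTC offset may be
// "Z", "+01:00", or "+0100".
const ISO_TIMESTAMP = /(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(Z|[+-]\d{2}):?(\d{2})?/;

// Hypothetical helper: turn the text printed by
// `last --time-format=iso reboot` into an array of boot times.
function parseBootTimes(output) {
    const boots = [];
    for (const line of output.split("\n")) {
        // Skip blank lines and the trailing "wtmp begins ..." line.
        if (!line.startsWith("reboot"))
            continue;
        const match = ISO_TIMESTAMP.exec(line);
        if (!match)
            continue;
        const [, dateTime, zone, minutes] = match;
        // Normalize the offset to the "+HH:mm" form that Date reliably parses.
        const offset = zone === "Z" ? "Z" : `${zone}:${minutes ?? "00"}`;
        boots.push(new Date(dateTime + offset));
    }
    return boots;
}
```

In Cockpit, the input string would presumably be whatever the cockpit.spawn() call above resolves to, e.g. `cockpit.spawn(["last", "--time-format=iso", "reboot"]).then(parseBootTimes)`.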

dev-DTECH commented 1 year ago

Ok that's much better formatted

This is cockpit.spawn(["last", "--time-format=iso", "reboot"]), then the time parsed with parseISO(): [image]

Thanks for the help. This will make my task so much easier.

ashutosh7i commented 11 months ago

Hello @martinpitt sir, do we still need this feature? Can I work on this issue?

martinpitt commented 11 months ago

@ashutosh7i yes, this is still relevant, and fixing would be nice! Note that this is not the easiest task to start with (not hard, but perhaps start with something easier). Please consider https://github.com/cockpit-project/cockpit/issues/15983#issuecomment-1496913507

ashutosh7i commented 11 months ago

So I have made some progress on this.

Sample image: [image]

Now I have some questions:

  1. Since a reboot is a critical event, should I show it in place of "spikes" or in place of "Load, Disk, Network, I/O"?
  2. What about design? What exact phrase should I use; is "Reboot" fine? @garrett

garrett commented 11 months ago

Looks good. Thanks!

We might even want to consider making it bold, as it's not just an important event, but it is also a "landmark" event (where it is a specific event that shows when one session stopped and another started).

> Since a reboot is a critical event, should I show it in place of "spikes" or in place of "Load, Disk, Network, I/O"?

Yes; thanks!

> What about design? What exact phrase should I use; is "Reboot" fine?

Yes, that works.

martinpitt commented 11 months ago

Thanks @ashutosh7i ! Can you please send a pull request with your changes, so that we can review and test the implementation there? Cheers!

ajshrmaofficial commented 10 months ago

Hey @garrett @martinpitt, I hope you guys are doing well. I just wanted to ask whether this issue is still available, as I do not see any PR attached to it. By the way, I like this project very much and want to contribute if possible (I'm new to contributing). Thanks

martinpitt commented 10 months ago

@ajshrmaofficial Yes, it is still outstanding and there's no PR. Thanks for your interest! Please work through https://github.com/cockpit-project/cockpit/blob/main/HACKING.md first to set up a dev environment and learn how to make and test a change. Have fun!