NAICNO / Jobanalyzer

Easy to use resource usage report
MIT License

Cross-system dashboard #283

Open lars-t-hansen opened 10 months ago

lars-t-hansen commented 10 months ago

This comes from the use cases (https://github.com/NAICNO/Jobanalyzer/blob/main/REQUIREMENTS.md#adm_unused_capacity):

There needs to be a top-level dashboard that provides an overview of all systems and how loaded they are, so as to give the admin the ability to load-balance. Right now the server's landing page is just an index.html with links to individual dashboards.

This dashboard should also offer options for signing up for alerts so that manual monitoring is not necessary, but that's part of a bigger story around AIM and a more sophisticated back-end.
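To make the load overview a bit more concrete, here is a minimal sketch of the kind of summary the top-level page could compute. Everything in it is an assumption for illustration: the `ClusterLoad` record, the `rank_by_gpu_load` and `underused` helpers, the 25% threshold, the cluster names and the numbers are all made up, and none of it is taken from the Jobanalyzer code or its data.

```python
from dataclasses import dataclass


@dataclass
class ClusterLoad:
    """Hypothetical per-cluster summary (not a Jobanalyzer type)."""
    name: str
    cpu_pct: float  # average CPU utilization over the period, 0-100
    gpu_pct: float  # average GPU utilization over the period, 0-100


def rank_by_gpu_load(clusters):
    """Return the clusters sorted from least to most GPU-loaded."""
    return sorted(clusters, key=lambda c: c.gpu_pct)


def underused(clusters, threshold_pct=25.0):
    """Return names of clusters whose GPU load is below the threshold."""
    return [c.name for c in rank_by_gpu_load(clusters) if c.gpu_pct < threshold_pct]


# Made-up sample data, for illustration only.
SAMPLE = [
    ClusterLoad("cluster-a", cpu_pct=60.0, gpu_pct=85.0),
    ClusterLoad("cluster-b", cpu_pct=55.0, gpu_pct=10.0),
    ClusterLoad("cluster-c", cpu_pct=70.0, gpu_pct=40.0),
]

if __name__ == "__main__":
    # One line per cluster, least GPU-loaded first.
    for c in rank_by_gpu_load(SAMPLE):
        print(f"{c.name:10s} cpu={c.cpu_pct:5.1f}% gpu={c.gpu_pct:5.1f}%")
    print("candidates for more GPU work:", underused(SAMPLE))
```

Sorting by load and flagging anything under a threshold is only one possible presentation; the point is that the admin sees all systems side by side instead of opening each dashboard in turn.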

lars-t-hansen commented 10 months ago

Re at-a-glance functionality, for smaller systems (Fox, at least) it works OK to just keep the dashboard for the cluster open. Right now it's pretty obvious from the dashboards and the weekly cluster summaries that while the GPUs on the ML systems have been pretty heavily used over the last week, the GPUs on Fox have been very lightly loaded indeed. The Fox GPU nodes are at least as capable as the ML nodes: more A100s, more RAM of both kinds. Users who are past the experiment stage could usefully move over.

lars-t-hansen commented 8 months ago

For Saga and bigger systems, keeping the dashboard open with the search set to "idle" might be an OK start. See #291.