Open lars-t-hansen opened 10 months ago
Re at-a-glance functionality: for smaller systems (Fox, at least) it works OK to just keep the dashboard for the cluster open. Right now it's pretty obvious from the dashboards and the weekly cluster summaries that while the GPUs on the ML systems have been heavily used over the last week, the GPUs on Fox have been very lightly loaded. The Fox GPU nodes are at least as capable as the ML nodes - more A100s, more RAM of both kinds. Users who are past the experiment stage could usefully move over.
For Saga and bigger systems, keeping the dashboard open with the search set to "idle" might be an OK start. See #291.
This comes from the use cases (https://github.com/NAICNO/Jobanalyzer/blob/main/REQUIREMENTS.md#adm_unused_capacity):
There needs to be a top-level dashboard that provides an overview of all systems and how loaded they are, so as to give the admin the ability to load-balance. Right now the server's landing page is just an index.html with links to individual dashboards.
This dashboard should also have options for signing up for alerts, so that manual monitoring is not necessary, but that's part of a bigger story around AIM and a more sophisticated back-end.
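
To make the top-level overview a bit more concrete, here is a minimal sketch of the kind of summary it could compute. Everything in it is an assumption for illustration: the `ClusterLoad` record, the `idle_below`/`busy_above` thresholds, and the function names are hypothetical and not part of Jobanalyzer, and a real version would read utilization from the existing per-cluster data rather than from hand-written values.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ClusterLoad:
    """Hypothetical per-cluster summary over some time window."""
    name: str
    cpu_pct: float            # average CPU utilization, 0-100
    gpu_pct: Optional[float]  # average GPU utilization, None if no GPUs


def gpu_status(load: ClusterLoad,
               idle_below: float = 25.0,
               busy_above: float = 75.0) -> str:
    """Classify GPU load; the thresholds are arbitrary placeholders."""
    if load.gpu_pct is None:
        return "n/a"
    if load.gpu_pct < idle_below:
        return "idle"
    if load.gpu_pct > busy_above:
        return "busy"
    return "ok"


def overview(loads: List[ClusterLoad]) -> List[Tuple[str, float, Optional[float], str]]:
    """One row per cluster, sorted by name, for a landing-page table."""
    return [(c.name, c.cpu_pct, c.gpu_pct, gpu_status(c))
            for c in sorted(loads, key=lambda c: c.name)]


def rebalance_hints(loads: List[ClusterLoad]) -> List[Tuple[str, str]]:
    """Pairs (busy, idle): GPU work on the first could move to the second."""
    busy = [c.name for c in loads if gpu_status(c) == "busy"]
    idle = [c.name for c in loads if gpu_status(c) == "idle"]
    return [(b, i) for b in busy for i in idle]
```

With made-up numbers such as `ClusterLoad("ml", 40.0, 90.0)` and `ClusterLoad("fox", 55.0, 5.0)`, `rebalance_hints` would return `[("ml", "fox")]`, which is the situation described in the comment above. Keeping the classification separate from the table makes it easy to reuse the same thresholds later for alerts.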