JohnTigue opened this issue 1 year ago.
For GPU monitoring in a TUI context, there are multiple options.
One TUI approach that might well work in the Jupyter terminal is simply to run clear followed by
nvidia-smi -l 5
i.e., just rerun nvidia-smi every 5 seconds.
We could also start the tmux session with a one-time status printout, such as nvidia-smi --list-gpus.
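For a dashboard pane that shows only a few numbers rather than the full nvidia-smi table, nvidia-smi can also print selected fields as CSV via --query-gpu=... --format=csv,noheader,nounits. The Python sketch below assumes nvidia-smi is on the PATH; the field selection and the helper names (parse_query_csv, query_gpus) are illustrative, not part of any existing tool.

```python
import subprocess
from typing import Dict, List

# Fields requested from nvidia-smi (an illustrative selection).
FIELDS = ["index", "name", "utilization.gpu", "memory.used", "memory.total", "temperature.gpu"]


def parse_query_csv(text: str) -> List[Dict[str, str]]:
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output.

    Each non-empty line describes one GPU, with values in the same order as FIELDS.
    Assumes GPU names contain no commas.
    """
    rows = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        rows.append(dict(zip(FIELDS, values)))
    return rows


def query_gpus() -> List[Dict[str, str]]:
    """Run nvidia-smi once and return one dict per GPU (needs an NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(FIELDS)}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_query_csv(out)
```

Calling query_gpus() in a loop with time.sleep(5) gives roughly the same cadence as nvidia-smi -l 5, but with values that are easy to reformat or feed into another display.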
Here's [1] a super simple, not-bad-at-all way of doing TUI in GUI (i.e., running inside a Jupyter terminal):
watch -d -n 0.5 nvidia-smi
man watch tells us that the -d flag highlights the differences between successive updates, so it helps show which metrics are changing over time.
I just tested that and it works well, including continually highlighting the deltas (text and background are inverted, i.e., white/black); in my test, the highlighted changes were the time and the temperature.
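To illustrate the idea behind watch -d, here is a rough Python sketch: compare each new snapshot with the previous one character by character and wrap changed characters in the ANSI reverse-video escape (ESC[7m ... ESC[0m). This is an approximation for illustration, not watch's actual implementation; highlight_changes and watch_command are hypothetical names.

```python
import subprocess
import time

REVERSE = "\x1b[7m"  # ANSI SGR 7: swap foreground and background colors
RESET = "\x1b[0m"    # ANSI SGR 0: reset all attributes


def highlight_changes(prev: str, curr: str) -> str:
    """Return `curr` with characters that differ from `prev` in reverse video.

    Lines are compared position by position; text past the end of the
    corresponding previous line counts as changed.
    """
    prev_lines = prev.splitlines()
    out_lines = []
    for i, line in enumerate(curr.splitlines()):
        old = prev_lines[i] if i < len(prev_lines) else ""
        out, in_run = [], False
        for j, ch in enumerate(line):
            changed = j >= len(old) or old[j] != ch
            if changed and not in_run:
                out.append(REVERSE)   # start a highlighted run
                in_run = True
            elif not changed and in_run:
                out.append(RESET)     # end the highlighted run
                in_run = False
            out.append(ch)
        if in_run:
            out.append(RESET)
        out_lines.append("".join(out))
    return "\n".join(out_lines)


def watch_command(cmd, interval: float = 0.5) -> None:
    """Rerun `cmd` every `interval` seconds, redrawing with changes highlighted."""
    prev = ""
    while True:
        curr = subprocess.run(cmd, capture_output=True, text=True).stdout
        # ESC[H moves the cursor home, ESC[2J clears the screen; no highlight on the first frame.
        print("\x1b[H\x1b[2J" + (highlight_changes(prev, curr) if prev else curr), flush=True)
        prev = curr
        time.sleep(interval)
```

For example, watch_command(["nvidia-smi"]) behaves roughly like watch -d -n 0.5 nvidia-smi (stop it with Ctrl+C).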
It also sounds like tmux runs inside Jupyter's terminal. That's great. Apparently it actually ran better in the Classic Notebook than in the newer JupyterLab; this issue is still open: Unable to override tmux mouse mode in jupyterlab terminal #13005.
So we should definitely see whether we can set up a nice tmux dashboard that runs within Jupyter's terminal.
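A tmux dashboard could be scripted along these lines. This is a minimal sketch assuming tmux and nvidia-smi are installed; the session name gpu-dash, the pane commands, and the helper names are hypothetical. It only uses standard tmux subcommands (new-session, split-window, select-layout, attach-session), with one monitor per pane.

```python
import subprocess
from typing import List

SESSION = "gpu-dash"  # hypothetical session name

# One shell command per pane (illustrative choices).
PANES = [
    "watch -d -n 0.5 nvidia-smi",
    "watch -n 1 cat /proc/net/dev",
]


def build_tmux_commands(session: str, panes: List[str]) -> List[List[str]]:
    """Return tmux argv lists that create a detached session with one pane per command."""
    if not panes:
        raise ValueError("need at least one pane command")
    cmds = [["tmux", "new-session", "-d", "-s", session, panes[0]]]
    for pane_cmd in panes[1:]:
        cmds.append(["tmux", "split-window", "-t", session, pane_cmd])
    # Rebalance the panes so each gets a similar share of the window.
    cmds.append(["tmux", "select-layout", "-t", session, "tiled"])
    return cmds


def launch_dashboard(session: str = SESSION, panes: List[str] = PANES) -> None:
    """Create the session, then attach the current terminal to it."""
    for cmd in build_tmux_commands(session, panes):
        subprocess.run(cmd, check=True)
    subprocess.run(["tmux", "attach-session", "-t", session])
```

Running launch_dashboard() from a Jupyter terminal would be one way to test whether the tmux mouse-mode issue mentioned above gets in the way.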
Network traffic is another useful thing to monitor. For example, if many gigabytes of data need to be downloaded during a setup step that gives no UI feedback while each individual file downloads, then peeking at the network traffic is a way to see that something is happening.
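On Linux, one low-tech way to peek at network traffic is to sample the cumulative byte counters in /proc/net/dev twice and divide the difference by the interval. A minimal Python sketch, assuming a Linux container; the function names and the 1-second default interval are illustrative.

```python
import time
from typing import Dict, Tuple

Counters = Dict[str, Tuple[int, int]]


def parse_net_dev(text: str) -> Counters:
    """Parse /proc/net/dev contents into {interface: (rx_bytes, tx_bytes)}.

    The first two lines are headers; each remaining line is "iface: <16 counters>",
    where counter 0 is received bytes and counter 8 is transmitted bytes.
    """
    counters = {}
    for line in text.splitlines()[2:]:
        if ":" not in line:
            continue
        iface, data = line.split(":", 1)
        fields = data.split()
        counters[iface.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def rates(before: Counters, after: Counters, seconds: float) -> Dict[str, Tuple[float, float]]:
    """Bytes per second received/transmitted per interface between two snapshots."""
    return {
        iface: ((after[iface][0] - rx) / seconds, (after[iface][1] - tx) / seconds)
        for iface, (rx, tx) in before.items()
        if iface in after
    }


def sample(interval: float = 1.0) -> Dict[str, Tuple[float, float]]:
    """Take two snapshots `interval` seconds apart (approximate timing)."""
    with open("/proc/net/dev") as f:
        before = parse_net_dev(f.read())
    time.sleep(interval)
    with open("/proc/net/dev") as f:
        after = parse_net_dev(f.read())
    return rates(before, after, interval)
```

Printing sample() in a loop, e.g. as rx/tx in MB/s per interface, gives a simple "is the download still moving?" indicator for a tmux pane.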
Kaggle has a system monitor in its notebooks, with UI for each of the two GPUs.
I wonder whether that is a Jupyter thing or a Kaggle thing (a widget?).
2020, Identify and monitor NVIDIA GPU usage in Kaggle notebooks:
(EDIT — Though initially, I was not able to use nvidia-smi inside a Kaggle kernel, later on, I found an alternative path from where it could be used and updated my kaggle notebook to show that. This post, however, is still relevant for anyone interested to know about a python package using which we can programmatically identify and monitor GPU usage.)
Seemingly the other solution involves pynvml, as per Kaggle: pynvml module to identify and monitor GPU usage.
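For the programmatic route, a pynvml polling loop might look like the sketch below. It assumes the pynvml module (provided, for example, by the nvidia-ml-py package) and an NVIDIA driver are available; the nvml* calls are the standard NVML bindings, while format_gpu_line and poll are hypothetical helper names. Keeping the formatting separate from the NVML calls would let the same output feed either a TUI pane or a GUI widget.

```python
import time


def format_gpu_line(index: int, name: str, util_pct: int,
                    mem_used: int, mem_total: int, temp_c: int) -> str:
    """Render one GPU's stats as a single status line (memory given in bytes)."""
    mib = 1024 * 1024
    return (f"GPU{index} {name}: {util_pct:3d}% util, "
            f"{mem_used // mib}/{mem_total // mib} MiB, {temp_c} C")


def poll(interval: float = 5.0) -> None:
    """Print one status line per GPU every `interval` seconds until Ctrl+C."""
    import pynvml  # imported lazily so format_gpu_line works without pynvml installed

    pynvml.nvmlInit()
    try:
        handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
                   for i in range(pynvml.nvmlDeviceGetCount())]
        while True:
            for i, h in enumerate(handles):
                name = pynvml.nvmlDeviceGetName(h)
                if isinstance(name, bytes):  # older pynvml versions return bytes
                    name = name.decode()
                util = pynvml.nvmlDeviceGetUtilizationRates(h)
                mem = pynvml.nvmlDeviceGetMemoryInfo(h)
                temp = pynvml.nvmlDeviceGetTemperature(h, pynvml.NVML_TEMPERATURE_GPU)
                print(format_gpu_line(i, name, util.gpu, mem.used, mem.total, temp))
            print(flush=True)
            time.sleep(interval)
    except KeyboardInterrupt:
        pass
    finally:
        pynvml.nvmlShutdown()
```

Calling poll() from a terminal prints a compact per-GPU summary every 5 seconds, similar in spirit to nvidia-smi -l 5.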
SageMaker has SageMaker Debugger, which has a nice UI: Monitor the system resource utilization using SageMaker Studio.
BrainTrust containers should have easy-to-use ways of monitoring system resource usage, such as storage, network, and compute (especially the GPU). We want monitors implemented for both GUI and TUI.
Of course, Jupyter has terminals, so TUI monitoring solutions are a cheap, not-too-clunky way of getting monitoring dashboards into Jupyter. That is, a TUI solution can do double duty as a GUI solution.
A promising, true GUI solution might be a Jupyter widget. Jupyter widgets can run outside of Jupyter notebooks, so such a GUI monitoring widget could also be used in a cluster-wide dashboard, not just for individual servers.
See also #98: we would like the TUI solutions to work well with tmux, and we want the TUI dashboards to be implemented as tmux sessions. (Hopefully, the Jupyter terminal will also work well with tmux…)