Open Zeyi-Lin opened 9 months ago
I think disk utilization, disk I/O, CPU memory, and CPU utilization are also necessary.
🍺 Got it, added to the top post.
I think the temperature of hardware is also needed.
https://github.com/grafana/grafana is a good reference example. I was drawn to its features the first time I used it, but the technology it uses is not compatible with Python.
🤩 Features description [Please make everyone to understand it]
The monitoring metrics researchers care most about mainly include:
Fine-grained:
Tools I have used before:
gpustat: fine-grained monitoring of each user's GPU usage
PyTorch hook: https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.register_forward_hook
Profiler: https://pytorch.org/tutorials/beginner/profiler.html
ps:
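For the host-level metrics requested in the comments (disk utilization, CPU load), here is a minimal sketch of what one sample might look like using only the Python standard library. It is not a proposed implementation for this project: `HostSample` and `sample_host` are hypothetical names, the 1-minute load average divided by the CPU count is only a rough stand-in for CPU utilization, and disk I/O, memory, and temperature are left out because the standard library does not expose them portably.

```python
import os
import shutil
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostSample:
    """One point-in-time reading of host metrics (hypothetical structure)."""
    timestamp: float
    disk_used_percent: float
    load_per_cpu: Optional[float]  # None on platforms without os.getloadavg


def sample_host(path: str = os.sep) -> HostSample:
    """Sample disk utilization for `path` and a rough CPU load figure."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    disk_used_percent = 100.0 * usage.used / usage.total if usage.total else 0.0

    load_per_cpu = None
    if hasattr(os, "getloadavg"):  # Unix only
        try:
            one_minute_load = os.getloadavg()[0]
            # Normalize by CPU count so ~1.0 means "all cores busy".
            load_per_cpu = one_minute_load / (os.cpu_count() or 1)
        except OSError:
            pass  # load average could not be obtained on this host

    return HostSample(time.time(), disk_used_percent, load_per_cpu)


if __name__ == "__main__":
    print(sample_host())
```

A logger could call `sample_host()` on a fixed interval from a background thread and attach each reading to the current experiment step.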