KyanChen opened this issue 2 weeks ago
Hello,
We do not have this as a built-in feature. One of our engineers developed an unofficial script that can terminate a process if it is not utilizing the GPU. You can integrate it into your CMDs to do what you want. Doing that with interactive shells or notebooks would probably be a bit trickier.
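For illustration, here is a minimal sketch of such a wrapper. It is not the unofficial script mentioned above. It assumes `nvidia-smi` is on the PATH of the node running the job, and `run_with_idle_kill`, its parameters, the default thresholds, and the file name `idle_guard.py` are all hypothetical choices.

```python
import subprocess
import sys
import time
from typing import Callable, List, Optional, Sequence


def query_gpu_utilization() -> List[int]:
    """Return the utilization (%) of every GPU on this node, via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [int(line) for line in out.splitlines() if line.strip()]


def run_with_idle_kill(
    cmd: Sequence[str],
    threshold: int = 5,           # % below which a GPU counts as idle (assumed default)
    idle_limit: float = 600.0,    # seconds of continuous idleness before terminating
    poll_interval: float = 10.0,  # seconds between utilization samples
    get_utilization: Callable[[], List[int]] = query_gpu_utilization,
) -> str:
    """Hypothetical helper: run cmd, terminate it if all GPUs stay idle too long.

    Returns "exited" if the command finished by itself, "idle" if it was stopped.
    """
    proc = subprocess.Popen(list(cmd))
    idle_since: Optional[float] = None
    while proc.poll() is None:
        utils = get_utilization()
        now = time.monotonic()
        # An empty reading (no GPUs found) is treated as "not idle" to be safe.
        if utils and all(u < threshold for u in utils):
            if idle_since is None:
                idle_since = now
            elif now - idle_since >= idle_limit:
                proc.terminate()  # SIGTERM on POSIX, so the job can clean up
                try:
                    proc.wait(timeout=30)
                except subprocess.TimeoutExpired:
                    proc.kill()   # escalate if the job ignores the first signal
                    proc.wait()
                return "idle"
        else:
            idle_since = None  # any GPU activity resets the idle timer
        time.sleep(poll_interval)
    return "exited"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python idle_guard.py <command> [args...]
    print(run_with_idle_kill(sys.argv[1:]))
```

For example, `python idle_guard.py python train.py` would launch `train.py` and stop it after ten minutes during which every GPU on the node reported under 5% utilization. The `get_utilization` parameter exists so the idle check can be swapped out or tested without a GPU. Note that this sketch looks at every GPU on the node, not only the ones assigned to the job.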
Can you offer an implementation?
I have developed a Python implementation. The master node runs https://github.com/KyanChen/GPUClusterConfig/blob/dev/gpu_parser.py and each worker node runs https://github.com/KyanChen/GPUClusterConfig/blob/dev/gpu_monitor.py
I cannot offer a reference implementation beyond the script I've already shared. Your code looks fine at a quick glance, although it does not seem to handle the case of multiple GPUs per node. But if it works for you, I see no problem with you using it.
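One way to handle multiple GPUs per node is to read one utilization value per GPU and treat a job as idle only when every GPU it uses is idle. Here is a sketch that assumes the CSV printed by `nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits`. `parse_utilization` and `node_is_idle` are hypothetical names, and mapping a job's assigned GPUs to `nvidia-smi` indices is left to the caller.

```python
from typing import Dict, Iterable, Optional


def parse_utilization(csv_text: str) -> Dict[int, int]:
    """Map GPU index -> utilization (%) from nvidia-smi's 'index, util' CSV lines."""
    result: Dict[int, int] = {}
    for line in csv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        index, util = (field.strip() for field in line.split(","))
        result[int(index)] = int(util)
    return result


def node_is_idle(utils: Dict[int, int], threshold: int = 5,
                 indices: Optional[Iterable[int]] = None) -> bool:
    """True only if every considered GPU is below threshold.

    indices selects the GPUs assigned to the job (None means all GPUs).
    A job that keeps any one of its GPUs busy is NOT considered idle, and
    an empty selection is never reported as idle.
    """
    selected = list(utils) if indices is None else list(indices)
    if not selected:
        return False
    return all(utils[i] < threshold for i in selected)
```

Note that `nvidia-smi` numbers GPUs in PCI bus order, which can differ from CUDA's default device ordering unless `CUDA_DEVICE_ORDER=PCI_BUS_ID` is set. Keep that in mind when translating a job's visible devices into indices.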
Describe the problem
Please implement functionality, for both the command prompt (cmd) and the shell environment, that automatically releases resources when a timeout is reached or GPU utilization is too low.
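The requested behavior (release on a timeout, or on sustained low GPU utilization) can be modeled as a small state machine that a monitor feeds with periodic samples. This is a sketch only. `ReleasePolicy`, its default limits, and the returned reason strings are assumptions and not an existing feature.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReleasePolicy:
    """Hypothetical policy deciding when a task's resources should be released.

    max_runtime: hard wall-clock limit in seconds (None disables it).
    idle_threshold: GPU utilization (%) below which a sample counts as idle.
    idle_window: seconds of continuously idle samples before releasing.
    """
    max_runtime: Optional[float] = 8 * 3600
    idle_threshold: float = 5.0
    idle_window: float = 1800.0
    _start: Optional[float] = field(default=None, init=False)
    _idle_since: Optional[float] = field(default=None, init=False)

    def update(self, now: float, utilization: float) -> Optional[str]:
        """Feed one sample; return "timeout", "idle", or None to keep running."""
        if self._start is None:
            self._start = now  # the first sample marks the task's start time
        if self.max_runtime is not None and now - self._start >= self.max_runtime:
            return "timeout"
        if utilization < self.idle_threshold:
            if self._idle_since is None:
                self._idle_since = now
            if now - self._idle_since >= self.idle_window:
                return "idle"
        else:
            self._idle_since = None  # activity resets the idle window
        return None
```

Keeping the policy free of I/O means the same logic could back both a wrapped command and a shell session, with only the sampling loop differing. On a multi-GPU task, passing `max(per_gpu_utilization)` as the sample keeps the task alive as long as any of its GPUs is busy.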
Describe alternatives you've considered
No response
Additional context
No response