It would also be useful to have a command for checking the load of the workers. A single node can run 4 of my calculations at once, but the resources do not seem to be fully used.
Hi, to answer your second question first: this is determined by the idle timeout, which is the duration after which a worker shuts down if it has not received any new jobs. When using automatic allocation, the default idle timeout is five minutes. So if you have an allocation with a worker and it doesn't receive anything to compute for five minutes, it will shut itself (and the allocation) down.
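To make the idle timeout rule concrete, here is a minimal Python sketch of the decision described above. It is a toy model, not HQ's actual implementation: the `WorkerModel` class, its method names, and the simplification that idleness is measured from the moment the last job was received are all assumptions made for illustration. Only the five-minute default comes from the explanation above.

```python
from dataclasses import dataclass

# Five minutes: the default idle timeout quoted above for automatic allocation.
IDLE_TIMEOUT_S = 5 * 60


@dataclass
class WorkerModel:
    """Hypothetical toy model of a worker; not HyperQueue's real code."""

    # Timestamp (in seconds) at which the worker last received a job.
    last_job_at: float = 0.0

    def receive_job(self, now: float) -> None:
        # Receiving a job resets the idle clock.
        self.last_job_at = now

    def should_shut_down(self, now: float, idle_timeout: float = IDLE_TIMEOUT_S) -> bool:
        # The worker stops once it has been idle for the whole timeout.
        return now - self.last_job_at >= idle_timeout


worker = WorkerModel()
worker.receive_job(now=100.0)
print(worker.should_shut_down(now=200.0))  # idle for 100 s -> False
print(worker.should_shut_down(now=400.0))  # idle for 300 s -> True
```

In this model, a worker that keeps receiving jobs more often than every five minutes never shuts down on its own, which matches the behavior described above.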
Regarding user load, we have a dashboard that you can run using `hq dashboard`; however, it is only available in the latest release (0.18). It is also currently very experimental.
Regarding the worker list table, the output of `hq alloc list` can be very inaccurate. HQ is extremely conservative about invoking the PBS/Slurm commands that return the current status of the queue, because in the past, when we ran them without any limits, they overloaded the system schedulers. So HQ only asks about once an hour, and finds out about allocations mostly when a new worker connects to it. It's possible that a newer version of HQ would behave better here, since there may have been some fixes (0.12 is quite old).
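As a rough illustration of why a throttled status query can show stale data, here is a small Python sketch. It is only a model of the general idea, not how HQ is implemented: the `ThrottledStatus` class, the one-hour interval constant, and the fake `scheduler` dictionary standing in for PBS/Slurm are all hypothetical.

```python
from typing import Callable, Optional

# Assumed refresh interval, modelling "once in an hour or so".
REFRESH_INTERVAL_S = 60 * 60


class ThrottledStatus:
    """Hypothetical cache that queries the scheduler at most once per interval."""

    def __init__(self, query: Callable[[], str], interval: float = REFRESH_INTERVAL_S):
        self._query = query
        self._interval = interval
        self._cached: Optional[str] = None
        self._fetched_at: Optional[float] = None

    def get(self, now: float) -> Optional[str]:
        # Only hit the (expensive) scheduler command if the cache is empty or too old.
        if self._fetched_at is None or now - self._fetched_at >= self._interval:
            self._cached = self._query()
            self._fetched_at = now
        return self._cached


# Fake scheduler state standing in for a real PBS/Slurm status command.
scheduler = {"state": "QUEUED"}
status = ThrottledStatus(query=lambda: scheduler["state"])

first = status.get(now=0.0)       # first call queries the scheduler
scheduler["state"] = "RUNNING"    # the real state changes shortly afterwards
stale = status.get(now=1800.0)    # 30 min later: cached value is returned
fresh = status.get(now=3600.0)    # 1 h later: the scheduler is queried again

print(first, stale, fresh)  # QUEUED QUEUED RUNNING
```

The middle value is the interesting one: for up to an hour the reported state lags behind the real one, which is the kind of inaccuracy described above.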
Thanks! It is a clear explanation and I'll try with the new version.
How alloc assigns workers is not clear to me. I am using version `0.12.0`. Here is the output of my `hq worker list`:

and my `hq alloc list`:

If I understand correctly, the number of workers should match, or at least not exceed, the total number of workers allocated by alloc. It is also not clear to me when a worker gets collected if no more jobs are submitted to it.