Open svatosFZU opened 2 weeks ago
Hi, sorry for the late reply, I'm quite busy at the moment. The automatic allocator has an internal rate limiter that can cause allocations to be stopped if too many failures occur. I'm not sure if that's what is causing this issue though. Are you still using HQ_AUTOALLOC_MAX_ALLOCATION_FAILS
?
Yes, the HQ server is started by
RUST_LOG=hyperqueue=debug HQ_AUTOALLOC_MAX_ALLOCATION_FAILS=100000 /home/svatosm/hq-v0.19.0-linux-x64/hq server start --journal /home/svatosm/hqJournal 2> /home/svatosm/hq-debug-output.log &
For some time, I am observing this situation. HyperQueue works fine for some time and then something happens. Jobs are coming to the HyperQueue, are buffered there but no workers are submitted even though the allocation queue has backlog set to 10. The situation from this morning:
the content of the debug log shows that HyperQueue cannot find workers it sent: