FellowPlanter closed this 4 years ago
@ammen99 It does now, but the return type of set_distribution had to be changed. The Dispatcher also sends a dummy command to all work machines in order to detect offline machines.
The second commit fixes #174: in the first iteration through the new job distribution it skips all jobs that are set to be run, and it dispatches/resumes them in a second iteration. That should be faster than sorting the distribution.
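To make the two-iteration idea concrete, here is a rough sketch. It is not the code from this PR: the entry layout `(machine, job, command)`, the command names `RUN`/`RESUME`/`CANCEL`/`PAUSE`, and the `send` callback are all assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical entry layout: (machine, job_id, command).
Entry = Tuple[str, int, str]

# Hypothetical names for the commands that start or resume work.
START_COMMANDS = {"RUN", "RESUME"}


def dispatch_two_pass(distribution: Iterable[Entry],
                      send: Callable[[str, int, str], None]) -> None:
    """Send stopping commands first, then starting commands, without sorting."""
    deferred: List[Entry] = []

    # First iteration: skip everything that would start a job.
    for machine, job, command in distribution:
        if command in START_COMMANDS:
            deferred.append((machine, job, command))
            continue
        send(machine, job, command)

    # Second iteration: dispatch/resume the jobs skipped above.
    for machine, job, command in deferred:
        send(machine, job, command)
```

Both loops are linear in the size of the distribution, and entries within each pass keep their original relative order.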
By the way, sorting won't be a problem since I already sort the distribution once in the scheduler, which means that complexity will remain the same. However, I think sorting makes the logic simpler.
I think we should separate the changes in this PR into two parts, and prioritize the one that sorts job entries before dispatching. Otherwise, we may get thrashing on one worker if one of the commands is CANCEL, or a command may fail because it tries to acquire a license which hasn't been freed yet.
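For reference, the sorting variant could look roughly like this. The priority table, the command names other than CANCEL, and the `(machine, job, command)` entry layout are assumptions, not what the scheduler actually uses.

```python
from typing import List, Tuple

# Hypothetical entry layout: (machine, job_id, command).
Entry = Tuple[str, int, str]

# Hypothetical ordering: commands that free resources (e.g. licenses) go first.
COMMAND_PRIORITY = {"CANCEL": 0, "PAUSE": 1, "RESUME": 2, "RUN": 3}


def sort_for_dispatch(entries: List[Entry]) -> List[Entry]:
    """Return entries ordered so that CANCEL is dispatched before RUN.

    sorted() is stable, so entries with the same command keep the
    order the scheduler gave them. Unknown commands are sent last.
    """
    last = len(COMMAND_PRIORITY)
    return sorted(entries, key=lambda e: COMMAND_PRIORITY.get(e[2], last))
```

With this ordering, a worker that receives both a CANCEL and a RUN sees the CANCEL first, so the license held by the cancelled job can be released before the new job asks for it.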
@M1keReck I have included a few fixes in the timeout-2 branch; I think you should include them. For example:
I think you need to rebase.
Marks work machines as offline whenever the Dispatcher cannot send them its commands in due time (120 secs), and informs the user if a connection to the server couldn't be established.
fixes #106
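A minimal sketch of the offline detection described above, assuming a blocking `send_dummy(machine)` callable that raises on connection failure. The function name, the callable, and the use of a thread pool are illustrative assumptions; only the 120-second limit comes from this PR.

```python
import concurrent.futures as cf
from typing import Callable, Iterable, Set

SEND_TIMEOUT_SECS = 120  # limit mentioned in the PR description


def find_offline_machines(machines: Iterable[str],
                          send_dummy: Callable[[str], None],
                          timeout: float = SEND_TIMEOUT_SECS) -> Set[str]:
    """Return the machines that did not accept a dummy command in time."""
    machines = list(machines)
    # One worker thread per machine so a slow machine cannot delay the others.
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(machines)))
    futures = {pool.submit(send_dummy, m): m for m in machines}

    # The timeout is a single deadline shared by all machines.
    done, not_done = cf.wait(futures, timeout=timeout)

    # Still pending after the deadline: treat as offline.
    offline = {futures[f] for f in not_done}
    # Finished, but with an error (e.g. connection refused): also offline.
    offline |= {futures[f] for f in done if f.exception() is not None}

    # Do not block on sends that are still hanging.
    pool.shutdown(wait=False)
    return offline
```

Sending in parallel keeps the worst case at roughly one timeout period in total rather than one per unreachable machine.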