kaka2991 opened 3 years ago
Something is definitely wrong: after clicking "Rerun all", the job is now in the queue (still with no agents):
Damn, after authorising the server as an agent, the job is assigned:
but the execution got stuck in an infinite loop. The server/agent uses 100% CPU and continuously emits the following entries (docker compose logs):
agent_1 | 2021-03-15 21:50:39,441 INFO p: 7 agent:177 received job: {'id': 5, 'created': '2021-03-15T21:43:57Z', 'deleted': None, 'started': '2021-03-15T21:46:37Z', 'finished': None, 'completed': None, 'duration': '4m 1s', 'name': 'hello world', 'state': 3, 'completio
agent_1 | 2021-03-15 21:50:39,443 INFO p: 7 agent:61 job now: 2021-03-15 21:50:39, deadline: 2021-03-15 21:51:28, time: 49s
agent_1 | 2021-03-15 21:50:39,444 INFO p: 7 jobber:381 started job in /tmp/kk/jobs/5
agent_1 | 2021-03-15 21:50:39,445 INFO p: 7 jobber:425 completed job 5 with status None
server_1 | 2021-03-15 21:50:39,448 INFO p: 10 backend:499 request data: {'address': 'server', 'msg': 'get-job'}
server_1 | 2021-03-15 21:50:39,457 INFO p: 10 backend:64 hello world-<System 1>-1: now: 2021-03-15 21:50:39.457847, slip:0:04:05.310865, to1: 300, to2: 54.68913499999999, to3: 49.220221499999994
server_1 | 2021-03-15 21:50:39,469 INFO p: 10 backend:547 sending response: {'job': {'id': 5, 'created': '2021-03-15T21:43:57Z', 'deleted': None, 'started': '2021-03-15T21:46:37Z', 'finished': None, 'completed': None, 'duration': '4m 1s', 'name': 'hello world', 'state': 3, 'c
agent_1 | 2021-03-15 21:50:39,471 INFO p: 7 agent:177 received job: {'id': 5, 'created': '2021-03-15T21:43:57Z', 'deleted': None, 'started': '2021-03-15T21:46:37Z', 'finished': None, 'completed': None, 'duration': '4m 1s', 'name': 'hello world', 'state': 3, 'completio
agent_1 | 2021-03-15 21:50:39,473 INFO p: 7 agent:61 job now: 2021-03-15 21:50:39, deadline: 2021-03-15 21:51:28, time: 49s
agent_1 | 2021-03-15 21:50:39,474 INFO p: 7 jobber:381 started job in /tmp/kk/jobs/5
agent_1 | 2021-03-15 21:50:39,475 INFO p: 7 jobber:425 completed job 5 with status None
server_1 | 2021-03-15 21:50:39,478 INFO p: 11 backend:499 request data: {'address': 'server', 'msg': 'get-job'}
server_1 | 2021-03-15 21:50:39,486 INFO p: 11 backend:64 hello world-<System 1>-1: now: 2021-03-15 21:50:39.486334, slip:0:04:05.339352, to1: 300, to2: 54.66064800000001, to3: 49.19458320000001
server_1 | 2021-03-15 21:50:39,494 INFO p: 11 backend:547 sending response: {'job': {'id': 5, 'created': '2021-03-15T21:43:57Z', 'deleted': None, 'started': '2021-03-15T21:46:37Z', 'finished': None, 'completed': None, 'duration': '4m 1s', 'name': 'hello world', 'state': 3, 'c
agent_1 | 2021-03-15 21:50:39,496 INFO p: 7 agent:177 received job: {'id': 5, 'created': '2021-03-15T21:43:57Z', 'deleted': None, 'started': '2021-03-15T21:46:37Z', 'finished': None, 'completed': None, 'duration': '4m 1s', 'name': 'hello world', 'state': 3, 'completio
agent_1 | 2021-03-15 21:50:39,496 INFO p: 7 agent:61 job now: 2021-03-15 21:50:39, deadline: 2021-03-15 21:51:28, time: 49s
agent_1 | 2021-03-15 21:50:39,498 INFO p: 7 jobber:381 started job in /tmp/kk/jobs/5
agent_1 | 2021-03-15 21:50:39,499 INFO p: 7 jobber:425 completed job 5 with status None
server_1 | 2021-03-15 21:50:39,502 INFO p: 10 backend:499 request data: {'address': 'server', 'msg': 'get-job'}
server_1 | 2021-03-15 21:50:39,511 INFO p: 10 backend:64 hello world-<System 1>-1: now: 2021-03-15 21:50:39.511499, slip:0:04:05.364517, to1: 300, to2: 54.635482999999994, to3: 49.171934699999994
server_1 | 2021-03-15 21:50:39,520 INFO p: 10 backend:547 sending response: {'job': {'id': 5, 'created': '2021-03-15T21:43:57Z', 'deleted': None, 'started': '2021-03-15T21:46:37Z', 'finished': None, 'completed': None, 'duration': '4m 1s', 'name': 'hello world', 'state': 3, 'c
agent_1 | 2021-03-15 21:50:39,521 INFO p: 7 agent:177 received job: {'id': 5, 'created': '2021-03-15T21:43:57Z', 'deleted': None, 'started': '2021-03-15T21:46:37Z', 'finished': None, 'completed': None, 'duration': '4m 1s', 'name': 'hello world', 'state': 3, 'completio
agent_1 | 2021-03-15 21:50:39,522 INFO p: 7 agent:61 job now: 2021-03-15 21:50:39, deadline: 2021-03-15 21:51:28, time: 49s
agent_1 | 2021-03-15 21:50:39,524 INFO p: 7 jobber:381 started job in /tmp/kk/jobs/5
agent_1 | 2021-03-15 21:50:39,525 INFO p: 7 jobber:425 completed job 5 with status None
server_1 | 2021-03-15 21:50:39,528 INFO p: 11 backend:499 request data: {'address': 'server', 'msg': 'get-job'}
server_1 | 2021-03-15 21:50:39,536 INFO p: 11 backend:64 hello world-<System 1>-1: now: 2021-03-15 21:50:39.536827, slip:0:04:05.389845, to1: 300, to2: 54.61015499999999, to3: 49.1491395
server_1 | 2021-03-15 21:50:39,547 INFO p: 11 backend:547 sending response: {'job': {'id': 5, 'created': '2021-03-15T21:43:57Z', 'deleted': None, 'started': '2021-03-15T21:46:37Z', 'finished': None, 'completed': None, 'duration': '4m 1s', 'name': 'hello world', 'state': 3, 'c
agent_1 | 2021-03-15 21:50:39,548 INFO p: 7 agent:177 received job: {'id': 5, 'created': '2021-03-15T21:43:57Z', 'deleted': None, 'started': '2021-03-15T21:46:37Z', 'finished': None, 'completed': None, 'duration': '4m 1s', 'name': 'hello world', 'state': 3, 'completio
agent_1 | 2021-03-15 21:50:39,549 INFO p: 7 agent:61 job now: 2021-03-15 21:50:39, deadline: 2021-03-15 21:51:28, time: 49s
agent_1 | 2021-03-15 21:50:39,551 INFO p: 7 jobber:381 started job in /tmp/kk/jobs/5
agent_1 | 2021-03-15 21:50:39,552 INFO p: 7 jobber:425 completed job 5 with status None
server_1 | 2021-03-15 21:50:39,555 INFO p: 10 backend:499 request data: {'address': 'server', 'msg': 'get-job'}
server_1 | 2021-03-15 21:50:39,563 INFO p: 10 backend:64 hello world-<System 1>-1: now: 2021-03-15 21:50:39.563366, slip:0:04:05.416384, to1: 300, to2: 54.583616000000006, to3: 49.12525440000001
and finally it raised a timeout :P
I reran the job once more and still hit the same issue.
It seems that you are describing a few separate issues.
The first one is that if no agent is present, the job finishes immediately with an error. This is by design. The idea is that if there is no chance of executing the job, the user should learn about it immediately. In your case, the job cannot be executed because there is no matching agent at all, so a user would first need to add such an agent. If a matching agent exists and is just busy with another job, the system waits until it becomes idle.
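The scheduling rule described above can be sketched roughly as follows. This is a minimal, hypothetical Python illustration, not the project's actual scheduler: the `Agent` class, matching agents by a set of `labels`, and the `schedule` function are all assumptions made for the example. It shows the three outcomes: fail immediately when no agent could ever match, wait when matching agents are all busy, assign otherwise.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Decision(Enum):
    FAIL_NOW = "fail immediately"  # no matching agent exists at all
    WAIT = "wait"                  # matching agents exist, but all are busy
    ASSIGN = "assign"              # an idle matching agent was found


@dataclass
class Agent:
    name: str
    labels: frozenset  # hypothetical: capabilities used to match jobs to agents
    busy: bool = False


def schedule(required: frozenset, agents: List[Agent]) -> Tuple[Decision, Optional[Agent]]:
    """Decide what to do with a job that needs every label in `required`."""
    matching = [a for a in agents if required <= a.labels]
    if not matching:
        # No agent could ever run this job, so report an error right away.
        return Decision.FAIL_NOW, None
    idle = [a for a in matching if not a.busy]
    if not idle:
        # A suitable agent exists but is busy: keep the job until it is idle.
        return Decision.WAIT, None
    return Decision.ASSIGN, idle[0]


if __name__ == "__main__":
    docker = frozenset({"docker"})
    print(schedule(docker, []))                                    # no agents at all
    print(schedule(docker, [Agent("server", docker, busy=True)]))  # only a busy agent
    print(schedule(docker, [Agent("server", docker)]))             # an idle agent
```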
The next issue appears after adding an agent, but I do not understand what is happening. It looks like the job is assigned to the server agent, and I see that it timed out. Could you show what is on the Steps tab?
Is this expected behaviour? Shouldn't the server just wait for any available agent before executing? What if some agents are available, then a network issue occurs and the agents get disconnected: planned jobs would fail instead of waiting for the agents to come back online. Compare with how QuickBuild works: if no agents/resources are available, the build simply stays in the queue.
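The queueing behaviour requested here could look roughly like the sketch below. It is a minimal, hypothetical Python illustration (not how QuickBuild or this project is actually implemented): `PendingQueue`, `submit`, `on_agent_idle` and label-based matching are illustrative assumptions. A job with no available agent stays pending instead of failing, and is handed out when a suitable agent becomes idle or reconnects.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class PendingQueue:
    """Hypothetical FIFO queue where jobs wait until a suitable agent shows up."""

    def __init__(self) -> None:
        self._jobs: Deque[Tuple[int, frozenset]] = deque()

    def submit(self, job_id: int, required: frozenset) -> None:
        # Never fail a job just because no agent is online right now.
        self._jobs.append((job_id, required))

    def on_agent_idle(self, labels: frozenset) -> Optional[int]:
        """Call when an agent (re)connects or finishes its work.

        Returns the oldest pending job this agent can run, or None.
        """
        for entry in self._jobs:
            job_id, required = entry
            if required <= labels:
                # Safe: we return right after removing, so iteration stops here.
                self._jobs.remove(entry)
                return job_id
        return None

    def __len__(self) -> int:
        return len(self._jobs)


if __name__ == "__main__":
    q = PendingQueue()
    q.submit(5, frozenset({"docker"}))
    print(q.on_agent_idle(frozenset()))            # this agent lacks the label
    print(q.on_agent_idle(frozenset({"docker"})))  # job 5 is dispatched
```

The trade-off is that a job requiring an agent that will never exist waits indefinitely, which is exactly the situation the current fail-immediately design is meant to surface to the user.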