Kraken-CI / kraken

Kraken CI is a continuous integration and testing system.
https://kraken.ci/
Apache License 2.0

Run of default Demo project fails when there are no agents #101

Status: Open. kaka2991 opened this issue 3 years ago.

kaka2991 commented 3 years ago
[screenshot]

Is this expected behaviour? Shouldn't the server just hold the execution until an agent becomes available? What if some agents are available, a network issue occurs, the agents get disconnected, and the planned jobs then fail instead of waiting for the agents to come back online? Please recall how QuickBuild works: if no agents/resources are available, the build still stays in the queue.

kaka2991 commented 3 years ago

Something is definitely wrong, because after clicking "Rerun all" the job is now in the queue (still no agents):

[screenshot]

kaka2991 commented 3 years ago

Damn, after authorising the server as an agent, the job is assigned:

[screenshot]

but the execution got stuck in an infinite loop. The server/agent is using 100% CPU and continuously emits the following entries (docker compose logs):

agent_1             | 2021-03-15 21:50:39,441 INFO p:    7    agent:177   received job: {'id': 5, 'created': '2021-03-15T21:43:57Z', 'deleted': None, 'started': '2021-03-15T21:46:37Z', 'finished': None, 'completed': None, 'duration': '4m 1s', 'name': 'hello world', 'state': 3, 'completio
agent_1             | 2021-03-15 21:50:39,443 INFO p:    7    agent:61    job now: 2021-03-15 21:50:39, deadline: 2021-03-15 21:51:28, time: 49s
agent_1             | 2021-03-15 21:50:39,444 INFO p:    7   jobber:381   started job in /tmp/kk/jobs/5
agent_1             | 2021-03-15 21:50:39,445 INFO p:    7   jobber:425   completed job 5 with status None
server_1            | 2021-03-15 21:50:39,448 INFO p:   10  backend:499   request data: {'address': 'server', 'msg': 'get-job'}
server_1            | 2021-03-15 21:50:39,457 INFO p:   10  backend:64    hello world-<System 1>-1: now: 2021-03-15 21:50:39.457847, slip:0:04:05.310865, to1: 300, to2: 54.68913499999999, to3: 49.220221499999994
server_1            | 2021-03-15 21:50:39,469 INFO p:   10  backend:547   sending response: {'job': {'id': 5, 'created': '2021-03-15T21:43:57Z', 'deleted': None, 'started': '2021-03-15T21:46:37Z', 'finished': None, 'completed': None, 'duration': '4m 1s', 'name': 'hello world', 'state': 3, 'c
agent_1             | 2021-03-15 21:50:39,471 INFO p:    7    agent:177   received job: {'id': 5, 'created': '2021-03-15T21:43:57Z', 'deleted': None, 'started': '2021-03-15T21:46:37Z', 'finished': None, 'completed': None, 'duration': '4m 1s', 'name': 'hello world', 'state': 3, 'completio
agent_1             | 2021-03-15 21:50:39,473 INFO p:    7    agent:61    job now: 2021-03-15 21:50:39, deadline: 2021-03-15 21:51:28, time: 49s
agent_1             | 2021-03-15 21:50:39,474 INFO p:    7   jobber:381   started job in /tmp/kk/jobs/5
agent_1             | 2021-03-15 21:50:39,475 INFO p:    7   jobber:425   completed job 5 with status None
server_1            | 2021-03-15 21:50:39,478 INFO p:   11  backend:499   request data: {'address': 'server', 'msg': 'get-job'}
server_1            | 2021-03-15 21:50:39,486 INFO p:   11  backend:64    hello world-<System 1>-1: now: 2021-03-15 21:50:39.486334, slip:0:04:05.339352, to1: 300, to2: 54.66064800000001, to3: 49.19458320000001
server_1            | 2021-03-15 21:50:39,494 INFO p:   11  backend:547   sending response: {'job': {'id': 5, 'created': '2021-03-15T21:43:57Z', 'deleted': None, 'started': '2021-03-15T21:46:37Z', 'finished': None, 'completed': None, 'duration': '4m 1s', 'name': 'hello world', 'state': 3, 'c
agent_1             | 2021-03-15 21:50:39,496 INFO p:    7    agent:177   received job: {'id': 5, 'created': '2021-03-15T21:43:57Z', 'deleted': None, 'started': '2021-03-15T21:46:37Z', 'finished': None, 'completed': None, 'duration': '4m 1s', 'name': 'hello world', 'state': 3, 'completio
agent_1             | 2021-03-15 21:50:39,496 INFO p:    7    agent:61    job now: 2021-03-15 21:50:39, deadline: 2021-03-15 21:51:28, time: 49s
agent_1             | 2021-03-15 21:50:39,498 INFO p:    7   jobber:381   started job in /tmp/kk/jobs/5
agent_1             | 2021-03-15 21:50:39,499 INFO p:    7   jobber:425   completed job 5 with status None
server_1            | 2021-03-15 21:50:39,502 INFO p:   10  backend:499   request data: {'address': 'server', 'msg': 'get-job'}
server_1            | 2021-03-15 21:50:39,511 INFO p:   10  backend:64    hello world-<System 1>-1: now: 2021-03-15 21:50:39.511499, slip:0:04:05.364517, to1: 300, to2: 54.635482999999994, to3: 49.171934699999994
server_1            | 2021-03-15 21:50:39,520 INFO p:   10  backend:547   sending response: {'job': {'id': 5, 'created': '2021-03-15T21:43:57Z', 'deleted': None, 'started': '2021-03-15T21:46:37Z', 'finished': None, 'completed': None, 'duration': '4m 1s', 'name': 'hello world', 'state': 3, 'c
agent_1             | 2021-03-15 21:50:39,521 INFO p:    7    agent:177   received job: {'id': 5, 'created': '2021-03-15T21:43:57Z', 'deleted': None, 'started': '2021-03-15T21:46:37Z', 'finished': None, 'completed': None, 'duration': '4m 1s', 'name': 'hello world', 'state': 3, 'completio
agent_1             | 2021-03-15 21:50:39,522 INFO p:    7    agent:61    job now: 2021-03-15 21:50:39, deadline: 2021-03-15 21:51:28, time: 49s
agent_1             | 2021-03-15 21:50:39,524 INFO p:    7   jobber:381   started job in /tmp/kk/jobs/5
agent_1             | 2021-03-15 21:50:39,525 INFO p:    7   jobber:425   completed job 5 with status None
server_1            | 2021-03-15 21:50:39,528 INFO p:   11  backend:499   request data: {'address': 'server', 'msg': 'get-job'}
server_1            | 2021-03-15 21:50:39,536 INFO p:   11  backend:64    hello world-<System 1>-1: now: 2021-03-15 21:50:39.536827, slip:0:04:05.389845, to1: 300, to2: 54.61015499999999, to3: 49.1491395
server_1            | 2021-03-15 21:50:39,547 INFO p:   11  backend:547   sending response: {'job': {'id': 5, 'created': '2021-03-15T21:43:57Z', 'deleted': None, 'started': '2021-03-15T21:46:37Z', 'finished': None, 'completed': None, 'duration': '4m 1s', 'name': 'hello world', 'state': 3, 'c
agent_1             | 2021-03-15 21:50:39,548 INFO p:    7    agent:177   received job: {'id': 5, 'created': '2021-03-15T21:43:57Z', 'deleted': None, 'started': '2021-03-15T21:46:37Z', 'finished': None, 'completed': None, 'duration': '4m 1s', 'name': 'hello world', 'state': 3, 'completio
agent_1             | 2021-03-15 21:50:39,549 INFO p:    7    agent:61    job now: 2021-03-15 21:50:39, deadline: 2021-03-15 21:51:28, time: 49s
agent_1             | 2021-03-15 21:50:39,551 INFO p:    7   jobber:381   started job in /tmp/kk/jobs/5
agent_1             | 2021-03-15 21:50:39,552 INFO p:    7   jobber:425   completed job 5 with status None
server_1            | 2021-03-15 21:50:39,555 INFO p:   10  backend:499   request data: {'address': 'server', 'msg': 'get-job'}
server_1            | 2021-03-15 21:50:39,563 INFO p:   10  backend:64    hello world-<System 1>-1: now: 2021-03-15 21:50:39.563366, slip:0:04:05.416384, to1: 300, to2: 54.583616000000006, to3: 49.12525440000001

kaka2991 commented 3 years ago

and finally it raised a timeout :P

[screenshot]

kaka2991 commented 3 years ago

Another job rerun, and still the same issue.

godfryd commented 3 years ago

It seems that you are describing a few separate issues.

The first one is that if there is no agent present, the job is finished immediately with an error. This is by design. The idea is that if there is no chance to execute the job, the user should know about it immediately. In your case, it is not possible to execute the job because there is no matching agent at all; first, a user would need to add such an agent. If a matching agent is in the system and it is just busy with another job, then the system waits for it until it is idle.

The next issue occurs after adding an agent, but I do not understand what is happening. It looks like the job is assigned to the server agent, and I see that it timed out. Could you show what is on the Steps tab?
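The scheduling policy described in the maintainer's reply (fail immediately when no agent could ever run the job, keep the job queued when a matching agent exists but is busy) can be sketched roughly as follows. This is a minimal illustration, not Kraken CI's actual backend code: the `Agent` record, the label-based matching, and the `Decision`/`schedule` names are all hypothetical assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ASSIGN = "assign"  # an idle matching agent exists: hand the job over now
    WAIT = "wait"      # matching agents exist but all are busy: keep the job queued
    FAIL = "fail"      # no agent could ever run the job: report an error immediately


@dataclass
class Agent:
    # Hypothetical agent record (assumption): a name, the labels it offers,
    # and whether it is currently busy with another job.
    name: str
    labels: set = field(default_factory=set)
    busy: bool = False


def schedule(required_labels, agents):
    """Return (decision, agent) for a job that needs all of `required_labels`."""
    matching = [a for a in agents if required_labels <= a.labels]
    if not matching:
        # Nothing in the system can execute this job, so fail fast
        # instead of leaving it in the queue indefinitely.
        return Decision.FAIL, None
    idle = [a for a in matching if not a.busy]
    if idle:
        return Decision.ASSIGN, idle[0]
    # Matching agents exist but are all busy: wait until one becomes idle.
    return Decision.WAIT, None


# Usage: one busy Linux agent and no Windows agent at all.
agents = [Agent("server", {"linux"}, busy=True)]
decision, _ = schedule({"linux"}, agents)    # Decision.WAIT
decision, _ = schedule({"windows"}, agents)  # Decision.FAIL
```

Under this policy, a job submitted when no matching agent exists lands in the `FAIL` branch, which is the behaviour reported at the top of this issue; the QuickBuild-style queueing the reporter asks for would amount to returning `WAIT` in that branch as well.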