steffen-wilke opened this issue 5 years ago (status: Open)
I was not able to reproduce this reliably:
Note that all jobs in the steps-to-reproduce request a node of the same label.
This is somewhat related to #74 and https://github.com/jenkinsci/docker-plugin/issues/427
Additional Note: If such an incident occurs, it is tracked by the Cloud Statistics as "stuck" in the Provisioning phase.
Examples (note the entries below the 2nd):
Looking at the Docker host system (via `docker container ls -a`), there is always a container in the `Created` state for these cases:
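A quick way to confirm this on the host (not from the original report, but `status=created` is a standard `docker container ls` filter):

```sh
# Show only containers that were created but never started;
# these correspond to the agents that never connected.
docker container ls -a --filter status=created
```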
To me this issue sounds very similar to: https://github.com/jenkinsci/docker-plugin/issues/594
@KostyaSha Do you have any thoughts on this? I would very much appreciate your opinion here, since I'm currently a bit puzzled about what the solution could be.
So they were created but didn't spin up and connect?
Exactly.
We are having the same problem. For us, it started when we updated the ssh-slaves-plugin. At first we thought it was because of https://issues.jenkins-ci.org/browse/JENKINS-58340, but it still doesn't work. Perhaps these issues are related?
We're using the yad-plugin to provide on-the-fly Docker build containers on a single Docker Cloud (Windows Server 2019). In general this works just fine, but recently I've observed an issue that occurs mostly when multiple jobs are triggered at the same time. This sometimes happens for us when we trigger multiple (2) downstream jobs after a successful run of a parent job, but also when an SCM change triggers multiple jobs at once. The issue is that some containers for triggered jobs are created but never connected as Jenkins slaves.

What happens:

1. `docker container ls -a` lists a bunch of new containers with status `Created`.
2. Some of these containers change their status to `Running` (and are later on properly terminated and removed after the build has been carried out).
3. The other jobs remain in the build queue with `Waiting for next available executor on '{LABEL}'`.
4. Running `docker container ls -a` on the Docker host system once again reveals that there are still some remaining (newly created) containers with a `Created` status.
5. The stuck jobs stay in the `Waiting for next available executor on '{LABEL}'` state until another job with that LABEL gets triggered. Then they "steal" the agent for that new job, and the new job remains in the "Waiting" state. At some point they change their state to `All nodes of label '{LABEL}' are offline`, but don't trigger a container initialization again. Only after the configured timeout (10 mins in our case) does the plugin seem to request another container for the stuck job.

I think there might be a general problem with multiple jobs requesting a new build container at (roughly) the same time. This only happens sporadically, though; most of the time, triggering multiple jobs at the same time works just fine.
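Not from the thread, but while the root cause is open, one possible manual workaround is to clear the orphaned `Created` containers on the Docker host so they can no longer be handed to new jobs. A minimal sketch, assuming a POSIX shell (the reporter's host runs Windows Server 2019, so a PowerShell equivalent would be needed there):

```sh
# Collect the IDs of containers stuck in the Created state and remove them.
# xargs -r (GNU) skips the rm entirely when no such containers exist.
docker container ls -aq --filter status=created | xargs -r docker container rm
```

Note that this only removes the leftover containers; a stuck job will still sit in the queue until the configured provisioning timeout triggers a fresh container request.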