KostyaSha / yet-another-docker-plugin

Jenkins Yet Another Docker Plugin
https://plugins.jenkins.io/yet-another-docker-plugin
MIT License

Jenkins Slave is not created / container is not started although it's created (stuck in "provisioning" forever) #268

Open steffen-wilke opened 5 years ago

steffen-wilke commented 5 years ago

We're using the yad-plugin to provide on-the-fly Docker build containers on a single Docker Cloud (Windows Server 2019). In general, this works just fine, but recently I've observed an issue that occurs mostly when multiple jobs are triggered at the same time. This sometimes happens for us when we trigger multiple (2) downstream jobs after a successful run of a parent job, but also when an SCM change triggers multiple jobs at once. The issue is that some containers for triggered jobs are created but never connected as Jenkins slaves.

What happens:

I think there might be a general problem with multiple jobs requesting a new build container at (roughly) the same time. This only happens sporadically though. Most of the time triggering multiple jobs at the same time works just fine.

steffen-wilke commented 5 years ago

I was now able to reproduce this reliably:

Note that all jobs in the steps-to-reproduce request a node of the same label.

  1. Trigger a job that requires multiple nodes or trigger multiple jobs at the same time -> we have an SCM trigger that triggers 2 jobs at a time (Job A and Job B); multiple downstream jobs would also cause the same effect
  2. While the provisioning is in process: Start another job that requires a new node (Job C)
  3. Job C will try to "steal" a node that was originally created for Job A or B
  4. Depending on which node was "stolen", either Job A or Job B gets stuck
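
The steps above can be sketched as a toy model. This is only an illustration of the observed behavior, not how Jenkins or the plugin actually schedules nodes: the function `assign_nodes`, the node names, and the "first job to ask gets the next online node" rule are all assumptions made for the example.

```python
def assign_nodes(planned_for, claim_order):
    """Toy model of the suspected race (hypothetical, not plugin code).

    planned_for: jobs for which a node was provisioned, in provisioning order.
    claim_order: order in which waiting jobs grab nodes as they come online.
    Returns (assignment, stuck), where stuck lists the jobs from planned_for
    that ended up without a node.
    """
    # One node per job that was waiting when provisioning started.
    nodes = [f"node-for-{job}" for job in planned_for]

    # Assumption: nodes are matched by label only, so the first job
    # to ask gets the next node, regardless of who it was created for.
    assignment = dict(zip(claim_order, nodes))

    stuck = [job for job in planned_for if job not in assignment]
    return assignment, stuck


if __name__ == "__main__":
    # Job C arrives during provisioning and claims a node first.
    assignment, stuck = assign_nodes(["A", "B"], ["C", "A", "B"])
    print(assignment)
    print(stuck)
```

In this model, with nodes planned for A and B and Job C claiming first, C takes the node created for A, A takes the node created for B, and B is left waiting.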

steffen-wilke commented 5 years ago

This is somewhat related to #74 and https://github.com/jenkinsci/docker-plugin/issues/427

steffen-wilke commented 5 years ago

Additional Note: If such an incident occurs, it is tracked by the Cloud Statistics as "stuck" in the Provisioning phase.

Examples: (Note the entries below the 2nd)

image

Looking at the docker host system (via docker container ls -a), there is always a container in the Created state for these cases: image
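
For anyone who wants to check for this state on their own host, the following is a minimal sketch that lists containers left in the Created state. It assumes the `docker` CLI is on the PATH and uses its `--filter status=created` and `--format` options; the helper names `parse_created` and `list_created_containers` are made up for this example.

```python
import subprocess

# Tab-separated Go template understood by `docker container ls --format`.
FORMAT = "{{.ID}}\t{{.Names}}\t{{.Status}}"


def parse_created(listing):
    """Return (id, name) pairs for rows whose status is exactly 'Created'."""
    rows = []
    for line in listing.splitlines():
        if not line.strip():
            continue
        container_id, name, status = line.split("\t", 2)
        if status.strip() == "Created":
            rows.append((container_id, name))
    return rows


def list_created_containers():
    """Ask the local Docker daemon for containers that were never started."""
    result = subprocess.run(
        ["docker", "container", "ls", "-a",
         "--filter", "status=created",
         "--format", FORMAT],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_created(result.stdout)


if __name__ == "__main__":
    for container_id, name in list_created_containers():
        print(container_id, name)
```

Comparing the names printed here with the stuck entries in the Cloud Statistics should show whether each stuck provisioning attempt has a matching never-started container.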

steffen-wilke commented 5 years ago

To me this issue sounds very similar to: https://github.com/jenkinsci/docker-plugin/issues/594

@KostyaSha Do you have any thoughts on this? I would very much appreciate your opinion here, since I'm currently a bit puzzled about what the solution could be.

KostyaSha commented 5 years ago

so they were created but didn't spin and connect?

steffen-wilke commented 5 years ago

> so they were created but didn't spin and connect?

Exactly.

rdevries commented 5 years ago

We are having the same problem. For us, it started when we updated the ssh-slaves-plugin. At first we thought it was caused by https://issues.jenkins-ci.org/browse/JENKINS-58340, but it still doesn't work. Perhaps these issues are related?