devcontainers / ci

A GitHub Action and Azure DevOps Task designed to simplify using Dev Containers (https://containers.dev) in CI/CD systems.
MIT License

Random "Is the docker daemon running?" with Docker-in-Docker Feature #192

Open metaskills opened 1 year ago

metaskills commented 1 year ago

When using the Docker-in-Docker Feature, a run has about a 20% chance of failing. I created a demo repo to showcase this. It uses concurrent jobs to highlight the issue, but the problem is not limited to this workflow style. I am seeing this random failure across certain projects. HELP PLEASE!


If this issue is within the CLI, then I have created an issue there in that project to track it as well:

metaskills commented 1 year ago

@Chuxel Any thoughts on this?

Chuxel commented 1 year ago

Hmmm. If you add cat /tmp/dockerd.log to your exec, that would output the startup logs. Since Docker is started in the background, my bet is that things sometimes go fast enough that the exec happens before it is fully up. Otherwise there would be errors in that file that could point to the underlying issue.

Adding a sleep statement in the exec might also verify whether this is a race condition.
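
A minimal sketch of that check, assuming a POSIX-style shell inside the dev container. The function name wait_for_docker and the timeout values are illustrative and not part of the Feature; only docker info and /tmp/dockerd.log come from this thread.

```shell
# Poll `docker info` once per second until the daemon answers or the
# timeout (in seconds, default 30) expires. Hypothetical helper.
wait_for_docker() {
  local timeout="${1:-30}"
  local elapsed=0
  until docker info >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "docker daemon not reachable after ${timeout}s" >&2
      # Dump the startup log if it exists; a missing file suggests
      # dockerd was never launched rather than merely slow.
      if [ -f /tmp/dockerd.log ]; then
        cat /tmp/dockerd.log >&2
      else
        echo "/tmp/dockerd.log is missing" >&2
      fi
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "docker daemon ready after ${elapsed}s"
}
```

Calling wait_for_docker 60 at the top of the runCmd would separate the two cases: a pure race shows up as a short wait followed by success, while a daemon that never started ends in the timeout branch.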

metaskills commented 1 year ago

Seems Docker does not start at all. Also, when this happens there is no amount of waiting I can do in the devcontainer. Docker will just not work. I tried waiting for several minutes.

cat: /tmp/dockerd.log: No such file or directory

samruddhikhandale commented 1 year ago

Looking at https://github.com/customink/dnd-demo/actions/runs/3990966349/jobs/6845376920#step:3:689, this issue sounds quite similar to https://github.com/devcontainers/features/issues/372

It looks like this issue mostly occurs on Actions runners and not in a Codespace.

samruddhikhandale commented 1 year ago

@metaskills The other issue I pointed to also uses the runs-on: ubuntu-latest image in its workflow. Can we change the image and see if that helps?

metaskills commented 1 year ago

Sure. I'll change it to a few other things and even see if the version of the CI helps. Will report back shortly.

metaskills commented 1 year ago

So I tested ubuntu-20.04 and after about 50 runs I've had no failures. So that is good news and gives me something to work with while we sort this out. I'll read that other issue too.

Chuxel commented 1 year ago

Very strange. I'd also be curious if running the /usr/local/share/docker-init.sh script again during your exec fixes it. We could layer in some retry if it does. But it's super odd that it's not consistent... almost like it's an issue with certain Actions hosts. @samruddhikhandale - Might be worth reaching out to the actions folks to see if anything has been happening?
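
A retry of that kind could be sketched as follows. This is only an illustration: the helper name retry and its arguments are hypothetical, and whether rerunning the init script actually recovers the daemon is an open question at this point.

```shell
# retry MAX DELAY CMD...: run CMD until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts. Hypothetical helper.
retry() {
  local max="$1"
  local delay="$2"
  shift 2
  local attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed, retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}
```

In a runCmd this might be used as retry 3 5 sh -c '/usr/local/share/docker-init.sh && docker info', which reruns the init script and checks the daemon up to three times.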

metaskills commented 1 year ago

if running the /usr/local/share/docker-init.sh script again during your exec

Do you mean in my runCmd?

joshaber commented 1 year ago

We've had internal reports of this as well with Debian 11.

Chuxel commented 1 year ago

if running the /usr/local/share/docker-init.sh script again during your exec

Do you mean in my runCmd?

Yes, sorry. (Under the hood it's devcontainer exec.)

samruddhikhandale commented 1 year ago

Very strange. I'd also be curious if running the /usr/local/share/docker-init.sh script again during your exec fixes it. We could layer in some retry if it does. But it's super odd that it's not consistent... almost like it's an issue with certain Actions hosts. @samruddhikhandale - Might be worth reaching out to the actions folks to see if anything has been happening?

Created https://github.com/actions/runner-images/issues/6980

metaskills commented 1 year ago

if running the /usr/local/share/docker-init.sh script again during your exec

@Chuxel Tried that... did not help. The message is still the same when I do this.

          runCmd: |
            /usr/local/share/docker-init.sh
            docker info

samruddhikhandale commented 1 year ago

The user "runner" used on runner-images is a member of a "docker" group, so you shouldn't expect such problems.
However to understand the nature of the problem, could you please run the docker-in-docker task without using "devcontainers/ci@v0.2" action?
We would like to make sure that the root cause is not the action itself.

Originally posted by @Alexey-Ayupov in https://github.com/actions/runner-images/issues/6980#issuecomment-1403631708

@metaskills Would you be interested in testing this hypothesis? Thanks!

metaskills commented 1 year ago

Thanks, I'm subscribed to that issue too so I replied there.