metaskills opened this issue 1 year ago
@Chuxel Any thoughts on this?
Hmmm. If you add cat /tmp/dockerd.log to your exec, that would output the startup logs. Since Docker is started in the background, my bet is that things sometimes go fast enough that the exec happens before it is fully up. Otherwise, there would be errors in that file that could point to the underlying issue.
Adding a sleep statement in the exec might also verify whether this is a race condition.
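A bounded poll is a slightly more informative variant of the sleep test. It checks docker info once a second instead of sleeping for a fixed time, and dumps the startup log if the daemon never answers. This is a minimal sketch: the function names wait_until and ensure_docker and the 30-second default are hypothetical; only the /tmp/dockerd.log path comes from the discussion above.

```shell
#!/bin/sh
# Hypothetical helper: re-run a command once per second until it succeeds
# or roughly <timeout-seconds> have elapsed.
# Usage: wait_until <timeout-seconds> <command...>
wait_until() {
  wu_timeout="$1"
  shift
  wu_elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$wu_elapsed" -ge "$wu_timeout" ]; then
      return 1
    fi
    sleep 1
    wu_elapsed=$((wu_elapsed + 1))
  done
  return 0
}

# Hypothetical wrapper for the exec: wait for the daemon, and on timeout
# print the startup log (path taken from this thread; it may not exist).
ensure_docker() {
  ed_timeout="${1:-30}"
  if ! wait_until "$ed_timeout" docker info; then
    echo "dockerd did not become ready within ${ed_timeout}s" >&2
    cat /tmp/dockerd.log 2>/dev/null || echo "/tmp/dockerd.log not found" >&2
    return 1
  fi
}
```

In a runCmd this would be called as ensure_docker 30 before the real commands. If it still times out after a generous wait, a simple startup race becomes an unlikely explanation.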
It seems Docker does not start at all. Also, when this happens, no amount of waiting in the devcontainer helps; Docker simply never works. I tried waiting for several minutes.
cat: /tmp/dockerd.log: No such file or directory
Looking at https://github.com/customink/dnd-demo/actions/runs/3990966349/jobs/6845376920#step:3:689, this issue sounds quite similar to https://github.com/devcontainers/features/issues/372
It looks like this issue mostly occurs on Actions runners and not in Codespaces.
@metaskills The other issue I pointed to also uses the runs-on: ubuntu-latest image in its workflow. Can we change the image and see if that helps?
Sure. I'll change it to a few other things and even see if the version of the CI helps. Will report back shortly.
So I tested ubuntu-20.04, and after about 50 runs I've had no failures. That is good news and gives me something to work with while we sort this out. I'll read that other issue too.
Very strange. I'd also be curious whether running the /usr/local/share/docker-init.sh script again during your exec fixes it. We could layer in some retry if it does. But it's super odd that it's not consistent... almost like it's an issue with certain Actions hosts. @samruddhikhandale - Might be worth reaching out to the Actions folks to see if anything has been happening?
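If re-running the init script did help, the retry could be layered in roughly like this. It is a sketch under assumptions: retry_init, the attempt count and the RETRY_DELAY variable are hypothetical, and docker info serves as the readiness check only because it is the command already used in this thread.

```shell
#!/bin/sh
# Hypothetical helper: re-run an init script, then a readiness check, up to N times.
# Usage: retry_init <attempts> <init-script> <check-command...>
retry_init() {
  ri_attempts="$1"
  ri_init="$2"
  shift 2
  ri_i=1
  while [ "$ri_i" -le "$ri_attempts" ]; do
    # Re-run the init step; its exit status is ignored and the check decides success.
    "$ri_init" >/dev/null 2>&1 || true
    if "$@" >/dev/null 2>&1; then
      echo "ready after attempt $ri_i"
      return 0
    fi
    # Pause between attempts (RETRY_DELAY seconds, default 2), but not after the last one.
    if [ "$ri_i" -lt "$ri_attempts" ]; then
      sleep "${RETRY_DELAY:-2}"
    fi
    ri_i=$((ri_i + 1))
  done
  return 1
}
```

Usage in the exec would be something like retry_init 3 /usr/local/share/docker-init.sh docker info. The check runs after every init attempt, so a daemon that only needs a second start is picked up without extra waiting.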
if running the /usr/local/share/docker-init.sh script again during your exec
Do you mean in my runCmd?
We've had internal reports of this as well with Debian 11.
Yes, sorry. (Under the hood it's devcontainer exec.)
Created https://github.com/actions/runner-images/issues/6980
if running the /usr/local/share/docker-init.sh script again during your exec
@Chuxel I tried that... it did not help. The message is still the same when I do this:
runCmd: |
  /usr/local/share/docker-init.sh
  docker info
The user "runner" used on runner-images is a member of the "docker" group, so you shouldn't expect such problems.
However, to understand the nature of the problem, could you please run the docker-in-docker task without using the "devcontainers/ci@v0.2" action?
We would like to make sure that the root cause is not the action itself.
Originally posted by @Alexey-Ayupov in https://github.com/actions/runner-images/issues/6980#issuecomment-1403631708
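Alongside that experiment, a quick check inside the container at the moment of failure could show which piece is missing: the dockerd process, the daemon socket, the startup log, or the current user's docker group membership raised above. This is a minimal sketch; docker_diag is a hypothetical name, /var/run/docker.sock is assumed to be the daemon socket (the Docker default), and /tmp/dockerd.log is the log path from this thread.

```shell
#!/bin/sh
# Hypothetical diagnostic: report which Docker daemon pieces are present.
# Usage: docker_diag [socket-path] [log-path]  (paths are assumed defaults)
docker_diag() {
  dd_sock="${1:-/var/run/docker.sock}"
  dd_log="${2:-/tmp/dockerd.log}"

  # Is a dockerd process running at all? (Reports "not running" if pgrep is absent.)
  if pgrep -x dockerd >/dev/null 2>&1; then
    echo "dockerd process: running"
  else
    echo "dockerd process: not running"
  fi

  # Does the daemon's Unix socket exist?
  if [ -S "$dd_sock" ]; then
    echo "socket $dd_sock: present"
  else
    echo "socket $dd_sock: missing"
  fi

  # Did the init script write a startup log?
  if [ -f "$dd_log" ]; then
    echo "log $dd_log: present"
  else
    echo "log $dd_log: missing"
  fi

  # Is the current user in the "docker" group?
  case " $(id -nG) " in
    *" docker "*) echo "docker group: member" ;;
    *) echo "docker group: not a member" ;;
  esac
}
```

A result of "not running" plus a missing log would match the cat: /tmp/dockerd.log: No such file or directory output reported above, pointing at the daemon never starting rather than at socket permissions.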
@metaskills Would you be interested to test this hypothesis? Thanks!
Thanks, I'm subscribed to that issue too so I replied there.
When using the docker-in-docker feature, it fails about 20% of the time. I created a demo repo to showcase this. It uses concurrent jobs to highlight the issue, but the failure is not limited to this workflow style. I am seeing this random failure behavior all over a certain project. HELP PLEASE!
In case this issue is within the CLI, I have also created an issue in that project to track it: