Closed — ahmetb closed this issue 7 years ago
I think it comes down to a docker issue, I reported it here: https://github.com/docker/docker/issues/13885#issuecomment-270007460
I see a ton of netlink-related goroutines in the docker daemon's stack dump:
```
goroutine 1149 [chan send, 1070 minutes]:
github.com/vishvananda/netlink.LinkSubscribe.func2(0xc821159d40, 0xc820fa4c60)
	/build/amd64-usr/var/tmp/portage/app-emulation/docker-1.11.2-r5/work/docker-1.11.2/vendor/src/github.com/vishvananda/netlink/link_linux.go:898 +0x2de
created by github.com/vishvananda/netlink.LinkSubscribe
	/build/amd64-usr/var/tmp/portage/app-emulation/docker-1.11.2-r5/work/docker-1.11.2/vendor/src/github.com/vishvananda/netlink/link_linux.go:901 +0x107

goroutine 442 [chan send, 1129 minutes]:
github.com/vishvananda/netlink.LinkSubscribe.func2(0xc8211eb380, 0xc821095920)
	/build/amd64-usr/var/tmp/portage/app-emulation/docker-1.11.2-r5/work/docker-1.11.2/vendor/src/github.com/vishvananda/netlink/link_linux.go:898 +0x2de
created by github.com/vishvananda/netlink.LinkSubscribe
	/build/amd64-usr/var/tmp/portage/app-emulation/docker-1.11.2-r5/work/docker-1.11.2/vendor/src/github.com/vishvananda/netlink/link_linux.go:901 +0x107
```
I think that's the issue.
It looks like 1235.4.0 stable fixed this.
Issue Report
Bug
CoreOS Version
Environment
DigitalOcean VM.
Expected Behavior
I have about 15 systemd timers (and their service units) that all start the same container with different parameters. It runs a Go program that runs for a few seconds and exits. I start the containers with `docker run --rm` in the .service unit file. I'm expecting to use systemd timers to run cron-like jobs on the docker engine. Actual behavior below.
Reproduction Steps (i.e. my setup)
It's a bit hard for me to give a working repro but I'll go over the use case. (I can provide access to a live repro machine if needed). I have:
Example .service:
Example .timer:
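The reporter's actual unit files were not preserved in this report; the following is a minimal sketch of the shape described above (unit names, image, and schedule are illustrative, not the real files):

```ini
# myjob.service — illustrative only
[Unit]
Description=Run the job container once
Requires=docker.service
After=docker.service

[Service]
Type=oneshot
ExecStart=/usr/bin/docker run --rm my-image --some-flag=value
```

```ini
# myjob.timer — illustrative only
[Unit]
Description=Schedule myjob.service

[Timer]
OnCalendar=*:0/5
Persistent=true

[Install]
WantedBy=timers.target
```

With `Type=oneshot`, each timer firing runs `docker run --rm` to completion, which is the cron-like pattern described in the Expected Behavior section.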
Both the timer and the service are registered/enabled in systemd. It works fine at first, but after a few runs it stops working and freezes the docker API.
Actual Behavior
`docker ps` (and other commands calling the API) stops responding. It just freezes. When I prepend `strace`, I can see it stuck on the docker socket indefinitely:

`docker.service` has logs like this in the journal:

The timers each run `docker run --rm ...`. Some of them work fine; however, some get stuck past their due time (possibly correlated with the docker API freezing — when one symptom appears, the other is often there too). And this happens very often, multiple times a day. See the "ago" values in the "LEFT" column below:

When I run `ps aux | grep docker`, I see the list of commands launched by my frozen timers:

`htop` also shows the frozen /usr/bin/docker (client) processes. Also, my docker program's entrypoint does not appear in the top output, so I'm guessing my app is not responsible for the issue.