concourse / concourse-docker

Official concourse/concourse Docker image.

Worker fails to start on newer version of docker #74

Closed: pythys closed this issue 2 years ago

pythys commented 2 years ago

On a fresh, clean checkout of the repository, I do the following:

  1. ./keys/generate
  2. docker-compose up -d --build
  3. I get the error below from the worker, and it crashes

My environment:

{"timestamp":"2021-09-18T12:00:59.488503476Z","level":"info","source":"baggageclaim","message":"baggageclaim.using-driver","data":{"driver":"overlay"}}
{"timestamp":"2021-09-18T12:00:59.489308705Z","level":"info","source":"baggageclaim","message":"baggageclaim.listening","data":{"addr":"127.0.0.1:7788"}}
{"timestamp":"2021-09-18T12:00:59.489768866Z","level":"error","source":"worker","message":"worker.beacon-runner.beacon.failed-to-connect-to-tsa","data":{"error":"dial tcp 172.26.0.4:2222: connect: connection refused","session":"4.1"}}
{"timestamp":"2021-09-18T12:00:59.489798847Z","level":"error","source":"worker","message":"worker.beacon-runner.beacon.dial.failed-to-connect-to-any-tsa","data":{"error":"all worker SSH gateways unreachable","session":"4.1.1"}}
{"timestamp":"2021-09-18T12:00:59.489811980Z","level":"error","source":"worker","message":"worker.beacon-runner.beacon.failed-to-dial","data":{"error":"all worker SSH gateways unreachable","session":"4.1"}}
{"timestamp":"2021-09-18T12:00:59.489832501Z","level":"error","source":"worker","message":"worker.beacon-runner.beacon.exited-with-error","data":{"error":"all worker SSH gateways unreachable","session":"4.1"}}
{"timestamp":"2021-09-18T12:00:59.489854216Z","level":"error","source":"worker","message":"worker.beacon-runner.failed","data":{"error":"all worker SSH gateways unreachable","session":"4"}}
{"timestamp":"2021-09-18T12:01:00.463191900Z","level":"info","source":"guardian","message":"guardian.no-port-pool-state-to-recover-starting-clean","data":{}}
{"timestamp":"2021-09-18T12:01:00.463716950Z","level":"info","source":"guardian","message":"guardian.metrics-notifier.starting","data":{"interval":"1m0s","session":"5"}}
{"timestamp":"2021-09-18T12:01:00.463739568Z","level":"info","source":"guardian","message":"guardian.start.starting","data":{"session":"6"}}
{"timestamp":"2021-09-18T12:01:00.463781384Z","level":"info","source":"guardian","message":"guardian.metrics-notifier.started","data":{"interval":"1m0s","session":"5","time":"2021-09-18T12:01:00.46377979Z"}}
{"timestamp":"2021-09-18T12:01:00.464889552Z","level":"info","source":"guardian","message":"guardian.cgroups-tmpfs-already-mounted","data":{"path":"/sys/fs/cgroup"}}
{"timestamp":"2021-09-18T12:01:00.464948854Z","level":"info","source":"guardian","message":"guardian.mount-cgroup.started","data":{"path":"/sys/fs/cgroup/cpuset","session":"7","subsystem":"cpuset"}}
{"timestamp":"2021-09-18T12:01:00.465058050Z","level":"info","source":"guardian","message":"guardian.start.completed","data":{"session":"6"}}
{"timestamp":"2021-09-18T12:01:00.465072541Z","level":"error","source":"guardian","message":"guardian.starting-guardian-backend","data":{"error":"bulk starter: mounting subsystem 'cpuset' in '/sys/fs/cgroup/cpuset': operation not permitted"}}
bulk starter: mounting subsystem 'cpuset' in '/sys/fs/cgroup/cpuset': operation not permitted
bulk starter: mounting subsystem 'cpuset' in '/sys/fs/cgroup/cpuset': operation not permitted
{"timestamp":"2021-09-18T12:01:00.469955305Z","level":"error","source":"worker","message":"worker.garden.gdn-runner.logging-runner-exited","data":{"error":"exit status 1","session":"1.2"}}
{"timestamp":"2021-09-18T12:01:00.470019538Z","level":"error","source":"worker","message":"worker.garden-runner.logging-runner-exited","data":{"error":"Exit trace for group:\ngdn exited with error: exit status 1\n","session":"8"}}
{"timestamp":"2021-09-18T12:01:00.470077670Z","level":"info","source":"worker","message":"worker.container-sweeper.sweep-cancelled-by-signal","data":{"session":"6","signal":2}}
{"timestamp":"2021-09-18T12:01:00.470115898Z","level":"info","source":"worker","message":"worker.baggageclaim-runner.logging-runner-exited","data":{"session":"9"}}
{"timestamp":"2021-09-18T12:01:00.470078780Z","level":"info","source":"worker","message":"worker.volume-sweeper.sweep-cancelled-by-signal","data":{"session":"7","signal":2}}
{"timestamp":"2021-09-18T12:01:00.470204966Z","level":"info","source":"worker","message":"worker.volume-sweeper.logging-runner-exited","data":{"session":"14"}}
{"timestamp":"2021-09-18T12:01:00.470091895Z","level":"info","source":"worker","message":"worker.debug-runner.logging-runner-exited","data":{"session":"10"}}
{"timestamp":"2021-09-18T12:01:00.470129680Z","level":"info","source":"worker","message":"worker.container-sweeper.logging-runner-exited","data":{"session":"13"}}
{"timestamp":"2021-09-18T12:01:00.470134031Z","level":"info","source":"worker","message":"worker.healthcheck-runner.logging-runner-exited","data":{"session":"11"}}
{"timestamp":"2021-09-18T12:01:04.490229991Z","level":"info","source":"worker","message":"worker.beacon-runner.restarting","data":{"session":"4"}}
{"timestamp":"2021-09-18T12:01:04.490835796Z","level":"error","source":"worker","message":"worker.beacon-runner.beacon.failed-to-connect-to-tsa","data":{"error":"dial tcp 172.26.0.4:2222: connect: connection refused","session":"4.1"}}
{"timestamp":"2021-09-18T12:01:04.490865694Z","level":"error","source":"worker","message":"worker.beacon-runner.beacon.dial.failed-to-connect-to-any-tsa","data":{"error":"all worker SSH gateways unreachable","session":"4.1.2"}}
{"timestamp":"2021-09-18T12:01:04.490880940Z","level":"error","source":"worker","message":"worker.beacon-runner.beacon.failed-to-dial","data":{"error":"all worker SSH gateways unreachable","session":"4.1"}}
{"timestamp":"2021-09-18T12:01:04.490906496Z","level":"info","source":"worker","message":"worker.beacon-runner.beacon.signal.signalled","data":{"session":"4.1.3"}}
{"timestamp":"2021-09-18T12:01:04.490930614Z","level":"info","source":"worker","message":"worker.beacon-runner.logging-runner-exited","data":{"session":"12"}}
error: Exit trace for group:
garden exited with error: Exit trace for group:
gdn exited with error: exit status 1

baggageclaim exited with nil
volume-sweeper exited with nil
debug exited with nil
container-sweeper exited with nil
healthcheck exited with nil
beacon exited with nil

taylorsilva commented 2 years ago

bulk starter: mounting subsystem 'cpuset' in '/sys/fs/cgroup/cpuset': operation not permitted

This error means you probably have cgroups v2 enabled while using the default container runtime, garden/guardian. You can do one of the following:

  1. Disable cgroups v2 and go back to v1
  2. Switch the worker's container runtime to containerd, which supports cgroups v2, by setting CONCOURSE_RUNTIME=containerd on the worker
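
To check which case applies before picking an option, the cgroup version on the Docker host can be read from the filesystem type mounted at /sys/fs/cgroup. This is a minimal sketch, assuming a Linux host with GNU `stat`. The helper name `cgroup_version` is made up for illustration. A hybrid v1/v2 setup also reports `tmpfs`, so "v1" here means "not the unified hierarchy".

```shell
#!/bin/sh
# Map the filesystem type of /sys/fs/cgroup to a cgroup version label.
# cgroup_version is a hypothetical helper, not part of Concourse or Docker.
cgroup_version() {
  case "$1" in
    cgroup2fs) echo "v2" ;;       # unified hierarchy (cgroups v2)
    tmpfs)     echo "v1" ;;       # legacy or hybrid hierarchy
    *)         echo "unknown" ;;  # not Linux, or an unexpected mount
  esac
}

# Run this on the Docker host, not inside the worker container.
fs_type=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo unknown)
echo "host cgroups: $(cgroup_version "$fs_type")"
```

If this reports v2 and the worker uses the default runtime, the `operation not permitted` error on mounting `cpuset` is the expected symptom described above.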

pythys commented 2 years ago

Excellent! The CONCOURSE_RUNTIME=containerd did the trick for me and it's much easier and cleaner than downgrading cgroups. I hope we eventually won't need this workaround.

Thank you so much!
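
For anyone applying the same workaround, one way to set the variable without editing the tracked docker-compose.yml is a Compose override file. This is a sketch under assumptions, not the repository's documented method. The service name `worker` and the file format version `3` are guesses and should be matched to the local docker-compose.yml. The helper name `runtime_override` is made up for illustration.

```shell
#!/bin/sh
# Print a Compose override that sets CONCOURSE_RUNTIME for one service.
# $1: service name (assumed "worker"; check your docker-compose.yml)
# $2: Compose file format version (assumed "3"; older docker-compose
#     releases require it to match the base file)
runtime_override() {
  printf 'version: "%s"\nservices:\n  %s:\n    environment:\n      CONCOURSE_RUNTIME: containerd\n' \
    "${2:-3}" "$1"
}

# Show the override on stdout; redirect it to a file to use it.
runtime_override worker 3
```

Redirecting the output to `docker-compose.override.yml` next to `docker-compose.yml` and rerunning `docker-compose up -d --build` should apply it, since Compose merges that file automatically when it is present.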

taylorsilva commented 2 years ago

containerd will become the default runtime for Concourse in the future. As far as I know, the garden/guardian team isn't doing feature work.