geerlingguy / ansible-role-docker

Ansible Role - Docker
https://galaxy.ansible.com/geerlingguy/docker/
MIT License

Setting hosts in docker_daemon_opts variable on a debian 11 host breaks docker #411

Closed zloveless closed 5 months ago

zloveless commented 1 year ago

I added a Docker option to make the daemon listen on a TCP port (for local development). Docker refused to start and complained that I had specified conflicting parameters. The cause is that the systemd unit file for Docker already passes -H fd:// and --containerd=... in the startup params, so a hosts entry in the daemon configuration clashes with the -H flag.

The fix was to add a systemd override config file that replaces the startup command so that it launches /usr/bin/dockerd with no args. As a workaround for the dropped flag, I added "containerd" as another docker_daemon_options entry, with the original value from the command line.

I'm documenting this here for the future in case someone else runs into this issue. I don't know if the role can fix this directly, without some weird ansible conditional logic lol. So this is probably a low priority.


Current behavior:

Docker doesn't come back up when certain options, such as hosts, are added to docker_daemon_options.

Expected behavior:

When someone specifies an alternative listen host for the Docker socket, the Ansible role adapts by adding a systemd override file as described above.
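
A minimal sketch of what such conditional logic could look like, assuming the role's options dict is named docker_daemon_options and that containerd has been moved into it as described above. These tasks are hypothetical and are not part of the role as published; the registered variable name docker_override is made up for illustration.

```yaml
# Hypothetical tasks, not taken from the role itself.
# They only act when the user sets "hosts" in docker_daemon_options.
- name: Create the systemd drop-in directory for docker.service
  ansible.builtin.file:
    path: /etc/systemd/system/docker.service.d
    state: directory
    mode: '0755'
  when: "'hosts' in docker_daemon_options"

- name: Start dockerd without flags so daemon.json is the only source of hosts
  ansible.builtin.copy:
    dest: /etc/systemd/system/docker.service.d/override.conf
    content: |
      [Service]
      ExecStart=
      ExecStart=/usr/bin/dockerd
    mode: '0644'
  register: docker_override  # hypothetical variable name
  when: "'hosts' in docker_daemon_options"

- name: Make systemd re-read unit files after the drop-in changed
  ansible.builtin.systemd:
    daemon_reload: true
  when: docker_override is changed
```

The empty ExecStart= line clears the packaged command before the new one is set; without it systemd would reject a second ExecStart for this service type.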

github-actions[bot] commented 1 year ago

This issue has been marked 'stale' due to lack of recent activity. If there is no further activity, the issue will be closed in another 30 days. Thank you for your contribution!

Please read this blog post to see the reasons why I mark issues as stale.

sepek commented 1 year ago

This is still the case for me. My workaround is to specify all configuration parameters in daemon.json via:

docker_daemon_options:
  containerd: "/run/containerd/containerd.sock"
  hosts: [
    "tcp://127.0.0.1:2375",
    "unix:///var/run/docker.sock"
  ]

and, for now, the following systemd override in pre_tasks:

pre_tasks:
  - name: Docker - override systemd service unit
    copy:
      dest: '/etc/systemd/system/docker.service.d/override.conf'
      content: |
        [Service]
        ExecStart=
        ExecStart=/usr/bin/dockerd

epou commented 11 months ago

Thanks @sepek for your solution. I've added a few lines to yours to make it a little more generic and to avoid some errors:

pre_tasks:  # see: https://github.com/geerlingguy/ansible-role-docker/issues/411
  # The drop-in directory may not exist yet, so create it first.
  - name: Docker - make sure systemd dir exists
    ansible.builtin.file:
      path: /etc/systemd/system/docker.service.d
      state: directory
      mode: '0755'
      owner: root
      group: root
  # The empty ExecStart= clears the packaged command line before setting the new one.
  - name: Docker - override systemd service unit
    ansible.builtin.copy:
      dest: '/etc/systemd/system/docker.service.d/override.conf'
      content: |
        [Service]
        ExecStart=
        ExecStart=/usr/bin/dockerd
      mode: '0644'  # a config file does not need the execute bit
      owner: root
      group: root
    register: systemd_changed
  - name: Just force systemd to reread configs
    ansible.builtin.systemd:
      daemon_reload: true
    when: systemd_changed.changed

hope it helps

sepek commented 11 months ago

@epou Nice, thanks! I guess you would need a restart of Docker instead of a reload. Otherwise I suspect the process would keep running with the default parameters.

epou commented 11 months ago

This is what I got without the systemctl daemon-reload:

>>> sudo systemctl status docker.service
Warning: The unit file, source configuration file or drop-ins of docker.service changed on disk. Run 'systemctl daemon-reload' to reload units.
● docker.service - Docker Application Container Engine
     Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Thu 2023-10-05 13:37:10 UTC; 13s ago
TriggeredBy: ● docker.socket
       Docs: https://docs.docker.com
    Process: 3095311 ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock (code=exited, status=1/FAILURE)
   Main PID: 3095311 (code=exited, status=1/FAILURE)
        CPU: 32ms

Oct 05 13:37:08 hostname systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Oct 05 13:37:08 hostname systemd[1]: docker.service: Failed with result 'exit-code'.
Oct 05 13:37:08 hostname systemd[1]: Failed to start Docker Application Container Engine.
Oct 05 13:37:10 hostname systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Oct 05 13:37:10 hostname systemd[1]: Stopped Docker Application Container Engine.
Oct 05 13:37:10 hostname systemd[1]: docker.service: Start request repeated too quickly.
Oct 05 13:37:10 hostname systemd[1]: docker.service: Failed with result 'exit-code'.
Oct 05 13:37:10 hostname systemd[1]: Failed to start Docker Application Container Engine.

The warning at the top of that output says: Run 'systemctl daemon-reload' to reload units.

So I did that and started the service:

>>> sudo systemctl daemon-reload
>>> sudo systemctl start docker.service
>>> sudo systemctl status docker.service
● docker.service - Docker Application Container Engine
     Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/docker.service.d
             └─override.conf
     Active: active (running) since Thu 2023-10-05 13:38:00 UTC; 1s ago
TriggeredBy: ● docker.socket
       Docs: https://docs.docker.com
   Main PID: 3095351 (dockerd)
      Tasks: 10
     Memory: 34.0M
        CPU: 203ms
     CGroup: /system.slice/docker.service
             └─3095351 /usr/bin/dockerd

The Drop-In entry and the bare /usr/bin/dockerd command line show that the override has been applied.
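
The same check can be sketched as Ansible tasks. This is only an illustration: the registered variable docker_execstart is a made-up name, and the address and port assume the tcp://127.0.0.1:2375 entry from the workaround earlier in this thread.

```yaml
# Read the command line systemd currently has loaded for docker.service.
- name: Read the effective ExecStart of docker.service
  ansible.builtin.command:
    cmd: systemctl show docker.service --property=ExecStart
  register: docker_execstart  # hypothetical variable name
  changed_when: false

# If the packaged -H fd:// flag is still there, the drop-in was not loaded.
- name: Fail if the packaged -H fd:// flag is still in effect
  ansible.builtin.assert:
    that:
      - "'fd://' not in docker_execstart.stdout"
    fail_msg: "override.conf has not been picked up; run systemctl daemon-reload"

# Assumes hosts contains tcp://127.0.0.1:2375 as in the workaround.
- name: Wait for the TCP listener configured in daemon.json
  ansible.builtin.wait_for:
    host: 127.0.0.1
    port: 2375
    timeout: 30
```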

sepek commented 11 months ago

Stupid me, I confused systemctl reload with systemctl daemon-reload and was thrown off by the fact that Docker itself is not restarted. Thanks for correcting me!
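
To spell out the distinction: daemon-reload only makes systemd re-read unit files and drop-ins, while a dockerd process that is already running keeps its old flags until the service is restarted. In the output above Docker had already failed, so a plain start was enough. On a host where Docker is already installed and running, a hedged variant of the last task could do both in one step. It reuses the systemd_changed variable from the snippet above and assumes the service is named docker.

```yaml
# Sketch only: assumes docker.service already exists on the host.
# On a fresh host, where this runs in pre_tasks before Docker is
# installed, restarting the service would fail.
- name: Re-read unit files and restart Docker when the override changed
  ansible.builtin.systemd:
    name: docker
    daemon_reload: true
    state: restarted
  when: systemd_changed.changed
```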

github-actions[bot] commented 5 months ago

This issue has been closed due to inactivity. If you feel this is in error, please reopen the issue or file a new issue with the relevant details.