dokku / ansible-dokku

Ansible modules for installing and configuring Dokku
MIT License

Unable to start service dokku-daemon #14

Closed: notapatch closed this issue 4 years ago

notapatch commented 5 years ago

I have a simple configuration that fails on a fresh system; if you run it again after the failure, you don't see the same problem. It is intermittent and works roughly 1 in 3 to 1 in 4 times.

Steps:
1) Create a new Ubuntu 16.04 system.
2) Run the script below, titled "Script to replicate error".

RUNNING HANDLER [dokku_bot.ansible_dokku : start dokku-daemon] *************
fatal: [me.example.com]: FAILED! => {"changed": false, "msg": "Unable to start service dokku-daemon: Job for dokku-daemon.service failed because the control process exited with error code.\nSee \"systemctl status dokku-daemon.service\" and \"journalctl -xe\" for details.\n"}

RUNNING HANDLER [dokku_bot.ansible_dokku : start nginx] **************************

RUNNING HANDLER [dokku_bot.ansible_dokku : reload nginx] ************************

NO MORE HOSTS LEFT ***************************************************************

Error logs

systemctl status dokku-daemon.service

 dokku-daemon.service - dokku-daemon
   Loaded: loaded (/etc/systemd/system/dokku-daemon.service; disabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Mon 2019-05-20 18:24:34 CEST; 1min 13s ago
  Process: 27134 ExecStartPost=/bin/chmod 777 ${DOKKU_SOCK_PATH} (code=exited, status=1/FAILURE)
  Process: 26067 ExecStartPost=/bin/sleep 2 (code=exited, status=0/SUCCESS)
  Process: 26066 ExecStart=/usr/bin/dokku-daemon (code=killed, signal=TERM)
  Process: 26061 ExecStartPre=/bin/bash -c DOKKU_SOCK_DIR=$(dirname ${DOKKU_SOCK_PATH}); if [ "$${DOKKU_SOCK_DIR}" != "/tmp"
  Process: 26044 ExecStartPre=/bin/bash -c DOKKU_LOCK_DIR=$(dirname ${DOKKU_LOCK_PATH}); if [ "$${DOKKU_LOCK_DIR}" != "/tmp"
 Main PID: 26066 (code=killed, signal=TERM)

May 20 18:24:33 me sudo[26470]:     root : TTY=unknown ; PWD=/ ; USER=dokku ; COMMAND=/usr/bin/dokku help --all
May 20 18:24:33 me sudo[26470]: pam_unix(sudo:session): session opened for user dokku by (uid=0)
May 20 18:24:34 me chmod[27134]: /bin/chmod: cannot access '/var/run/dokku-daemon/dokku-daemon.sock': No such file or direc
May 20 18:24:34 me systemd[1]: dokku-daemon.service: Control process exited, code=exited status=1
May 20 18:24:34 me sudo[26470]: pam_unix(sudo:session): session closed for user dokku
May 20 18:24:34 me dokku-daemon[26066]: Terminated
May 20 18:24:34 me dokku-daemon[26066]: /var/lib/dokku/plugins/enabled/redis/help-functions: line 106: echo: write error: B
May 20 18:24:34 me dokku-daemon[26066]: Terminated
May 20 18:24:34 me systemd[1]: dokku-daemon.service: Failed with result 'exit-code'.
May 20 18:24:34 me systemd[1]: Failed to start dokku-daemon.

journalctl -xe

May 20 18:26:16 me sshd[27257]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=58.
May 20 18:26:18 me sshd[27257]: Failed password for root from 58.242.83.34 port 33883 ssh2
May 20 18:26:21 me sshd[27257]: Failed password for root from 58.242.83.34 port 33883 ssh2
May 20 18:26:24 me sshd[27257]: Failed password for root from 58.242.83.34 port 33883 ssh2
May 20 18:26:24 me sshd[27257]: Received disconnect from 58.242.83.34 port 33883:11:  [preauth]
May 20 18:26:24 me sshd[27257]: Disconnected from authenticating user root 58.242.83.34 port 33883 [preauth]
May 20 18:26:24 me sshd[27257]: PAM 2 more authentication failures; logname= uid=0 euid=0 tty=ssh ruser= rhost=58.242.83.34

^^ This block is repeated.

Success logs

systemctl status dokku-daemon.service
● dokku-daemon.service - dokku-daemon
   Loaded: loaded (/etc/systemd/system/dokku-daemon.service; disabled; vendor preset: enabled)
   Active: active (running) since Mon 2019-05-20 19:03:06 CEST; 1min 1s ago
  Process: 28244 ExecStartPost=/bin/chmod 777 ${DOKKU_SOCK_PATH} (code=exited, status=0/SUCCESS)
  Process: 26284 ExecStartPost=/bin/sleep 2 (code=exited, status=0/SUCCESS)
  Process: 26278 ExecStartPre=/bin/bash -c DOKKU_SOCK_DIR=$(dirname ${DOKKU_SOCK_PATH}); if [ "$${DOKKU_SOCK_DIR}" != "/tmp"
  Process: 26263 ExecStartPre=/bin/bash -c DOKKU_LOCK_DIR=$(dirname ${DOKKU_LOCK_PATH}); if [ "$${DOKKU_LOCK_DIR}" != "/tmp"
 Main PID: 26283 (dokku-daemon)
    Tasks: 2 (limit: 4583)
   CGroup: /system.slice/dokku-daemon.service
           ├─26283 /bin/bash /usr/bin/dokku-daemon
           └─28243 socat unix-listen:/var/run/dokku-daemon/dokku-daemon.sock,fork exec:/usr/bin/dokku-daemon -c,fdin=3,fdout=

May 20 19:03:04 danna systemd[1]: Starting dokku-daemon...
May 20 19:03:04 danna sudo[26301]:     root : TTY=unknown ; PWD=/ ; USER=dokku ; COMMAND=/usr/bin/dokku version
May 20 19:03:04 danna sudo[26301]: pam_unix(sudo:session): session opened for user dokku by (uid=0)
May 20 19:03:04 danna sudo[26301]: pam_unix(sudo:session): session closed for user dokku
May 20 19:03:04 danna sudo[26687]:     root : TTY=unknown ; PWD=/ ; USER=dokku ; COMMAND=/usr/bin/dokku help --all
May 20 19:03:04 danna sudo[26687]: pam_unix(sudo:session): session opened for user dokku by (uid=0)
May 20 19:03:05 danna sudo[26687]: pam_unix(sudo:session): session closed for user dokku
May 20 19:03:06 danna systemd[1]: Started dokku-daemon.

journalctl -xe
-- Unit nginx.service has finished reloading its configuration
--
-- The result is RESULT.
May 20 19:03:48 danna sshd[28298]: Accepted publickey for root from 81.174.135.72 port 52157 ssh2: RSA SHA256:sNxE/fP83uqiOMZ
May 20 19:03:48 danna sshd[28298]: pam_unix(sshd:session): session opened for user root by (uid=0)
May 20 19:03:48 danna systemd-logind[1067]: New session 5 of user root.
-- Subject: A new session 5 has been created for user root
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- Documentation: https://www.freedesktop.org/wiki/Software/systemd/multiseat
--

If you remove the plugin (redis) from the configuration, the run succeeds:

RUNNING HANDLER [dokku_bot.ansible_dokku : start dokku-daemon] **************************************************************
changed: [me.example.com]

RUNNING HANDLER [dokku_bot.ansible_dokku : start nginx] *********************************************************************
ok: [me.example.com]

RUNNING HANDLER [dokku_bot.ansible_dokku : reload nginx] ********************************************************************
changed: [me.example.com]

It also fails on these variations:

Script to replicate error

---
- hosts: all
  roles:
    - dokku_bot.ansible_dokku
  vars:
    dokku_version: '0.16.4'
    plugn_version: '0.3.2'
    ansible_user: root
    ansible_python_interpreter: /usr/bin/python3
    dokku_skip_key_file: true
    dokku_plugins:
      - name: redis
        url: https://github.com/dokku/dokku-redis.git
  pre_tasks:
    - name: Update apt cache if needed
      apt: update_cache=yes cache_valid_time=3600
      register: apt_cache
      until: apt_cache is succeeded
      retries: 5

nemanjan00 commented 5 years ago

I am also experiencing this.

yoshixmk commented 5 years ago

I had the same problem. I worked around it by disabling the additionally installed plugins (in my case postgres, elasticsearch, letsencrypt and redis), starting the daemon, and then re-enabling the plugins:

sudo dokku plugin:disable postgres
sudo dokku plugin:disable elasticsearch
sudo dokku plugin:disable letsencrypt
sudo dokku plugin:disable redis

sudo systemctl start dokku-daemon.service

sudo dokku plugin:enable postgres
sudo dokku plugin:enable elasticsearch
sudo dokku plugin:enable letsencrypt
sudo dokku plugin:enable redis
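Because the failed handler stops the role before plugins are fully provisioned, the same manual workaround could also be run from Ansible as a follow-up play. This is a rough sketch, not something ansible-dokku provides: the variable name `extra_dokku_plugins` is hypothetical, and its list should match whatever you installed through `dokku_plugins`.

```yaml
---
# Hypothetical recovery play mirroring the manual workaround above.
# Run it after the role's "start dokku-daemon" handler has failed.
- hosts: all
  become: yes
  vars:
    # Hypothetical variable: list the plugins installed via dokku_plugins.
    extra_dokku_plugins:
      - postgres
      - elasticsearch
      - letsencrypt
      - redis
  tasks:
    - name: Temporarily disable the extra plugins
      command: dokku plugin:disable {{ item }}
      loop: "{{ extra_dokku_plugins }}"

    - name: Start dokku-daemon while the plugins are disabled
      systemd:
        name: dokku-daemon
        state: started

    - name: Re-enable the extra plugins
      command: dokku plugin:enable {{ item }}
      loop: "{{ extra_dokku_plugins }}"
```

This only reproduces the shell commands step by step; it does not address the underlying race during daemon startup.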

nemanjan00 commented 5 years ago

Yeah, but it still stops me from using Ansible to provision plugins, unless I use shell commands instead of this role.

ltalirz commented 4 years ago

I don't seem to be experiencing this issue with the latest release of the Ansible role, used as follows:

    - role: dokku_bot.ansible_dokku
      tags: dokku
      vars:
        dokku_plugins:
          - name: postgres
            url: https://github.com/dokku/dokku-postgres.git

Since we now have test builds on Travis CI, I've added the postgres plugin in a branch, and the Ubuntu 16.04 build still passes (including the start of the dokku daemon).

Does it work for you now? Or is the problem inherently intermittent, i.e. should we expect to see failures on Travis at some point?

josegonzalez commented 4 years ago

I'm going to close this. Now that there are tests, we should try and reproduce any issues in a testing environment so that we can be sure that any fix actually applies.

papaux commented 4 years ago

This was the first result when googling the issue, so here is how I solved it.

I got the exact same error on Debian 10. It seems to be a dokku-daemon issue: by default, this Ansible role pulls a fairly old version of dokku-daemon.

Updating to a newer dokku-daemon version seems to resolve the issue:

dokku_daemon_version: a4cdb18d7aea7501f8fa4e82759526aee1117fd3
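For reference, here is a sketch of where that variable could go in the replication playbook above. The other values are copied from the original script; `dokku_daemon_version` is the role variable named in this comment, and whether a given commit works with your Dokku version is untested here.

```yaml
---
- hosts: all
  roles:
    - dokku_bot.ansible_dokku
  vars:
    dokku_version: '0.16.4'
    plugn_version: '0.3.2'
    # Pin dokku-daemon to a newer commit instead of the role's older default.
    # Quoted so YAML always treats the hash as a string.
    dokku_daemon_version: 'a4cdb18d7aea7501f8fa4e82759526aee1117fd3'
    dokku_plugins:
      - name: redis
        url: https://github.com/dokku/dokku-redis.git
```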

Two possible reasons:

  1. The systemd service file used by this Ansible role has an arbitrary sleep in it:
ExecStartPost=/bin/sleep 2

Changing it to 5 seconds seems to have fixed the issue for me on my flaky VPS:

ExecStartPost=/bin/sleep 5

Upstream dokku-daemon seems to have increased it as well at some point, and the latest version reworked this to get rid of the sleep completely.

  2. There is also this issue, which might be the real fix.
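For the first reason, the longer sleep could be applied without editing the role's template by adding a systemd drop-in before the role runs. This is a minimal sketch under several assumptions: the role installs the unit as `/etc/systemd/system/dokku-daemon.service` (as the status output above shows), its `ExecStartPost=` commands are the sleep followed by the chmod (an order inferred from the process IDs in that output), and the unit is (re)loaded before the role's handler starts it. The file name `override.conf` is an arbitrary choice. None of this has been tested against the role.

```yaml
---
- hosts: all
  pre_tasks:
    - name: Create a drop-in directory for dokku-daemon
      file:
        path: /etc/systemd/system/dokku-daemon.service.d
        state: directory
        mode: '0755'

    - name: Lengthen the post-start sleep for dokku-daemon
      copy:
        dest: /etc/systemd/system/dokku-daemon.service.d/override.conf
        mode: '0644'
        content: |
          [Service]
          # An empty assignment clears the ExecStartPost= list from the main
          # unit, so the two commands below replace it instead of appending.
          ExecStartPost=
          ExecStartPost=/bin/sleep 5
          ExecStartPost=/bin/chmod 777 ${DOKKU_SOCK_PATH}

    - name: Reload systemd so the drop-in is picked up
      systemd:
        daemon_reload: yes
  roles:
    - dokku_bot.ansible_dokku
```

Because the empty `ExecStartPost=` line resets the inherited list, both commands have to be repeated in the drop-in; if a later role version changes those commands, the override would need to change with it.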

ltalirz commented 4 years ago

Thanks a lot @papaux for looking into this. Indeed, the dokku-daemon dependency was overlooked in recent updates of the dependencies. I've just opened a PR to fix this: https://github.com/dokku/ansible-dokku/pull/79

josegonzalez commented 4 years ago

@ltalirz I guess this can be closed? nvm :D