basecamp / kamal

Deploy web apps anywhere.
https://kamal-deploy.org
MIT License

observation: docker system dial-stdio processes do not die #837

Open wdiechmann opened 3 weeks ago

wdiechmann commented 3 weeks ago

I'm not sure what is going on, but my observation is a slowly "degenerating system" as I keep deploying. If this is just me (unknowingly) high as a kite on ethanol, I apologise for wasting your bandwidth 🙏

Symptoms

Deploys either fail or take forever, and service response is measurably below par.

Diagnostics

root@ubuntu-4gb-hel1-mortimer-1:~# ps ax
...8<...
3479828 ?        Ssl    0:00 docker system dial-stdio
3479855 ?        Ssl    0:00 docker system dial-stdio
3479861 ?        Ss     0:00 sshd: root@notty
3479928 ?        Ssl    0:00 docker system dial-stdio
3479946 ?        Ssl    0:00 buildctl dial-stdio
3480065 ?        Ss     0:00 sshd: root@pts/0
3480118 ?        I      0:00 [kworker/1:1-events]
3480138 pts/0    Ss     0:00 -bash
3480958 ?        I      0:00 [kworker/u4:3-flush-8:0]
3481521 pts/0    R+     0:00 ps ax
root@ubuntu-4gb-hel1-mortimer-1:~# ps ax | grep dial-stdio | wc -l
99
root@ubuntu-4gb-hel1-mortimer-1:~# shutdown -r now
...8<...
root@ubuntu-4gb-hel1-mortimer-1:~# ps ax | grep dial-stdio | wc -l
1
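A side note on the counts above: `ps ax | grep dial-stdio | wc -l` usually counts the `grep` process itself, so the `1` after the reboot most likely means no dial-stdio processes are left at all. A minimal sketch of a count that skips the grep line follows; the function name `count_dial_stdio` is made up for illustration and is not part of Kamal or Docker.

```shell
# Hypothetical helper (not part of Kamal or Docker): counts dial-stdio
# processes in `ps ax`-style output read from stdin, ignoring any line
# that belongs to a grep invocation.
count_dial_stdio() {
  awk '/dial-stdio/ && !/grep/ { n++ } END { print n + 0 }'
}

# Intended use on the server (left as a comment so the block has no side effects):
#   ps ax | count_dial_stdio
```

On systems with procps, `pgrep -fc dial-stdio` should give a comparable number, since pgrep does not report itself.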

Remediation

I'm barking up the "Kamal communicates via the npipe, helped by docker system dial-stdio" tree: I suspect the "remote" process does not know when to exit and so hangs around indefinitely. Just a (wild) guess 🤷🏻‍♂️

Somehow signaling the process to 'go die' would perhaps solve the matter: in a perfect world only once the deploy has finished (either exit 0 or exit something), but otherwise after each command.
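Until the root cause is known, a stopgap along those lines could be to signal only the dial-stdio processes that have been around much longer than any deploy should take. The sketch below is an assumption-laden workaround, not something Kamal or Docker provides: the function name `stale_dial_stdio_pids` and the 3600-second default are invented for illustration, and it expects `pid elapsed-seconds command` lines on stdin.

```shell
# Hypothetical helper (not part of Kamal or Docker): reads lines of
# "PID ELAPSED_SECONDS COMMAND..." from stdin and prints the PIDs of
# dial-stdio processes that are at least MIN_AGE seconds old.
stale_dial_stdio_pids() {
  min_age="${1:-3600}"   # assumed threshold; choose something longer than a deploy
  awk -v min="$min_age" '$2 >= min && /dial-stdio[[:space:]]*$/ { print $1 }'
}

# Intended use on the builder host, assuming procps-ng `ps` (for etimes)
# and GNU xargs (for -r). Left as a comment so nothing is killed by accident:
#   ps -eo pid=,etimes=,args= | stale_dial_stdio_pids 3600 | xargs -r kill
```

Matching on the age keeps the connections of a deploy that is still running out of the kill list, which is the "not until the deploy has finished" part of the idea above.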

Reproduction

All I do is kamal env push && kamal deploy, once or twice per 2-hour slot, which effectively demands a reboot every other day.

#/config/deploy.yml
    ....8<...

builder:
  remote:
    arch: arm64
    host: ssh://bob_the_builder@1.2.3.4

# Deploy to these servers.
servers:
  web:
    hosts:
      - 1.2.3.4
    options:
    ....8<...

ssh:
  user: bob_the_builder

System

it's a rental, what can I say 😉

happy user of Hetzner services

root@ubuntu-4gb-hel1-mortimer-1:~# uname -a
Linux ubuntu-4gb-hel1-mortimer-1 5.15.0-112-generic #122-Ubuntu SMP Thu May 23 07:51:32 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

and the ruby/rails env is

rails@e8d5d7728a6a:/rails$ bin/rails -v
Rails 8.0.0.alpha
rails@e8d5d7728a6a:/rails$ ruby -v
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [aarch64-linux]

and finally Kamal is

√ bellis % kamal version
1.3.1

djmb commented 3 weeks ago

What are you running on ubuntu-4gb-hel1-mortimer-1? Is it used as the remote builder?

wdiechmann commented 3 weeks ago

It is, and it is also a staging server (following the current litany out of Chicago = solid_queue, Kamal, SQLite and “1 container to rule them all”) 😉

Btw: I’m in tears regarding the work put into this, making so much developer happiness 🥰 So: a huge thank you to all contributors!!

Cheers, Walther


djmb commented 3 weeks ago

I think the docker system dial-stdio processes are related to the connections to your remote builder then. We have seen similar problems with ours. Looks maybe a bit like this - https://forums.docker.com/t/docker-continuously-making-unnecessary-ssh-connections-to-remote-servers/136132?

For now I'd suggest moving the remote builder to its own server to avoid affecting your app.

wdiechmann commented 3 weeks ago

yup - that’s the ’signature’

Good advice on the “separation of concerns” 😅

Cheers, Walther
