bitwalker / distillery

Simplify deployments in Elixir with OTP releases!
MIT License

Unable to supervise indirect child with systemd #630

Closed imranismail closed 5 years ago

imranismail commented 5 years ago

I'm having trouble trying to get the app to be supervised correctly by systemd.

The app starts successfully, but whenever it is killed via pkill beam.smp, it doesn't start back up.

Currently running on:

These are the hooks configured:

systemd unit file:

[Unit]
Description=ExApp
Requires=setup-network-environment.service
After=network.target setup-network-environment.service

[Service]
Type=forking
User=deploy
Group=deploy
WorkingDirectory=/var/www/ex_app
ExecStart=/var/www/ex_app/bin/ex_app start
ExecReload=/var/www/ex_app/bin/ex_app reload_config
ExecStop=/var/www/ex_app/bin/ex_app stop
PIDFile=/var/www/ex_app/ex_app.pid
Restart=on-failure
RestartSec=5
EnvironmentFile=/etc/network-environment
EnvironmentFile=/etc/profile.d/ex_app.sh
SyslogIdentifier=ex_app
RemainAfterExit=no

[Install]
WantedBy=multi-user.target

systemd status:

● ex_app.service - ExApp
   Loaded: loaded (/etc/systemd/system/ex_app.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2019-01-24 19:33:55 +08; 17min ago
  Process: 16654 ExecStop=/var/www/ex_app/bin/ex_app stop (code=exited, status=0/SUCCESS)
  Process: 16910 ExecStart=/var/www/ex_app/bin/ex_app start (code=exited, status=0/SUCCESS)
 Main PID: 17211 (beam.smp)
    Tasks: 24 (limit: 4704)
   CGroup: /system.slice/ex_app.service
           ├─17210 /var/www/ex_app/erts-10.2.2/bin/run_erl -daemon /var/www/ex_app/var/erl_pipes/ex_app@172.31.77.18/ /var/www/ex_app/var/log exec "/var/www/fave/bin/ex_app" "console" --
           ├─17211 /var/www/ex_app/releases/6ef93bd/ex_app.sh -- -root /var/www/ex_app -progname var/www/ex_app/releases/6ef93bd/ex_app.sh -- -home /home/deploy -- -boot_var ERTS_LIB_DIR /var/www/ex_app/lib -config /var/www/ex_app/var/sys.config -pa /var/www/ex_app/lib/ex_app-6ef93bd/consolidated -boot /var/www/ex_app/releases/6ef93bd/ex_app -name ex_app@172.31.77.18 -setcookie kgT1111%>CY66~drZRX4>z~%%o(]99.oK@m7Zuf7e{=0m66C2WT(Sm1Rt5$KF*1 -smp auto -mode embedded -user Elixir.IEx.CLI -extra --no-halt +iex -- console --
           ├─17507 erl_child_setup 1024
           ├─17526 inet_gethost 4
           └─17527 inet_gethost 4

Jan 24 19:33:39 ip-172-31-77-18 systemd[1]: Starting ExApp...
Jan 24 19:33:47 ip-172-31-77-18 ex_app[16910]: 19:33:47.030 [info] Already up
Jan 24 19:33:47 ip-172-31-77-18 systemd[1]: ex_app.service: Can't open PID file /var/www/ex_app/ex_app.pid (yet?) after start: No such file or directory
Jan 24 19:33:55 ip-172-31-77-18 systemd[1]: ex_app.service: Supervising process 17211 which is not our child. We'll most likely not notice when it exits.
Jan 24 19:33:55 ip-172-31-77-18 systemd[1]: Started ExApp.

aus70 commented 5 years ago

I had the same issue. I removed the PIDFile=... line and it seems to work well. If I manually stop the process (bin/my_app stop), systemd restarts it, so it looks like it's actually supervising the app (disclaimer: I'm no systemd expert).

OvermindDL1 commented 5 years ago

What's managing the PID file? Is it writing out the PID of the pre-fork process instead of the post-fork process?

aus70 commented 5 years ago

If Type=forking is set and PIDFile is not, the GuessMainPID option (which defaults to yes) applies, and systemd should pick up the PID of the daemon's main process after start-up. I hope that answers @OvermindDL1's question.
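
For reference, the workaround described above could look roughly like the fragment below. It is a sketch rather than a tested configuration: it reuses the paths from the original unit file, drops the PIDFile line, and spells out GuessMainPID=yes only to make the default visible.

```ini
[Service]
# Forking service with no PIDFile= line, so systemd has to guess the main PID.
Type=forking
ExecStart=/var/www/ex_app/bin/ex_app start
ExecStop=/var/www/ex_app/bin/ex_app stop
# Already the default; it only takes effect when Type=forking and PIDFile= is unset.
GuessMainPID=yes
Restart=on-failure
RestartSec=5
```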

OvermindDL1 commented 5 years ago

I know about the case where PIDFile is not set, but I was wondering why it doesn't work when PIDFile is set. If that's not working, it sounds like the wrong PID is being written.

aus70 commented 5 years ago

The PID that gets written by distillery belongs to the first child (17211) of the daemon process (17210), so I guess systemd expects 17210 instead of 17211.

bitwalker commented 5 years ago

You should not use start with systemd. As you've found, it is not really designed for that use and doesn't work reliably because of that. Instead, use foreground and let systemd manage the process normally; it works very well in my experience.
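
A minimal sketch of what that suggestion might look like when applied to the unit file from the original post. It assumes the same paths and environment files, is not a tested configuration, and changes only three things: Type becomes simple, ExecStart uses foreground, and the PIDFile and ExecStop lines are dropped so that systemd tracks and stops the process it started directly.

```ini
[Unit]
Description=ExApp
Requires=setup-network-environment.service
After=network.target setup-network-environment.service

[Service]
# The release stays in the foreground, so systemd supervises its own child.
Type=simple
User=deploy
Group=deploy
WorkingDirectory=/var/www/ex_app
ExecStart=/var/www/ex_app/bin/ex_app foreground
ExecReload=/var/www/ex_app/bin/ex_app reload_config
Restart=on-failure
RestartSec=5
EnvironmentFile=/etc/network-environment
EnvironmentFile=/etc/profile.d/ex_app.sh
SyslogIdentifier=ex_app
RemainAfterExit=no

[Install]
WantedBy=multi-user.target
```

One thing to keep in mind when testing with pkill beam.smp: systemd treats termination by SIGTERM as a clean exit, so Restart=on-failure may not trigger a restart in that case. Restart=always, or killing with a different signal such as SIGKILL, would exercise the restart path.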