gdraheim / docker-systemctl-replacement

docker systemctl replacement - allows to deploy to systemd-controlled containers without starting an actual systemd daemon (e.g. centos7, ubuntu16)
European Union Public License 1.2

Misleading error message interaction between ExecStart= and ExecStartPost= #160

Open PenelopeFudd opened 1 year ago

PenelopeFudd commented 1 year ago

Hi;

We have an Ansible deployment script that installs this service file:

[Unit]
Description=rabbitmq-server - RabbitMQ broker
After=network.target epmd@0.0.0.0.socket
Wants=network.target epmd@0.0.0.0.socket

[Service]
Type=notify
User=rabbitmq
Group=rabbitmq
UMask=0027
NotifyAccess=all
TimeoutStartSec=3600
LimitNOFILE=32768
Restart=on-failure
RestartSec=10
WorkingDirectory=/var/lib/rabbitmq
ExecStart=/usr/lib/rabbitmq/bin/rabbitmq-server
ExecStartPost=-+/home/application/bin/python3 /usr/local/bin/rabbitmq_detect_msg_store_corruption.py
ExecStop=/usr/lib/rabbitmq/bin/rabbitmqctl shutdown
SuccessExitStatus=69

[Install]
WantedBy=multi-user.target

When we start the service, we get this:

$ sudo systemctl start service rabbitmq-server

Unable to start service rabbitmq-server: ERROR:systemctl: rabbitmq-server.service: Exec command does not exist: (ExecStartPost) /home/application/backend/bin/python3

$ echo $?
1

The error message turned out to be a red herring. Neither the Ansible script nor the service file has been changed in over a year, and the error message has apparently been printed all this time without returning an error code.

The true error turned out to be that we had changed a password in RabbitMQ's configuration file and failed to URL-escape it. When ExecStart runs, the server writes an error to a random log file and exits with a non-zero return code.
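For readers hitting the same root cause, here is a minimal sketch of percent-encoding a password before embedding it in a URI-style setting. It assumes the setting is parsed as a URI; the user name, host, and password below are hypothetical and are not taken from our configuration.

```python
from urllib.parse import quote


def escape_password(password: str) -> str:
    """Percent-encode every reserved character in a password.

    safe="" matters: by default quote() leaves "/" unescaped, which
    would still break a URI such as amqp://user:password@host.
    """
    return quote(password, safe="")


# Hypothetical credentials and host, for illustration only.
uri = "amqp://{}:{}@{}:5672".format(
    "app", escape_password("p@ss/w:rd#1"), "broker.example"
)
print(uri)
```

Characters such as `@`, `/`, `:` and `#` in the raw password would otherwise be read as URI delimiters.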

It would be nice if systemctl had printed

Unable to start service rabbitmq-server: ERROR:systemctl: rabbitmq-server.service: ExecStart command exited with an ExitStatus of 1: (ExecStart) /usr/lib/rabbitmq/bin/rabbitmq-server

Thanks!

gdraheim commented 1 year ago

Sadly, this is impossible because docker-systemctl-replacement is not a server that can watch its children. It cannot see the return code of the ExecStart process; it only detects a "failed" service once that PID has vanished.
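To illustrate the limitation: once the process that spawned the service has exited, a later invocation has no child to wait() on, so the exit status is gone and only the existence of the PID can be probed. The following is a rough sketch of such a probe on POSIX systems. The function name `pid_is_alive` is hypothetical, and this is not necessarily how docker-systemctl-replacement implements its check.

```python
import os


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        # Signal 0 delivers nothing; the kernel only checks existence
        # and permissions for the target PID.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process: the service has vanished
    except PermissionError:
        return True   # process exists but belongs to another user
    return True


print(pid_is_alive(os.getpid()))
```

A probe like this can say that the process is gone, but not why it exited or with which status.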

Support for the "-+" prefix is a separate matter, however. Currently the "+" prefix ("nouser") is ignored, so when python3 is not accessible to the user rabbitmq, the command fails. This may change in the future.
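As a sketch of what handling these prefixes involves: in systemd, a leading "-" means a failure of the command is ignored, and a leading "+" means the command runs with full privileges regardless of User=. The helper below separates those two prefixes from the command line. The names `ExecLine` and `split_exec_prefixes` are hypothetical, other systemd prefixes (such as "@", ":" and "!") are deliberately left out, and this is not the project's actual parser.

```python
from typing import NamedTuple


class ExecLine(NamedTuple):
    ignore_failure: bool   # "-" prefix was present
    full_privileges: bool  # "+" prefix was present
    command: str           # command line with the prefixes removed


def split_exec_prefixes(value: str) -> ExecLine:
    """Strip leading "-" and "+" prefixes from an Exec*= value."""
    ignore_failure = False
    full_privileges = False
    rest = value.strip()
    # Prefixes may appear in any order, so consume them one at a time.
    while rest and rest[0] in "-+":
        if rest[0] == "-":
            ignore_failure = True
        else:
            full_privileges = True
        rest = rest[1:]
    return ExecLine(ignore_failure, full_privileges, rest)


line = split_exec_prefixes("-+/home/application/bin/python3 /usr/local/bin/check.py")
print(line)
```

With the prefixes separated, the remaining string starts with the real executable path, and the two flags can be honoured or reported independently.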

PenelopeFudd commented 1 year ago

Ok, good to know.
I had been under the impression that it could see the return value of the exec() call if it exited immediately (daemonized, for instance), just not if the exec() call kept running.

PenelopeFudd commented 1 year ago

In this case, is it trying to exec() the program "+/home/application/bin/python3" and failing? Wouldn't it be possible to say "Path '%s' is not absolute, will not exec()", or, if relative paths are allowed, just "Pathname '%s' not found, will not exec()"?
That would be helpful whether or not + for nouser is implemented.
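A minimal sketch of the kind of pre-exec check suggested here, using the two proposed message formats. The function name `check_exec_path` and the third message about execute permission are hypothetical additions, and the sketch assumes any prefix characters have not been stripped, which is exactly the situation being asked about.

```python
import os
import shlex
from typing import Optional


def check_exec_path(command: str) -> Optional[str]:
    """Return a diagnostic message, or None if the executable looks runnable."""
    argv = shlex.split(command)
    if not argv:
        return "Exec command is empty"
    path = argv[0]
    if not os.path.isabs(path):
        return "Path '%s' is not absolute, will not exec()" % path
    if not os.path.exists(path):
        return "Pathname '%s' not found, will not exec()" % path
    # os.access() tests the calling user, not the unit's User= setting.
    if not os.access(path, os.X_OK):
        return "Pathname '%s' is not executable, will not exec()" % path
    return None


print(check_exec_path("+/home/application/bin/python3 /usr/local/bin/check.py"))
```

An unstripped "+" makes the path fail the absolute-path test, so the message points directly at the prefix rather than at a missing file.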