jimklimov closed this 7 years ago.
As discovered during validation, the original solution in #25 and #26 was flawed:
root@validation-rc4:~# systemctl status fty-sensor-gpio
* fty-sensor-gpio.service - fty-sensor-gpio service
Loaded: loaded (/lib/systemd/system/fty-sensor-gpio.service; enabled)
Active: active (running) since Wed 2017-08-02 15:49:48 UTC; 1h 7min ago
Process: 13496 ExecStop=/bin/kill -KILL $MAINPID (code=exited, status=0/SUCCESS)
Main PID: 14038 (fty-sensor-gpio)
CGroup: /system.slice/fty-sensor-gpio.service
`-14038 /usr/bin/fty-sensor-gpio -c /etc/fty-sensor-gpio/fty-sensor-gpio.cfg
...
root@validation-rc4:~# systemctl stop fty-sensor-gpio
root@validation-rc4:~# systemctl status fty-sensor-gpio
* fty-sensor-gpio.service - fty-sensor-gpio service
Loaded: loaded (/lib/systemd/system/fty-sensor-gpio.service; enabled)
Active: failed (Result: signal) since Wed 2017-08-02 16:57:17 UTC; 2s ago
Process: 21647 ExecStop=/bin/kill -KILL $MAINPID (code=exited, status=0/SUCCESS)
Process: 14038 ExecStart=/usr/bin/fty-sensor-gpio -c /etc/fty-sensor-gpio/fty-sensor-gpio.cfg (code=killed, signal=KILL)
Main PID: 14038 (code=killed, signal=KILL)
...
Aug 02 16:57:17 validation-rc4 systemd[1]: Stopping fty-sensor-gpio service...
Aug 02 16:57:17 validation-rc4 snoopy[21647]: [uid:1000 sid:21647 tty:(none) cwd:/ filename:/bin/kill]: /bin/kill -KILL 14038
Aug 02 16:57:17 validation-rc4 systemd[1]: fty-sensor-gpio.service: main process exited, code=killed, status=9/KILL
Aug 02 16:57:17 validation-rc4 systemd[1]: Stopped fty-sensor-gpio service.
Aug 02 16:57:17 validation-rc4 systemd[1]: Unit fty-sensor-gpio.service entered failed state.
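For reference, the stop handling from #25/#26 that produced the output above boils down to the unit fragment below. This is a minimal sketch reconstructed only from the ExecStop= line visible in the systemctl status output; any other directives in the real unit file are not shown.

```ini
[Service]
# Flawed approach from #25/#26, as seen in the status output above.
# On "systemctl stop", systemd runs this command, the main process dies
# from SIGKILL, and because SIGKILL is not one of the signals systemd
# treats as a clean exit, the unit ends up "failed (Result: signal)".
ExecStop=/bin/kill -KILL $MAINPID
```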
After changing the unit to filter out SIGKILL (part of the commit above), the stopped service became a normal, non-failed one:
* fty-sensor-gpio.service - fty-sensor-gpio service
Loaded: loaded (/lib/systemd/system/fty-sensor-gpio.service; enabled)
Active: inactive (dead) since Wed 2017-08-02 17:02:52 UTC; 1s ago
Process: 22034 ExecStop=/bin/kill -KILL $MAINPID (code=exited, status=0/SUCCESS)
Process: 21988 ExecStart=/usr/bin/fty-sensor-gpio -c /etc/fty-sensor-gpio/fty-sensor-gpio.cfg (code=killed, signal=KILL)
Main PID: 21988 (code=killed, signal=KILL)
...
Aug 02 17:02:52 validation-rc4 systemd[1]: Stopping fty-sensor-gpio service...
Aug 02 17:02:52 validation-rc4 snoopy[22034]: [uid:1000 sid:22034 tty:(none) cwd:/ filename:/bin/kill]: /bin/kill -KILL 21988
Aug 02 17:02:52 validation-rc4 systemd[1]: Stopped fty-sensor-gpio service.
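A sketch of that filter, assuming it is applied on top of the same ExecStop= line (the status output above still shows it; the exact contents of the commit are not quoted here). SuccessExitStatus= accepts signal names as well as numeric exit codes, and listing SIGKILL makes systemd count death by that signal as a successful termination:

```ini
[Service]
ExecStop=/bin/kill -KILL $MAINPID
# Treat a main process killed by SIGKILL as a clean termination, so the
# unit goes to "inactive (dead)" instead of "failed (Result: signal)".
SuccessExitStatus=SIGKILL
```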
The complete solution in the commit also tries a clean SIGTERM first, to be a good citizen:
* fty-sensor-gpio.service - fty-sensor-gpio service
Loaded: loaded (/lib/systemd/system/fty-sensor-gpio.service; enabled)
Active: inactive (dead) since Wed 2017-08-02 16:59:37 UTC; 1s ago
Process: 21785 ExecStart=/usr/bin/fty-sensor-gpio -c /etc/fty-sensor-gpio/fty-sensor-gpio.cfg (code=exited, status=0/SUCCESS)
Main PID: 21785 (code=exited, status=0/SUCCESS)
...
Aug 02 16:59:37 validation-rc4 systemd[1]: Stopping fty-sensor-gpio service...
Aug 02 16:59:37 validation-rc4 fty-sensor-gpio[21785]: D: 17-08-02 16:59:37 fty_sensor_gpio: received command $TERM
Aug 02 16:59:37 validation-rc4 fty-sensor-gpio[21785]: D: 17-08-02 16:59:37 fty-gpio-sensor-assets: received command $TERM
Aug 02 16:59:37 validation-rc4 fty-sensor-gpio[21785]: D: 17-08-02 16:59:37 fty_sensor_gpio: received command $TERM
Aug 02 16:59:37 validation-rc4 systemd[1]: Stopped fty-sensor-gpio service.
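Combining both parts, the approach can be sketched as the fragment below. This is a sketch under assumptions, not the literal commit: it assumes the "Timeout..." variant mentioned further down refers to systemd's TimeoutStopSec=, and the 5-second value is a placeholder. With no ExecStop=, systemctl stop makes systemd send KillSignal= (SIGTERM by default) and escalate to SIGKILL only if the service is still running when the timeout expires; in the output above the service exited cleanly on SIGTERM, so no SIGKILL was needed.

```ini
[Service]
# No ExecStop=: let systemd stop the service itself.
# Step 1: ask politely with SIGTERM (also systemd's default KillSignal).
KillSignal=SIGTERM
# Step 2: if the service is still running after this long, systemd sends
# SIGKILL (SendSIGKILL=yes is the default). Placeholder value.
TimeoutStopSec=5
# Count a main process killed by SIGKILL as a clean termination.
SuccessExitStatus=SIGKILL
```

One point worth checking on the target systemd version: when systemd itself escalates to SIGKILL because TimeoutStopSec= expired, it generally records the stop as a timeout, so SuccessExitStatus=SIGKILL may not keep the unit out of the failed state in that particular path.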
I believe this solution makes the unusual dependency ordering with fty-asset redundant :)
I experimented a bit more in this area. The other solution I proposed in #25 (using KillSignal=SIGKILL) behaves the same as the initial explicit ExecStop=/bin/kill -KILL $MAINPID: the stopped service ends up in the failed state:
* fty-sensor-gpio.service - fty-sensor-gpio service
Loaded: loaded (/lib/systemd/system/fty-sensor-gpio.service; enabled)
Active: failed (Result: signal) since Wed 2017-08-02 17:34:51 UTC; 4s ago
Main PID: 23467 (code=killed, signal=KILL)
...
Aug 02 17:34:51 validation-rc4 systemd[1]: Stopping fty-sensor-gpio service...
Aug 02 17:34:51 validation-rc4 systemd[1]: fty-sensor-gpio.service: main process exited, code=killed, status=9/KILL
Aug 02 17:34:51 validation-rc4 systemd[1]: Stopped fty-sensor-gpio service.
Aug 02 17:34:51 validation-rc4 systemd[1]: Unit fty-sensor-gpio.service entered failed state.
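The KillSignal=SIGKILL variant from #25 corresponds to a fragment like the one below (a sketch showing only the directive named in the text). With no ExecStop=, systemctl stop makes systemd send KillSignal= to the service's processes directly, which is also why the log above has no snoopy /bin/kill line:

```ini
[Service]
# Skip SIGTERM entirely: "systemctl stop" sends SIGKILL right away.
# Without SuccessExitStatus=SIGKILL this leaves the unit
# "failed (Result: signal)", as in the output above.
KillSignal=SIGKILL
```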
In all cases (/bin/kill or KillSignal without SuccessExitStatus filtering, hence ending in the failed state; and Timeout... with the SuccessExitStatus filter, ending in the inactive (dead) state), once I removed fty-asset from the WantedBy list to match our other services, which are all WantedBy=bios.target, the GPIO service stops upon systemctl stop fty-asset and starts again automatically after systemctl start fty-asset. Both services, when stopped via fty-asset, also start after systemctl start fty-sensor-gpio.service. Both also restart upon systemctl restart fty-asset, although only fty-sensor-gpio is restarted upon systemctl restart fty-sensor-gpio.
With the WantedBy hack in place, the behavior for both systemctl restart operations is the same as described above, and systemctl stop fty-sensor-gpio.service does not stop a running fty-asset (the Wants dependency type is too weak for that; on the other hand, this difference in strength avoids a dependency loop).
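The dependency layout under discussion can be sketched as follows. Only the WantedBy= values come from this thread; the Requires= line is an assumption based on the "asset Required by gpio" relation mentioned below, so treat it as illustrative.

```ini
[Unit]
# Assumed: gpio cannot run without fty-asset. With Requires=, explicitly
# stopping or restarting fty-asset also stops or restarts this unit.
Requires=fty-asset.service

[Install]
# Like our other services:
WantedBy=bios.target
# The "hack" additionally lists fty-asset, so that starting fty-asset
# also pulls in (Wants) fty-sensor-gpio:
# WantedBy=bios.target fty-asset.service
```

[Install] settings only take effect through the symlinks created by systemctl enable, so after editing WantedBy= the unit needs to be re-enabled (for example with systemctl reenable fty-sensor-gpio.service) for the change to apply.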
When I explicitly stop each of the services and then start fty-asset, gpio comes up because it is wanted by asset and nothing blocks it from starting. In fact, the same happens (both come up) even with only WantedBy=bios.target. I believe this is because that target is still active and still wants all our services, so when a blocker is removed (e.g. the stopped fty-asset that gpio Requires), the services wanted by the target get started.
So this WantedBy seems unwarranted in the first place, unless fty-sensor-gpio feeds some data into assets and must do so shortly after both start up. Even then, such a requirement (if it exists) would look like a protocol flaw: relying on early broadcasts rather than on explicit requests and replies for data (with retries) when needed.
Solution: briefly attempt SIGTERM first, and filter SIGKILL out of the failed-state causes for the systemd unit
Signed-off-by: Jim Klimov <EvgenyKlimov@eaton.com>