42ity / fty-sensor-gpio

Agent to manage GPIO sensors and devices

Problem: systemctl stop fty-sensor-gpio makes service failed #31

Closed (jimklimov closed 7 years ago)

jimklimov commented 7 years ago

Solution: briefly attempt a SIGTERM first, and filter SIGKILL out of the causes that put the systemd unit into failed state

Signed-off-by: Jim Klimov <EvgenyKlimov@eaton.com>
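The commit message does not quote the unit file. The fragment below is a minimal sketch that assumes the change is expressed with standard systemd [Service] directives. The exact directive set and the 5-second value are assumptions, not taken from the commit.

```ini
[Service]
# Sketch only: values are assumptions, not the commit's actual content.
# Ask the daemon to stop cleanly first (SIGTERM is also systemd's default).
KillSignal=SIGTERM
# Hypothetical short grace period before systemd escalates to SIGKILL
# (systemd's built-in default is 90 seconds).
TimeoutStopSec=5
# Escalate to SIGKILL if the daemon is still running (the default anyway).
SendSIGKILL=yes
# Count a main process killed by SIGKILL as a successful termination,
# so it no longer yields "failed (Result: signal)".
SuccessExitStatus=SIGKILL
```

systemd may record a stop that actually runs past TimeoutStopSec= as a timeout, whatever SuccessExitStatus= says. That path is worth confirming with systemctl status on the systemd version in use.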

jimklimov commented 7 years ago

As discovered during validation, the original solution in #25 and #26 was flawed:

root@validation-rc4:~# systemctl status fty-sensor-gpio
* fty-sensor-gpio.service - fty-sensor-gpio service
   Loaded: loaded (/lib/systemd/system/fty-sensor-gpio.service; enabled)
   Active: active (running) since Wed 2017-08-02 15:49:48 UTC; 1h 7min ago
  Process: 13496 ExecStop=/bin/kill -KILL $MAINPID (code=exited, status=0/SUCCESS)
 Main PID: 14038 (fty-sensor-gpio)
   CGroup: /system.slice/fty-sensor-gpio.service
           `-14038 /usr/bin/fty-sensor-gpio -c /etc/fty-sensor-gpio/fty-sensor-gpio.cfg
...

root@validation-rc4:~# systemctl stop fty-sensor-gpio
root@validation-rc4:~# systemctl status fty-sensor-gpio
* fty-sensor-gpio.service - fty-sensor-gpio service
   Loaded: loaded (/lib/systemd/system/fty-sensor-gpio.service; enabled)
   Active: failed (Result: signal) since Wed 2017-08-02 16:57:17 UTC; 2s ago
  Process: 21647 ExecStop=/bin/kill -KILL $MAINPID (code=exited, status=0/SUCCESS)
  Process: 14038 ExecStart=/usr/bin/fty-sensor-gpio -c /etc/fty-sensor-gpio/fty-sensor-gpio.cfg (code=killed, signal=KILL)
 Main PID: 14038 (code=killed, signal=KILL)
...
Aug 02 16:57:17 validation-rc4 systemd[1]: Stopping fty-sensor-gpio service...
Aug 02 16:57:17 validation-rc4 snoopy[21647]: [uid:1000 sid:21647 tty:(none) cwd:/ filename:/bin/kill]: /bin/kill -KILL 14038
Aug 02 16:57:17 validation-rc4 systemd[1]: fty-sensor-gpio.service: main process exited, code=killed, status=9/KILL
Aug 02 16:57:17 validation-rc4 systemd[1]: Stopped fty-sensor-gpio service.
Aug 02 16:57:17 validation-rc4 systemd[1]: Unit fty-sensor-gpio.service entered failed state.
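The failed state follows from systemd's default notion of a clean exit. By default, only exit code 0 and the signals SIGHUP, SIGINT, SIGTERM and SIGPIPE count as successful termination of the main process. A process that dies from SIGKILL therefore gives Result: signal.

The fragment below reconstructs only the ExecStop= line shown in the status output above. Any other directives in the real unit are unknown.

```ini
[Service]
# Stop command as recorded by systemctl status above.
ExecStop=/bin/kill -KILL $MAINPID
# No SuccessExitStatus= line: death by SIGKILL (status=9/KILL) is not
# a clean exit, so the unit enters failed state after stopping.
```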

After changing the unit to filter out SIGKILL (part of the commit above), stopping it left a normal, stopped, non-failed service:

* fty-sensor-gpio.service - fty-sensor-gpio service
   Loaded: loaded (/lib/systemd/system/fty-sensor-gpio.service; enabled)
   Active: inactive (dead) since Wed 2017-08-02 17:02:52 UTC; 1s ago
  Process: 22034 ExecStop=/bin/kill -KILL $MAINPID (code=exited, status=0/SUCCESS)
  Process: 21988 ExecStart=/usr/bin/fty-sensor-gpio -c /etc/fty-sensor-gpio/fty-sensor-gpio.cfg (code=killed, signal=KILL)
 Main PID: 21988 (code=killed, signal=KILL)
...
Aug 02 17:02:52 validation-rc4 systemd[1]: Stopping fty-sensor-gpio service...
Aug 02 17:02:52 validation-rc4 snoopy[22034]: [uid:1000 sid:22034 tty:(none) cwd:/ filename:/bin/kill]: /bin/kill -KILL 21988
Aug 02 17:02:52 validation-rc4 systemd[1]: Stopped fty-sensor-gpio service.

The complete solution in the commit also tries a clean SIGTERM first, to be a good citizen:

* fty-sensor-gpio.service - fty-sensor-gpio service
   Loaded: loaded (/lib/systemd/system/fty-sensor-gpio.service; enabled)
   Active: inactive (dead) since Wed 2017-08-02 16:59:37 UTC; 1s ago
  Process: 21785 ExecStart=/usr/bin/fty-sensor-gpio -c /etc/fty-sensor-gpio/fty-sensor-gpio.cfg (code=exited, status=0/SUCCESS)
 Main PID: 21785 (code=exited, status=0/SUCCESS)
...
Aug 02 16:59:37 validation-rc4 systemd[1]: Stopping fty-sensor-gpio service...
Aug 02 16:59:37 validation-rc4 fty-sensor-gpio[21785]: D: 17-08-02 16:59:37 fty_sensor_gpio: received command $TERM
Aug 02 16:59:37 validation-rc4 fty-sensor-gpio[21785]: D: 17-08-02 16:59:37 fty-gpio-sensor-assets: received command $TERM
Aug 02 16:59:37 validation-rc4 fty-sensor-gpio[21785]: D: 17-08-02 16:59:37 fty_sensor_gpio: received command $TERM
Aug 02 16:59:37 validation-rc4 systemd[1]: Stopped fty-sensor-gpio service.

I believe this solution would make the extraordinary dependency ordering with fty-asset redundant :)

jimklimov commented 7 years ago

I experimented a bit more in this area. The other solution I proposed in #25 (using KillSignal=SIGKILL) behaves the same as the initial explicit ExecStop=/bin/kill -KILL $MAINPID, in that the stopped service ends up in failed state:

* fty-sensor-gpio.service - fty-sensor-gpio service
   Loaded: loaded (/lib/systemd/system/fty-sensor-gpio.service; enabled)
   Active: failed (Result: signal) since Wed 2017-08-02 17:34:51 UTC; 4s ago
 Main PID: 23467 (code=killed, signal=KILL)
...
Aug 02 17:34:51 validation-rc4 systemd[1]: Stopping fty-sensor-gpio service...
Aug 02 17:34:51 validation-rc4 systemd[1]: fty-sensor-gpio.service: main process exited, code=killed, status=9/KILL
Aug 02 17:34:51 validation-rc4 systemd[1]: Stopped fty-sensor-gpio service.
Aug 02 17:34:51 validation-rc4 systemd[1]: Unit fty-sensor-gpio.service entered failed state.
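For comparison, the #25 alternative drops the explicit ExecStop= and lets systemd's own kill step deliver SIGKILL. The sketch below shows that variant. Only KillSignal=SIGKILL comes from the text. The commented-out line is the SuccessExitStatus= filter whose absence leaves the unit failed, as it did with the explicit /bin/kill earlier.

```ini
[Service]
# #25 alternative: no ExecStop=, systemd itself sends SIGKILL on stop.
KillSignal=SIGKILL
# Without this filter the stop ends in "failed (Result: signal)".
#SuccessExitStatus=SIGKILL
```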

In all cases (/bin/kill or KillSignal without SuccessExitStatus filtering, and hence with failed state; Timeout... with the SuccessExitStatus filter, and hence with inactive (dead) state), once I removed fty-asset from the WantedBy list to match our other services, which are all WantedBy=bios.target, the GPIO service stops due to systemctl stop fty-asset and restarts automatically after systemctl start fty-asset. With both services stopped via fty-asset, both also start after systemctl start fty-sensor-gpio.service. Both also restart upon systemctl restart fty-asset, although only fty-sensor-gpio is restarted upon systemctl restart fty-sensor-gpio.

With the WantedBy hack in place, the behavior for both systemctl restart actions is the same as described above, and systemctl stop fty-sensor-gpio.service does not stop a running fty-asset (the Wants dependency type is too weak for that; this difference in dependency strength does, however, avoid a dependency loop).
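The two [Install] variants compared here can be sketched as follows. Requires= reflects the statement that fty-asset is Required by gpio. The After= ordering line is an assumption added for illustration.

```ini
[Unit]
# gpio cannot run without fty-asset: explicitly stopping or restarting
# fty-asset also stops or restarts this unit.
Requires=fty-asset.service
# Assumed ordering: start only after fty-asset has started.
After=fty-asset.service

[Install]
# Variant with the WantedBy hack: also pulled in whenever fty-asset starts.
WantedBy=bios.target fty-asset.service
# Variant matching the other services:
#WantedBy=bios.target
```

[Install] settings only take effect when the unit is enabled. systemctl enable turns each WantedBy= entry into a symlink under the target's .wants/ directory. After editing them, systemctl reenable fty-sensor-gpio.service refreshes those links.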

When I explicitly stop each of the services and then start fty-asset, gpio comes up because it is wanted by fty-asset and nothing blocks it from starting. In fact, the same happens (both come up) even with WantedBy=bios.target only. I believe this is because that target is active and still wants all our services, so once blockers are removed (e.g. fty-asset, which gpio Requires, is started again), the services wanted by the target are started.

So this WantedBy seems unwarranted in the first place, unless fty-sensor-gpio feeds some data into fty-asset and must do so shortly after both start up. Even so, such a requirement (if it exists) would point to a protocol flaw: relying on early broadcasts rather than on explicit requests and replies for data (with retries) when needed.