Closed cfiehe closed 9 months ago
keepalived works hard to ensure that it does not leave behind any process it created, but also allows time for any processes it created, such as the notify fifo script, to terminate.
The detail of what keepalived does is
So you have 1 second in which to close down, provided that you catch SIGTERM. If your script does not catch SIGTERM, it should die as soon as it receives the signal.
So I think your best solution is to catch SIGTERM, and in the signal handler execute another script. So you could add something like the following:
trap shutdown SIGTERM
shutdown()
{
    /etc/keepalived/scripts/shutdown.sh &
    exit 0
}
and in /etc/keepalived/scripts/shutdown.sh do whatever you want to do that takes more than 1 second.
I haven't tested this, but I think it should work.
Any feedback would be appreciated.
Hi @pqarmitage, thanks a lot. I have added the code snippet in order to catch the SIGTERM, but the result is the same. The script is only able to execute some steps of the stopping procedure and gets killed prematurely before regular termination. I have to stop a container and unmount some devices. This takes approximately 2 or 3 seconds.
The key thing is to run another script in background from the SIGTERM signal handler. This script being run in background will not be terminated by keepalived, and can take whatever actions you need for however long it takes.
As an example I have modified sample_notify_fifo.sh with the following patch:
diff --git a/doc/samples/sample_notify_fifo.sh b/doc/samples/sample_notify_fifo.sh
index ecda8f3..c061eef 100755
--- a/doc/samples/sample_notify_fifo.sh
+++ b/tmp/sample_notify_fifo.sh
@@ -66,6 +66,8 @@ start_shutdown()
( sleep 0.5
kill -ALRM $$ 2>/dev/null
) &
+
+ /tmp/handle_shutdown.sh &
}
trap stopping HUP INT QUIT USR1 USR2 PIPE ALRM
/tmp/handle_shutdown.sh:
#!/bin/bash
for i in $(seq 1 10); do
    echo "$(date) - run $i" >>/tmp/shutdown.log
    sleep 1
done
exit 0
When keepalived shuts down, I have the following keepalived log entries:
Sat Jan 27 16:02:41.350581032 2024: (docker) sent 0 priority
Sat Jan 27 16:02:41.350608480 2024: (docker) removing VIPs.
Sat Jan 27 16:02:42.361933472 2024: Stopped - used (self/children) 0.004787/0.004658 user time, 0.015321/0.022698 system time
The contents of /tmp/shutdown.log are:
Sat 27 Jan 16:02:41 GMT 2024 - run 1
Sat 27 Jan 16:02:42 GMT 2024 - run 2
Sat 27 Jan 16:02:43 GMT 2024 - run 3
Sat 27 Jan 16:02:44 GMT 2024 - run 4
Sat 27 Jan 16:02:45 GMT 2024 - run 5
Sat 27 Jan 16:02:46 GMT 2024 - run 6
Sat 27 Jan 16:02:47 GMT 2024 - run 7
Sat 27 Jan 16:02:48 GMT 2024 - run 8
Sat 27 Jan 16:02:49 GMT 2024 - run 9
Sat 27 Jan 16:02:50 GMT 2024 - run 10
This shows that the handle_shutdown.sh script continues to run (in this case for 8 seconds) after keepalived has terminated.
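The same effect can be reproduced without keepalived. The following is a minimal sketch, not keepalived's actual behaviour: a stand-in parent process traps SIGTERM, starts a slow cleanup in the background, and exits at once. The temporary directory, the log name, and the 1-second cleanup are illustrative assumptions, and the sketch only models a SIGTERM sent to the single parent PID.

```shell
#!/bin/bash
# Illustrative paths only; nothing here touches a real keepalived setup.
workdir=$(mktemp -d)
log="$workdir/shutdown.log"

# Stand-in for the notify script: on SIGTERM it launches the slow cleanup
# in the background and exits immediately.
bash -c '
  log=$1
  shutdown() {
    # The background subshell keeps running after this process has exited.
    ( sleep 1; echo "cleanup finished" >>"$log" ) &
    exit 0
  }
  trap shutdown TERM
  while :; do sleep 0.1; done
' _ "$log" &
parent=$!

sleep 0.3
kill -TERM "$parent"        # assumption: only this one PID is signalled
wait "$parent"
echo "parent exit status: $?"

sleep 2                     # give the orphaned cleanup time to complete
cat "$log"
```

The parent returns well before the cleanup writes its log line, which is the property the background script relies on.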
The main problem with a background process is that there is the theoretical risk of a race condition when (re-)start and shutdown overlap because no sequential order is guaranteed out of the box.
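One common way to keep overlapping runs from interleaving is an exclusive file lock. The following is a minimal sketch using `flock` from util-linux; the lock file location and the `do_shutdown`/`do_startup` functions are hypothetical stand-ins for the real stop and start procedures. Note that a lock gives mutual exclusion only: it does not decide which of two waiting runs goes first.

```shell
#!/bin/bash
# Illustrative paths only.
workdir=$(mktemp -d)
lock="$workdir/notify.lock"
log="$workdir/order.log"

# Run a command while holding an exclusive lock on $lock.
with_lock() {
  (
    flock 9        # blocks until no other process holds the lock
    "$@"
  ) 9>"$lock"
}

# Hypothetical stand-ins for the real stop and start procedures.
do_shutdown() { echo "shutdown begin" >>"$log"; sleep 1; echo "shutdown end" >>"$log"; }
do_startup()  { echo "startup begin"  >>"$log"; echo "startup end" >>"$log"; }

with_lock do_shutdown &     # slow shutdown running in the background
sleep 0.3
with_lock do_startup        # a restart arriving while shutdown is still busy
wait
cat "$log"
```

Because the startup run has to wait for the lock, its log lines cannot appear between "shutdown begin" and "shutdown end".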
I am experimenting with a solution where the notify handler is controlled by a separate systemd unit and runs in its own scope. Coupling between Keepalived and the notify handler is done via the FIFO. In that case Keepalived does not manage the process lifecycle of the notify handler.

What do you think: Does this solution have any drawbacks?
I can't think of any drawbacks to your approach, and when I designed the FIFO interface I very much had in mind the sort of scenario you describe, as well as the one where keepalived runs the fifo reading script as you were doing.
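A reader that runs independently of keepalived could look roughly like the sketch below, which a separate systemd unit might start. The FIFO path, the log file, and the sample messages are assumptions made for illustration (the `printf` merely stands in for keepalived writing to the FIFO), and the field layout of type, name, state, priority is assumed rather than taken from this thread.

```shell
#!/bin/bash
# Illustrative paths only; a real setup would use the FIFO named in keepalived.conf.
workdir=$(mktemp -d)
fifo="$workdir/notify.fifo"
log="$workdir/handler.log"
mkfifo "$fifo"

# Reader loop: its lifetime is tied to the FIFO, not to the writer's process tree.
handle_fifo() {
  # read fails at end of file, i.e. once the writer has closed the FIFO.
  while read -r type name state priority; do
    echo "$type $name -> $state (priority $priority)" >>"$log"
  done <"$fifo"
}

handle_fifo &
reader=$!

# Stand-in for keepalived: write two assumed-format messages, then close.
printf '%s\n' 'INSTANCE "VI_1" MASTER 100' 'INSTANCE "VI_1" STOP 100' >"$fifo"

wait "$reader"
cat "$log"
```

In this arrangement any long-running stop procedure is executed by the reader on its own schedule, so it is not bounded by the time keepalived allows its child processes.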
Describe the bug
I am not sure whether this is a bug or whether we are doing something wrong. We are using the FIFO sample from here: https://github.com/acassen/keepalived/blob/master/doc/samples/sample_notify_fifo.sh.
We hoped that this approach could be used to ensure that our stopping script gets executed completely and does not get killed prematurely when Keepalived stops. Unfortunately, it does not seem to work or we are missing something vital.
To Reproduce We use:
And added the following to the FIFO loop in order to simulate a longer running command when Keepalived stops:
Expected behavior
We expect that the process is not killed before the command has finished. The FIFO log gives us the following after starting and stopping Keepalived:
The log entry
Command is finished
is missing in the log because the FIFO process seems to be terminated prematurely. Also, the subsequent stop command in the sample is never executed.
Keepalived version
Distro (please complete the following information):
Details of any containerisation or hosted service (e.g. AWS) If keepalived is being run in a container or on a hosted service, provide full details
Configuration file:
What can we do to ensure that the FIFO process is not killed until the command terminates?