Closed ardenisov closed 4 days ago
How are you starting Zebra? Can you give us the container script that starts FRR?
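#!/bin/bash
# Use the LSB logging helpers when available; otherwise define minimal fallbacks.
if [ -r "/lib/lsb/init-functions" ]; then
    . /lib/lsb/init-functions
else
    log_success_msg() {
        echo "$@"
    }
    log_warning_msg() {
        echo "$@" >&2
    }
    log_failure_msg() {
        echo "$@" >&2
    }
fi
# Load the FRR helpers (provides daemon_list) and run watchfrr in the foreground.
source /usr/lib/frr/frrcommon.sh
/usr/lib/frr/watchfrr $(daemon_list)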
ps aux | grep frr
1 root 0:01 /sbin/tini -- /usr/lib/frr/docker-start
7 root 0:00 {docker-start} /bin/bash /usr/lib/frr/docker-start
11 root 0:20 /usr/lib/frr/watchfrr zebra mgmtd bgpd staticd bfdd
159 frr 0:05 /usr/lib/frr/zebra -d -F traditional -A 127.0.0.1 -s 90000000 -M dplane_fpm_nl
165 frr 0:01 /usr/lib/frr/mgmtd -d -F traditional
167 frr 0:22 /usr/lib/frr/bgpd -d -F traditional -A 127.0.0.1
174 frr 0:01 /usr/lib/frr/staticd -d -F traditional -A 127.0.0.1
177 frr 15:29 /usr/lib/frr/bfdd -d -F traditional -A 127.0.0.1
1. configure static route
   vtysh
   conf t
   ip route 100.70.1.254/32 Null0
2. check route in kernel
   ip r | grep 100.70.1.254
   blackhole 100.70.1.254 proto 196 metric 20
3. stop frr
   sudo docker stop frr
4. check route in kernel
   ip r | grep 100.70.1.254
   blackhole 100.70.1.254 proto 196 metric 20
5. start frr
   sudo docker start frr
6. check route in frr
   vtysh
   7f4ad6eb72fb# show ip route 100.70.1.254/32
   Routing entry for 100.70.1.254/32
   Known via "static", distance 1, metric 0, best
   Last update 00:00:33 ago
@riw777 @Darwin4053 Do you know how to debug route updates in the kernel while frr is stopped?
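One way to watch what happens to the route while frr is being stopped is to stream kernel route events with `ip monitor route` from a second terminal. The sketch below is only an illustration: the helper names `filter_prefix` and `watch_prefix` are made up for this example, and it assumes iproute2 is available in the network namespace where the route lives. Removals show up in the monitor output as lines starting with `Deleted`.

```shell
#!/bin/sh
# Hypothetical helpers for watching kernel route changes for one prefix.

# Keep only the monitor lines that mention the given prefix (fixed-string match).
filter_prefix() {
    grep -F -- "$1"
}

# Stream timestamped kernel route events for one prefix until interrupted.
# Run this before stopping frr, in the namespace that holds the route.
watch_prefix() {
    ip -timestamp monitor route | filter_prefix "$1"
}
```

Usage: start `watch_prefix 100.70.1.254` in one terminal, stop frr in another, and check whether a `Deleted` line for the prefix ever appears.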
1. configure static route
   vtysh
   conf t
   ip route 100.70.1.254/32 Null0
2. check route in kernel
   ip r | grep 100.70.1.254
   blackhole 100.70.1.254 proto 196 metric 20
3. stop frr
   sudo systemctl stop frr
4. check route in kernel
   ip r | grep 100.70.1.254
   I didn't see any route here.
5. start frr
   sudo systemctl start frr
6. check route in frr
   vtysh
   7f4ad6eb72fb# show ip route 100.70.1.254/32
   Routing entry for 100.70.1.254/32
   Known via "static", distance 1, metric 0, best
   Last update 00:00:33 ago
   - unreachable (blackhole), weight 1
7. try to delete static route from frr
   7f4ad6eb72fb(config)# no ip route 100.70.1.254/32 Null0
   7f4ad6eb72fb(config)#
   7f4ad6eb72fb(config)# exit
   7f4ad6eb72fb#
   7f4ad6eb72fb# show ip route 100.70.1.254/32
   % Network not in table
   7f4ad6eb72fb# exit
   frr@7f4ad6eb72fb:/$ ip r | grep 100.70.1.254
   frr@7f4ad6eb72fb:
I followed the steps above to reproduce. The static route is successfully deleted from the kernel.
What version of frr did you test? I have this problem with 9.1.1, but not with 8.5.
@askorichenko Hello! Can you help me: is the fix below applicable to routes in the default VRF table? https://github.com/FRRouting/frr/pull/15570/commits/69f07fab28b32846a95571eb7404ef870cc3784c I see in pull request https://github.com/FRRouting/frr/pull/15424 that you reproduced the bug in the default VRF table, but the commit above contains some VRF-related code. Also, could it be that your fix does not cover static routes with a Null0 (blackhole) nexthop configured through vtysh?
There is an inconsistency with docker in how the processes receive signals. When SIGINT/SIGTERM is passed to staticd, the route is sometimes cleared and sometimes not.
@Darwin4053 staticd somehow receives SIGKILL instead of SIGINT/SIGTERM, even though /sbin/tini is used as the ENTRYPOINT in the docker image.
ppoll([{fd=11, events=POLLIN}, {fd=12, events=POLLIN}, {fd=10, events=POLLIN}, {fd=13, events=POLLIN}, {fd=14, events=POLLIN}, {fd=6, events=POLLIN}], 6, NULL, [], 8 <unfinished ...>) = ?
+++ killed by SIGKILL +++
As I can see in the tini logs, it only reaps the watchfrr process correctly with SIGTERM; all other processes in the container end up with SIGKILL.
[DEBUG tini (1)] Passing signal: 'Terminated'
[TRACE tini (1)] No child to reap
[DEBUG tini (1)] Received SIGCHLD
[DEBUG tini (1)] Reaped child with pid: '7'
[INFO tini (1)] Main child exited with signal (with signal 'Terminated')
[TRACE tini (1)] No child to reap
[TRACE tini (1)] Exiting: child has exited
For example, these are the frr processes in the docker container:
ps a
PID USER TIME COMMAND
1 root 0:00 /sbin/tini -vvv -- /usr/lib/frr/docker-start
7 root 0:00 {docker-start} /bin/bash /usr/lib/frr/docker-start
11 root 0:00 /usr/lib/frr/watchfrr zebra mgmtd bgpd staticd bfdd
27 frr 0:01 /usr/lib/frr/zebra -d -F traditional -A 127.0.0.1 -s 90000000 -M dplane_fpm_nl
33 frr 0:00 /usr/lib/frr/mgmtd -d -F traditional
35 frr 0:00 /usr/lib/frr/bgpd -d -F traditional -A 127.0.0.1
42 frr 0:00 /usr/lib/frr/staticd -d -F traditional -A 127.0.0.1
45 frr 0:02 /usr/lib/frr/bfdd -d -F traditional -A 127.0.0.1
All frr daemons even have tini (PID 1) as their parent:
cat /proc/27/status | grep PPid
PPid: 1
cat /proc/33/status | grep PPid
PPid: 1
cat /proc/35/status | grep PPid
PPid: 1
cat /proc/42/status | grep PPid
PPid: 1
cat /proc/45/status | grep PPid
PPid: 1
Another look at tini's children:
pgrep -lP 1
7 /bin/bash
27 /usr/lib/frr/zebra
33 /usr/lib/frr/mgmtd
35 /usr/lib/frr/bgpd
42 /usr/lib/frr/staticd
45 /usr/lib/frr/bfdd
@Darwin4053 @riw777 Hello! I confirmed with the tini contributors that it should work with the -g option, which sends the signal to all children in its process group. But as I see in my container, every daemon has its own pgid.
ps -o pid,ppid,pgid,comm
PID PPID PGID COMMAND
1 0 1 tini
7 1 7 docker-start
11 7 7 watchfrr
27 1 27 zebra
33 1 33 mgmtd
35 1 35 bgpd
42 1 42 staticd
45 1 45 bfdd
117 0 117 bash
135 117 135 ps
I also found in the watchfrr code that it sets a different pgid for every daemon. https://github.com/FRRouting/frr/blob/master/watchfrr/watchfrr.c#L321 How can I overcome this watchfrr behaviour?
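To make the mismatch concrete, the `ps -o pid,ppid,pgid,comm` listing can be filtered for processes that are children of PID 1 but are not in the process group of the main child (PGID 7 in the listing). This is a minimal sketch; the function name `children_outside_group` is made up for this example, and it assumes the four-column layout shown in the listing.

```shell
#!/bin/sh
# Hypothetical helper: reads `ps -o pid,ppid,pgid,comm` output on stdin and
# prints the commands whose parent is PID 1 but whose PGID differs from $1.
children_outside_group() {
    awk -v grp="$1" 'NR > 1 && $2 == 1 && $3 != grp { print $4 }'
}
```

Usage: `ps -o pid,ppid,pgid,comm | children_outside_group 7`. Every name it prints is a process that a signal sent to process group 7 would not reach.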
Hello! I have some updates. I removed tini as the entrypoint, because it doesn't help to stop the frr daemons cleanly. I also added some code to the docker-start file so that it traps the TERM signal, forwards it to watchfrr, and flushes static routes from the kernel.
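The exact change was not posted, so the following is only a sketch of what such a docker-start could look like, not the code actually used. Assumptions: static routes carry kernel protocol number 196 (as in the `ip r` output earlier in the thread), `IP_CMD` and the `start` argument are hypothetical knobs added for this example, and whether watchfrr stops the other daemons on TERM is not established here.

```shell
#!/bin/bash
# Sketch of a signal-forwarding docker-start (not the author's actual script).
STATIC_PROTO="${STATIC_PROTO:-196}"   # assumption: proto number of FRR static routes
IP_CMD="${IP_CMD:-ip}"                # hypothetical override, e.g. IP_CMD=echo for a dry run
WATCHFRR_PID=""

# Remove every kernel route that was installed with the static protocol number.
flush_static_routes() {
    $IP_CMD route flush proto "$STATIC_PROTO"
}

# TERM/INT handler: forward TERM to watchfrr, wait for it, then flush the routes.
on_term() {
    if [ -n "$WATCHFRR_PID" ]; then
        kill -TERM "$WATCHFRR_PID" 2>/dev/null
        wait "$WATCHFRR_PID" 2>/dev/null
    fi
    flush_static_routes
}

# Run watchfrr in the background so the shell stays free to execute the trap.
start_frr() {
    . /usr/lib/frr/frrcommon.sh
    trap 'on_term; exit 0' TERM INT
    /usr/lib/frr/watchfrr $(daemon_list) &
    WATCHFRR_PID=$!
    wait "$WATCHFRR_PID"
}

# Only start the daemons when invoked as "docker-start start".
if [ "${1:-}" = "start" ]; then
    start_frr
fi
```

Note that `ip route flush proto 196` removes all routes with that protocol number, including any that were meant to survive a restart.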
What version of frr did you test? I have this problem with 9.1.1, but not with 8.5.
I have tested only with 9.1.1.
Description
Static routes configured through the vty shell are not removed from the kernel after an frr restart.
Version
How to reproduce
configure static route
check route in kernel
stop frr
check route in kernel
start frr
check route in frr
try to delete static route from frr
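The first steps of this list can be scripted roughly as follows, using the commands from the docker variant of the reproduction in the thread. This is a sketch under assumptions: the container is named `frr`, `vtysh` is reachable through `docker exec`, and `RUN`, `configure_and_stop` and `check_leftover` are hypothetical names introduced for this example.

```shell
#!/bin/sh
PREFIX="${PREFIX:-100.70.1.254}"   # prefix from the report
CONTAINER="${CONTAINER:-frr}"      # container name used in the thread
RUN="${RUN:-}"                     # hypothetical knob: RUN=echo prints instead of executing

# Configure the blackhole route, then stop the container.
configure_and_stop() {
    $RUN docker exec "$CONTAINER" vtysh -c 'configure terminal' -c "ip route $PREFIX/32 Null0"
    $RUN docker stop "$CONTAINER"
}

# Read `ip route` output on stdin and report whether the prefix survived.
check_leftover() {
    if grep -qF -- "$PREFIX"; then
        echo "stale route for $PREFIX still in kernel"
    else
        echo "no route for $PREFIX in kernel"
    fi
}
```

Usage: `configure_and_stop && ip route | check_leftover`, run where `ip route` sees the routes installed by the container.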
Expected behavior
Static routes should be deleted from the kernel.
Actual behavior
Static routes are still in the kernel even after frr is stopped.
Additional context
error in logs
kernel
docker
frr_runnig.txt frr_startup.txt
Checklist