Site A: Ubuntu 22.04 Linux VM and FRR for BGP
Site B: Catalyst Router
FRR is configured with a minimal config, just enough to exchange routes with neighbors.
A BGP session is established between A and B.
systemctl restart frr fails after the BGP session is established.
Issue: FRR service restart fails with the error "zclient_send_message: buffer_write failed to zclient"
Steps to reproduce -
Add the neighbor router configuration and establish a BGP session
Ensure the BGP session is established between A and B
Restart the frr service
The frr service restart fails. The loop below restarts frr repeatedly and stops at the first failure:
root@10-1-1-1:~# cat a.sh
while true ; do sleep 3 ; systemctl restart frr ; systemctl status frr | grep running; if [ $? -eq 1 ]; then exit 1; fi; done
root@10-1-1-1:~#
root@10-1-1-1:~# cat /tmp/a.log
Active: active (running) since Thu 2024-11-21 08:54:31 UTC; 5ms ago
Active: active (running) since Thu 2024-11-21 08:54:39 UTC; 5ms ago
Active: active (running) since Thu 2024-11-21 08:54:48 UTC; 5ms ago
Job for frr.service failed.
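The loop above can also be sketched as a function that reports which iteration failed. This is only an illustrative variant, not the script that produced the log: it swaps the grep on "running" for systemctl is-active --quiet, and RESTART_CMD and CHECK_CMD are hypothetical hooks introduced here so the loop can be tried without touching the real service.

```shell
#!/bin/sh
# Sketch of the reproduction loop: restart frr repeatedly, stop at the first failure.
# RESTART_CMD and CHECK_CMD are hypothetical override hooks for this sketch;
# they are not part of FRR or systemd.
: "${RESTART_CMD:=systemctl restart frr}"
: "${CHECK_CMD:=systemctl is-active --quiet frr}"

# restart_until_failure MAX_ITERATIONS DELAY_SECONDS
# Returns 0 if every restart succeeded, 1 on the first failed restart or check.
restart_until_failure() {
    max_iterations=$1
    delay=$2
    i=1
    while [ "$i" -le "$max_iterations" ]; do
        sleep "$delay"
        # Word splitting of the command strings is intentional here.
        if ! $RESTART_CMD || ! $CHECK_CMD; then
            echo "restart failed on iteration $i"
            return 1
        fi
        echo "restart $i ok"
        i=$((i + 1))
    done
    return 0
}
```

For example, restart_until_failure 10 3 mirrors the 3-second sleep used in a.sh and gives up after 10 successful restarts.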
Expected behavior
Steps -
Add the neighbor router configuration and establish a BGP session
Ensure the BGP session is established between A and B
Restart the frr service
The frr restart should succeed
Actual behavior
Steps -
Add the neighbor router configuration and establish a BGP session
Ensure the BGP session is established between A and B
Restart the frr service
The frr restart fails
Additional context
Workaround is to stop and then start the frr service instead of restarting it. Journal output from the failed restart, followed by the workaround -
Nov 21 08:44:25 10-1-1-1 bgpd[42189]: [YAF85-253AP][EC 100663299] buffer_write: write error on fd 15: Broken pipe
Nov 21 08:44:25 10-1-1-1 bgpd[42189]: [X6B3Y-6W42R][EC 100663302] zclient_send_message: buffer_write failed to zclient fd 15, closing
Nov 21 08:44:25 10-1-1-1 zebra[42184]: [QS0NJ-H5QKJ] Zebra final shutdown
Nov 21 08:44:25 10-1-1-1 frrinit.sh[42335]: * Stopped staticd
Nov 21 08:44:25 10-1-1-1 frrinit.sh[42336]: * Stopped bgpd
Nov 21 08:44:25 10-1-1-1 frrinit.sh[42337]: * Stopped zebra
Nov 21 08:44:25 10-1-1-1 systemd[1]: frr.service: Deactivated successfully.
Nov 21 08:44:25 10-1-1-1 systemd[1]: Stopped FRRouting.
Nov 21 08:44:25 10-1-1-1 systemd[1]: frr.service: Start request repeated too quickly.
Nov 21 08:44:25 10-1-1-1 systemd[1]: frr.service: Failed with result 'start-limit-hit'.
Nov 21 08:44:25 10-1-1-1 systemd[1]: Failed to start FRRouting.
Nov 21 08:44:25 10-1-1-1 systemd[1]: frr.service: Triggering OnFailure= dependencies.
Nov 21 08:44:25 10-1-1-1 systemd[1]: frr.service: Failed to enqueue OnFailure= job, ignoring: Unit heartbeat-failed@frr.service not f>
Nov 21 08:44:52 10-1-1-1 systemd[1]: frr.service: Start request repeated too quickly.
Nov 21 08:44:52 10-1-1-1 systemd[1]: frr.service: Failed with result 'start-limit-hit'.
Nov 21 08:44:52 10-1-1-1 systemd[1]: Failed to start FRRouting.
root@10-1-1-1 :~#
root@10-1-1-1 :~#
root@10-1-1-1 :~# systemctl stop frr
root@10-1-1-1 :~# systemctl start frr
root@10-1-1-1 :~# systemctl status frr
● frr.service - FRRouting
Loaded: loaded (/lib/systemd/system/frr.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2024-11-21 08:50:16 UTC; 2s ago
Docs: https://frrouting.readthedocs.io/en/latest/setup.html
Process: 47341 ExecStart=/usr/lib/frr/frrinit.sh start (code=exited, status=0/SUCCESS)
Main PID: 47350 (watchfrr)
Status: "FRR Operational"
Tasks: 13 (limit: 23695)
Memory: 17.2M
CPU: 435ms
CGroup: /system.slice/frr.service
├─47350 /usr/lib/frr/watchfrr -d -F traditional zebra bgpd staticd
├─47366 /usr/lib/frr/zebra -d -F traditional -A 127.0.0.1 -s 90000000
├─47372 /usr/lib/frr/bgpd -d -F traditional --daemon -A 127.0.0.1 -l 10.1.1.1
└─47379 /usr/lib/frr/staticd -d -F traditional -A 127.0.0.1
Nov 21 08:50:12 10-1-1-1 zebra[47366]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
Nov 21 08:50:12 10-1-1-1 bgpd[47372]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
Nov 21 08:50:12 10-1-1-1 staticd[47379]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
Nov 21 08:50:12 10-1-1-1 watchfrr[47350]: [ZJW5C-1EHNT] restart all process 47351 exited with non-zero status 13
Nov 21 08:50:16 10-1-1-1 watchfrr[47350]: [QDG3Y-BY5TN] bgpd state -> up : connect succeeded
Nov 21 08:50:16 10-1-1-1 watchfrr[47350]: [QDG3Y-BY5TN] zebra state -> up : connect succeeded
Nov 21 08:50:16 10-1-1-1 watchfrr[47350]: [QDG3Y-BY5TN] staticd state -> up : connect succeeded
Nov 21 08:50:16 10-1-1-1 watchfrr[47350]: [KWE5Q-QNGFC] all daemons up, doing startup-complete notify
Nov 21 08:50:16 10-1-1-1 frrinit.sh[47341]: * Started watchfrr
Nov 21 08:50:16 10-1-1-1 systemd[1]: Started FRRouting.
root@10-1-1-1:~#
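The journal excerpt above contains two distinct signals: the bgpd write error towards zebra, and systemd refusing the start with 'start-limit-hit', which is systemd's start rate limiting result rather than a daemon exit code. A minimal sketch for telling them apart in a captured journal follows. The function name and the labels it prints are hypothetical, chosen only for this example.

```shell
#!/bin/sh
# classify_frr_failure: read journal text on stdin and print a coarse label.
# The function name and labels are hypothetical, made up for this sketch.
classify_frr_failure() {
    text=$(cat)
    # Check the systemd result first, since it is what makes the unit fail to start.
    if printf '%s\n' "$text" | grep -qF "Failed with result 'start-limit-hit'"; then
        echo "start-limit-hit"
    elif printf '%s\n' "$text" | grep -qF "zclient_send_message: buffer_write failed"; then
        echo "zclient-write-error"
    else
        echo "unknown"
    fi
}
```

It could be fed with something like journalctl -u frr --no-pager | classify_frr_failure. If the label is start-limit-hit, the unit's limits can be inspected with systemctl show frr -p StartLimitBurst -p StartLimitIntervalUSec. Whether those limits fully explain the failure here is an assumption, not something this report confirms.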
Checklist
[X] I have searched the open issues for this bug.
[X] I have not included sensitive information in this report.