FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/

after installing RPM package for CENTOS, multiple OSPF instances doesn't work #3756

Closed from88 closed 5 years ago

from88 commented 5 years ago

After installing the RPM package for CentOS, multiple OSPF instances don't work:

yum install -y https://github.com/FRRouting/frr/releases/download/frr-6.0.2/frr-6.0.2-01.el7.centos.x86_64.rpm

vi /etc/frr/daemons
zebra=yes
bgpd=yes
ospfd=yes
ospfd_instances="1,2"
staticd=yes

sudo systemctl start frr
vtysh
ip ospf 2 area 0.0.0.0
Warning: connecting to ospfd...failed!
ps -aux | grep ospf
frr      26468  0.0  0.0  62432  2828 ?        S<s  10:07   0:00 /usr/lib/frr/ospfd -d -A 127.0.0.1 -n 1
frr      26479  0.0  0.0  62428  2828 ?        S<s  10:07   0:00 /usr/lib/frr/ospfd -d -A 127.0.0.1 -n 2
root     26500  0.0  0.0  58388  1428 ?        S<s  10:07   0:00 /usr/lib/frr/watchfrr -d -r /usr/lib/frr/frr restart %s -s /usr/lib/frr/frr start %s -k /usr/lib/frr/frr stop %s zebra bgpd ospfd-1 ospfd-2 staticd
root     26540  0.0  0.0 112708   976 pts/0    S+   10:10   0:00 grep --color=auto ospf
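
For reference, the relationship between the `ospfd_instances` line in /etc/frr/daemons and the names visible in the watchfrr arguments above (`ospfd-1`, `ospfd-2`) can be sketched in Python. This is only an illustration of the naming seen in the `ps` output, not code taken from the FRR init scripts; the helper name `instance_daemons` is made up for this example.

```python
import re

def instance_daemons(daemons_text, daemon="ospfd"):
    """Return per-instance daemon names such as ['ospfd-1', 'ospfd-2'].

    Hypothetical helper: it only mirrors the naming seen in the ps output
    and is not how the FRR init scripts are implemented.
    """
    # Look for a line like: ospfd_instances="1,2" (quotes optional)
    pattern = r'^\s*%s_instances\s*=\s*"?([^"\n]*)"?\s*$' % re.escape(daemon)
    match = re.search(pattern, daemons_text, re.MULTILINE)
    if not match:
        return []
    ids = [part.strip() for part in match.group(1).split(",") if part.strip()]
    return ["%s-%s" % (daemon, inst) for inst in ids]

# The daemons file contents from this report
sample = 'zebra=yes\nbgpd=yes\nospfd=yes\nospfd_instances="1,2"\nstaticd=yes\n'
print(instance_daemons(sample))  # ['ospfd-1', 'ospfd-2']
```

Note that no plain `ospfd` appears in that list, which matches the process listing: only the two instance processes are running.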

With the help of equinox (on Slack) we tried to work around this.

We changed the service to run with:

ExecStart=/usr/lib/frr/frrinit.sh start

and also added the option:

watchfrr_options="-r '/usr/lib/frr/watchfrr.sh restart %s' -s '/usr/lib/frr/watchfrr.sh start %s' -k '/usr/lib/frr/watchfrr.sh stop %s'"

But after that I got some other errors:

● frr.service - FRRouting (FRR)
   Loaded: loaded (/usr/lib/systemd/system/frr.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-02-07 14:05:36 CET; 28s ago
  Process: 37788 ExecStop=/usr/lib/frr/frr stop (code=exited, status=0/SUCCESS)
  Process: 37922 ExecStart=/usr/lib/frr/frrinit.sh start (code=exited, status=0/SUCCESS)
 Main PID: 35129 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/frr.service
           ├─37925 /usr/lib/frr/watchfrr -d -r /usr/lib/frr/watchfrr.sh restart %s -s /usr/lib/frr/watchfrr.sh start %s -k /usr/lib/frr/watchfrr.sh stop %s zebra bgpd ospfd ospfd-1,2 static...
           ├─37944 /usr/lib/frr/zebra -d -A 127.0.0.1
           ├─37946 /usr/lib/frr/bgpd -d -A 127.0.0.1
           ├─37951 /usr/lib/frr/ospfd -d -A 127.0.0.1
           ├─37954 /usr/lib/frr/ospfd -d -n 1,2 -A 127.0.0.1
           └─37957 /usr/lib/frr/staticd -d -A 127.0.0.1

Feb 07 14:04:41 netvpn002prpjay.lin.pr.adform.zone watchfrr[37925]: Forked background command [pid 37926]: /usr/lib/frr/watchfrr.sh restart all
Feb 07 14:04:42 netvpn002prpjay.lin.pr.adform.zone watchfrr[37925]: zebra state -> up : connect succeeded
Feb 07 14:04:42 netvpn002prpjay.lin.pr.adform.zone watchfrr[37925]: bgpd state -> up : connect succeeded
Feb 07 14:04:42 netvpn002prpjay.lin.pr.adform.zone watchfrr[37925]: ospfd state -> up : connect succeeded
Feb 07 14:04:42 netvpn002prpjay.lin.pr.adform.zone watchfrr[37925]: staticd state -> up : connect succeeded
Feb 07 14:04:46 netvpn002prpjay.lin.pr.adform.zone watchfrr[37925]: Forked background command [pid 37959]: /usr/lib/frr/watchfrr.sh restart ospfd-1,2
Feb 07 14:05:36 netvpn002prpjay.lin.pr.adform.zone watchfrr[37925]: startup did not complete within timeout (4/5 daemons running)
Feb 07 14:05:36 netvpn002prpjay.lin.pr.adform.zone frrinit.sh[37922]: Started watchfrr[  OK  ]
Feb 07 14:05:36 netvpn002prpjay.lin.pr.adform.zone systemd[1]: Started FRRouting (FRR).
Feb 07 14:05:47 netvpn002prpjay.lin.pr.adform.zone watchfrr[37925]: Forked background command [pid 38005]: /usr/lib/frr/watchfrr.sh restart ospfd-1,2
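
To make the "4/5 daemons running" message easier to read, here is a small Python sketch that compares the daemon names passed to watchfrr with the `state -> up` lines in the log above. It is an illustration only: the function name `daemons_not_up` is invented here, and the last daemon name is truncated in the status output (`static...`), so `staticd` is an assumption based on the log lines.

```python
import re

def daemons_not_up(expected, log_lines):
    """Return the expected daemons that never logged 'state -> up'."""
    up = set()
    for line in log_lines:
        # Matches e.g. "watchfrr[37925]: zebra state -> up : connect succeeded"
        m = re.search(r"watchfrr\[\d+\]: (\S+) state -> up", line)
        if m:
            up.add(m.group(1))
    return [name for name in expected if name not in up]

# Names from the watchfrr command line above; "staticd" is assumed because
# the status output truncates the last argument.
expected = ["zebra", "bgpd", "ospfd", "ospfd-1,2", "staticd"]

PREFIX = "Feb 07 14:04:42 netvpn002prpjay.lin.pr.adform.zone watchfrr[37925]: "
log_lines = [
    PREFIX + "zebra state -> up : connect succeeded",
    PREFIX + "bgpd state -> up : connect succeeded",
    PREFIX + "ospfd state -> up : connect succeeded",
    PREFIX + "staticd state -> up : connect succeeded",
]

print(daemons_not_up(expected, log_lines))  # ['ospfd-1,2']
```

The one name that never comes up is `ospfd-1,2`, the same name watchfrr keeps passing to `watchfrr.sh restart`.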

I'm using CentOS Linux release 7.6.1810 (Core).

Thanks

ton31337 commented 5 years ago

Assign this to me, I'll dig further into what's happening here.

ton31337 commented 5 years ago
[root@exit1-centos-76 frr]# ps aufx | grep frr
root      6328  0.0  0.0 112708   980 pts/0    S+   19:40   0:00                          \_ grep --color=auto frr
frr       6174  0.0  0.3 355564  3368 ?        S<sl 19:34   0:00 /usr/lib/frr/zebra -d -A 127.0.0.1
frr       6183  0.0  0.4 210580  4756 ?        S<sl 19:34   0:00 /usr/lib/frr/bgpd -d -A 127.0.0.1
frr       6200  0.0  0.3  60456  3200 ?        S<s  19:34   0:00 /usr/lib/frr/ospfd -d -A 127.0.0.1 -n 1
frr       6209  0.0  0.2  60324  2696 ?        S<s  19:34   0:00 /usr/lib/frr/ospfd -d -A 127.0.0.1 -n 2
frr       6219  0.0  0.1  58820  1796 ?        S<s  19:34   0:00 /usr/lib/frr/staticd -d -A 127.0.0.1
root      6233  0.0  0.1  58392  1432 ?        S<s  19:34   0:00 /usr/lib/frr/watchfrr -d -r /usr/lib/frr/frr restart %s -s /usr/lib/frr/frr start %s -k /usr/lib/frr/frr stop %s zebra bgpd ospfd-1 ospfd-2 staticd

vtysh is trying to connect to ospfd.vty.

[root@exit1-centos-76 frr]# strace -econnect -ff -s2048 vtysh -c 'conf t' -c 'router ospf 2' 2>&1 | grep ospfd
connect(4, {sa_family=AF_LOCAL, sun_path="/var/run/frr/ospfd.vty"}, 24) = -1 ECONNREFUSED (Connection refused)
connect(4, {sa_family=AF_LOCAL, sun_path="/var/run/frr/ospfd-1.vty"}, 26) = 0
connect(5, {sa_family=AF_LOCAL, sun_path="/var/run/frr/ospfd-2.vty"}, 26) = 0
Warning: connecting to ospfd...connect(9, {sa_family=AF_LOCAL, sun_path="/var/run/frr/ospfd.vty"}, 24) = -1 ECONNREFUSED (Connection refused)

Basically, additionally starting the main ospfd process without an instance specified would solve the problem (trivial changes to frr.in). But then it would be possible to configure both router ospf and router ospf 1, which is confusing.
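
The same check that strace shows can be reproduced without vtysh. Below is a minimal Python sketch that tries to connect to each `<name>.vty` UNIX socket under /var/run/frr (the directory seen in the strace output). The function name `probe_vty_sockets` is made up for this example, and it only tests whether `connect()` succeeds; it does not speak the vty protocol.

```python
import os
import socket

def probe_vty_sockets(names, run_dir="/var/run/frr"):
    """Try to connect to <run_dir>/<name>.vty for each name.

    Returns a dict mapping name -> True (connect succeeded) or False.
    """
    results = {}
    for name in names:
        path = os.path.join(run_dir, name + ".vty")
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
            results[name] = True
        except OSError:
            # Covers ECONNREFUSED, a missing socket file, or lack of permission.
            results[name] = False
        finally:
            sock.close()
    return results

if __name__ == "__main__":
    for name, ok in probe_vty_sockets(["ospfd", "ospfd-1", "ospfd-2"]).items():
        print(name, "reachable" if ok else "not reachable")
```

On the setup shown above one would expect `ospfd` to be unreachable while `ospfd-1` and `ospfd-2` accept the connection, but that depends on running it as a user with access to the sockets.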

ton31337 commented 5 years ago
[root@leaf1-centos-76 frr]# git bisect bad
67736451c55347849909cb3a05c706d5f9e2c29a is the first bad commit
commit 67736451c55347849909cb3a05c706d5f9e2c29a
Author: Mladen Sablic <mladen.sablic@gmail.com>
Date:   Mon May 21 20:00:51 2018 +0200

    vtysh: reconnect to daemons when connection lost

    Functionality to let vtysh attempt to reconnect to daemons when
    connection is lost (e.g. crash or restart).

    Signed-off-by: Mladen Sablic <mladen.sablic@gmail.com>

:040000 040000 cc3deb582d09a45b7924611aecd914738f96b8f6 cd70810673bc305406e30f655c907e470078fbec M  vtysh

donaldsharp commented 5 years ago

@ton31337 do you have b2443937b06d79f1caa46cdb7625e3d6831b2166 in the non-working case?

ton31337 commented 5 years ago

Yeah, this fixes the problem entirely. We should drop https://github.com/FRRouting/frr/commit/b2443937b06d79f1caa46cdb7625e3d6831b2166 into the 6 branch :)

eqvinox commented 5 years ago

Also, here's a branch converting RPMs over to the new init scripts: https://github.com/opensourcerouting/frr/tree/6.0/redhat-new-init

NB: UNTESTED.

ton31337 commented 5 years ago

Tested this: it works with the change applied in stable/6.0 (https://github.com/FRRouting/frr/commit/b2443937b06d79f1caa46cdb7625e3d6831b2166)