Closed: jdrouvroy closed this issue 4 years ago
@jdrouvroy we're not quite understanding how this is an FRR bug - as FRR does not create GRE tunnels, just uses existing ones, this sounds like a misconfig in your tunnel setup and not something related to FRR per se.
FRR doesn't control interface MTU -- check that your interface and tunnel MTUs are set properly at the OS level. You may find additional useful info at https://www.google.com/search?q=gre+over+ipsec+set+tunnel+mtu
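For example, to see what the OS currently has configured and cached (the interface names here are illustrative -- substitute your own):

```shell
# Configured MTU of the tunnel and of the underlying interface
# (tunnel24 / eth0 are example names).
ip link show tunnel24 | grep -o 'mtu [0-9]*'
ip link show eth0 | grep -o 'mtu [0-9]*'

# Any per-destination PMTU the kernel has cached shows up as
# "cache ... mtu N" in the route lookup:
ip route get 10.55.0.69
```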
@jdrouvroy Please verify if path MTU discovery is enabled on that tunnel interface
```
# Current value:
sysctl net.ipv4.ip_no_pmtu_disc
# Set it to 1:
sysctl -w net.ipv4.ip_no_pmtu_disc=1
# Check:
sysctl net.ipv4.ip_no_pmtu_disc
```
Hi @bisdhdh,
Thank you for your reply. I disabled path MTU discovery on each router, but I still see the same issue :(
What's the MTU on the GRE interface? Does sending up to that size work without fragmentation?
Hi @louberger,
I'm sorry but I do not understand your question ^^
up @louberger ;)
@jdrouvroy Lou updated his question
Sorry @qlyoung, didn't see edited post
```
ip route get to 10.55.0.69
10.55.0.69 via 10.255.0.73 dev tunnel18 src 10.255.0.74 uid 0
    cache
```

```
ip a | grep -A 3 tunnel18
7: tunnel18@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1476 qdisc noqueue state UNKNOWN group default qlen 1000
    link/gre 10.56.255.4 peer 10.58.95.4
    inet 10.255.0.74/30 brd 10.255.0.75 scope global tunnel18
       valid_lft forever preferred_lft forever
    inet6 fe80::5efe:a38:ff04/64 scope link
       valid_lft forever preferred_lft forever
```

```
ip route get to 10.55.0.69
10.55.0.69 via 10.255.0.97 dev tunnel24 src 10.255.0.98 uid 4001
    cache
```

```
ip a | grep -A 3 tunnel24
9: tunnel24@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1476 qdisc noqueue state UNKNOWN group default qlen 1000
    link/gre 10.58.95.4 peer 10.55.0.4
    inet 10.255.0.98/30 brd 10.255.0.99 scope global tunnel24
       valid_lft forever preferred_lft forever
    inet6 fe80::5efe:a3a:5f04/64 scope link
       valid_lft forever preferred_lft forever
```
```
ping 10.55.0.69
PING 10.55.0.69 (10.55.0.69) 56(84) bytes of data.
64 bytes from 10.55.0.69: icmp_seq=1 ttl=60 time=13.7 ms
64 bytes from 10.55.0.69: icmp_seq=2 ttl=60 time=13.7 ms
64 bytes from 10.55.0.69: icmp_seq=3 ttl=60 time=13.7 ms
64 bytes from 10.55.0.69: icmp_seq=4 ttl=60 time=13.5 ms
64 bytes from 10.55.0.69: icmp_seq=5 ttl=60 time=13.6 ms
64 bytes from 10.55.0.69: icmp_seq=6 ttl=60 time=13.7 ms
^C
--- 10.55.0.69 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5007ms
rtt min/avg/max/mdev = 13.584/13.707/13.762/0.162 ms
```

```
ping 10.55.0.69 -s 1476
PING 10.55.0.69 (10.55.0.69) 1476(1504) bytes of data.
^C
--- 10.55.0.69 ping statistics ---
10 packets transmitted, 0 received, 100% packet loss, time 9063ms
```

```
ping 10.55.0.69 -s 1000
PING 10.55.0.69 (10.55.0.69) 1000(1028) bytes of data.
^C
--- 10.55.0.69 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4024ms
```
```
tcpdump -ni any host 10.55.0.69
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
19:50:51.399026 IP 172.16.97.2 > 10.55.0.69: ICMP echo request, id 7302, seq 1, length 1480
19:50:51.399037 IP 172.16.97.2 > 10.55.0.69: ip-proto-1
19:50:51.399058 IP 172.16.97.2 > 10.55.0.69: ICMP echo request, id 7302, seq 1, length 1456
19:50:51.399074 IP 172.16.97.2 > 10.55.0.69: ip-proto-1
19:50:52.398905 IP 172.16.97.2 > 10.55.0.69: ICMP echo request, id 7302, seq 2, length 1480
19:50:52.398917 IP 172.16.97.2 > 10.55.0.69: ip-proto-1
19:50:52.398938 IP 172.16.97.2 > 10.55.0.69: ICMP echo request, id 7302, seq 2, length 1456
19:50:52.398957 IP 172.16.97.2 > 10.55.0.69: ip-proto-1
19:50:53.398752 IP 172.16.97.2 > 10.55.0.69: ICMP echo request, id 7302, seq 3, length 1480
19:50:53.398763 IP 172.16.97.2 > 10.55.0.69: ip-proto-1
19:50:53.398782 IP 172.16.97.2 > 10.55.0.69: ICMP echo request, id 7302, seq 3, length 1456
19:50:53.398801 IP 172.16.97.2 > 10.55.0.69: ip-proto-1
19:50:54.398792 IP 172.16.97.2 > 10.55.0.69: ICMP echo request, id 7302, seq 4, length 1480
19:50:54.398802 IP 172.16.97.2 > 10.55.0.69: ip-proto-1
19:50:54.398818 IP 172.16.97.2 > 10.55.0.69: ICMP echo request, id 7302, seq 4, length 1456
19:50:54.398837 IP 172.16.97.2 > 10.55.0.69: ip-proto-1
^C
16 packets captured
16 packets received by filter
0 packets dropped by kernel
```

```
ip route get to 10.55.0.69
10.55.0.69 via 10.255.0.73 dev tunnel18 src 10.255.0.74 uid 0
    cache expires 578sec mtu 1398
```
```
ip route get to 10.55.0.69
10.55.0.69 via 10.255.0.73 dev tunnel18 src 10.255.0.74 uid 0
    cache expires 568sec mtu 1398
```
```
tcpdump -ni any host 10.55.0.69
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
19:51:47.771882 IP 172.16.97.2 > 10.55.0.69: ICMP echo request, id 7351, seq 1, length 1008
19:51:47.771899 IP 172.16.97.2 > 10.55.0.69: ICMP echo request, id 7351, seq 1, length 1008
19:51:48.780808 IP 172.16.97.2 > 10.55.0.69: ICMP echo request, id 7351, seq 2, length 1008
19:51:48.780835 IP 172.16.97.2 > 10.55.0.69: ICMP echo request, id 7351, seq 2, length 1008
...
10 packets captured
10 packets received by filter
0 packets dropped by kernel
```

```
ip route get to 10.55.0.69
10.55.0.69 via 10.255.0.73 dev tunnel18 src 10.255.0.74 uid 0
    cache expires 525sec mtu 1398
```
```
ip route get to 10.55.0.69
10.55.0.69 via 10.255.0.97 dev tunnel24 src 10.255.0.98 uid 0
    cache
```
```
tcpdump -ni any host 10.55.0.69
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
19:50:51.407561 IP 172.16.97.2 > 10.55.0.69: ip-proto-1
19:50:52.407384 IP 172.16.97.2 > 10.55.0.69: ip-proto-1
19:50:53.407293 IP 172.16.97.2 > 10.55.0.69: ip-proto-1
19:50:54.407270 IP 172.16.97.2 > 10.55.0.69: ip-proto-1
^C
4 packets captured
4 packets received by filter
0 packets dropped by kernel
```

```
ip route get to 10.55.0.69
10.55.0.69 via 10.255.0.97 dev tunnel24 src 10.255.0.98 uid 0
    cache
```
```
tcpdump -ni any host 10.55.0.69
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
19:51:47.780406 IP 172.16.97.2 > 10.55.0.69: ICMP echo request, id 7351, seq 1, length 1008
19:51:47.780420 IP 172.16.97.2 > 10.55.0.69: ICMP echo request, id 7351, seq 1, length 1008
19:51:48.789382 IP 172.16.97.2 > 10.55.0.69: ICMP echo request, id 7351, seq 2, length 1008
19:51:48.789393 IP 172.16.97.2 > 10.55.0.69: ICMP echo request, id 7351, seq 2, length 528
19:51:48.789407 IP 172.16.97.2 > 10.55.0.69: ip-proto-1
19:51:49.797294 IP 172.16.97.2 > 10.55.0.69: ICMP echo request, id 7351, seq 3, length 1008
19:51:49.797308 IP 172.16.97.2 > 10.55.0.69: ICMP echo request, id 7351, seq 3, length 528
19:51:49.797325 IP 172.16.97.2 > 10.55.0.69: ip-proto-1
19:51:50.797111 IP 172.16.97.2 > 10.55.0.69: ICMP echo request, id 7351, seq 4, length 1008
19:51:50.797146 IP 172.16.97.2 > 10.55.0.69: ICMP echo request, id 7351, seq 4, length 528
19:51:50.797177 IP 172.16.97.2 > 10.55.0.69: ip-proto-1
19:51:51.805354 IP 172.16.97.2 > 10.55.0.69: ICMP echo request, id 7351, seq 5, length 1008
19:51:51.805368 IP 172.16.97.2 > 10.55.0.69: ICMP echo request, id 7351, seq 5, length 528
19:51:51.805384 IP 172.16.97.2 > 10.55.0.69: ip-proto-1
^C
14 packets captured
14 packets received by filter
0 packets dropped by kernel
```

```
ip route get to 10.55.0.69
10.55.0.69 via 10.255.0.97 dev tunnel24 src 10.255.0.98 uid 0
    cache expires 581sec mtu lock 552
```
```
tcpdump -ni any host 10.55.0.69
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
20:08:47.759564 IP 172.16.97.2 > 10.55.0.69: ip-proto-1
20:08:48.759505 IP 172.16.97.2 > 10.55.0.69: ip-proto-1
20:08:49.759729 IP 172.16.97.2 > 10.55.0.69: ip-proto-1
20:08:50.759803 IP 172.16.97.2 > 10.55.0.69: ip-proto-1
20:08:51.759760 IP 172.16.97.2 > 10.55.0.69: ip-proto-1
20:08:52.759729 IP 172.16.97.2 > 10.55.0.69: ip-proto-1
20:08:53.759644 IP 172.16.97.2 > 10.55.0.69: ip-proto-1
20:08:54.759501 IP 172.16.97.2 > 10.55.0.69: ip-proto-1
^C
8 packets captured
8 packets received by filter
0 packets dropped by kernel
```
I don't understand why the router on site 2 decreases the MTU to 552 for this route ...
Hello,
We ran more tests this morning and I have some interesting results to share.

To make the problem easier to follow, let's consider this case: we would like to ping a GRE interface through the GRE tunnel between site 2 and site 8.
Reminder:
- 10.255.0.98 is the site 2 router's GRE IP towards site 8
- 10.255.0.97 is the site 8 router's GRE IP towards site 2
- the GRE tunnel is encapsulated in an IPsec VPN tunnel
```
ifconfig tunnel24
tunnel24: flags=209<UP,POINTOPOINT,RUNNING,NOARP>  mtu 1476
        inet 10.255.0.98  netmask 255.255.255.252  destination 10.255.0.98
        inet6 fe80::5efe:a3a:5f04  prefixlen 64  scopeid 0x20
        unspec 0A-3A-5F-04-6E-6F-00-00-00-00-00-00-00-00-00-00  txqueuelen 1000  (UNSPEC)
        RX packets 1119125  bytes 249852565 (249.8 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1082029  bytes 62239223 (62.2 MB)
        TX errors 825  dropped 13218  overruns 0  carrier 0  collisions 0
```
```
cat /etc/network/interfaces.d/tunnel24.cfg
auto tunnel24
iface tunnel24 inet static
    address 10.255.0.98
    netmask 255.255.255.252
    broadcast 10.255.0.99
    up ifconfig tunnel24
    pre-up ip tunnel add tunnel24 mode gre remote 10.55.0.4 local 10.58.95.4 ttl 64
```
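One way to sidestep PMTUD churn on a setup like this is to pin a conservative MTU on the tunnel at creation time, low enough to already absorb the IPsec overhead. This is only a sketch: the 1398 value is taken from the working route-cache output in this thread, not derived from your actual ESP parameters.

```shell
# Same tunnel as in the config above, but with an explicitly pinned MTU
# low enough to absorb both GRE (24 bytes) and ESP overhead.
ip tunnel add tunnel24 mode gre remote 10.55.0.4 local 10.58.95.4 ttl 64
ip link set tunnel24 mtu 1398
ip addr add 10.255.0.98/30 broadcast 10.255.0.99 dev tunnel24
ip link set tunnel24 up
```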
```
ip route flush cache
```

```
ip route get to 10.255.0.97
10.255.0.97 dev tunnel24 src 10.255.0.98 uid 0
    cache
```

```
ping 10.255.0.97
PING 10.255.0.97 (10.255.0.97) 56(84) bytes of data.
64 bytes from 10.255.0.97: icmp_seq=1 ttl=64 time=2.15 ms
64 bytes from 10.255.0.97: icmp_seq=2 ttl=64 time=2.27 ms
64 bytes from 10.255.0.97: icmp_seq=3 ttl=64 time=1.94 ms
64 bytes from 10.255.0.97: icmp_seq=4 ttl=64 time=2.04 ms
64 bytes from 10.255.0.97: icmp_seq=5 ttl=64 time=1.95 ms
64 bytes from 10.255.0.97: icmp_seq=6 ttl=64 time=1.94 ms
64 bytes from 10.255.0.97: icmp_seq=7 ttl=64 time=2.07 ms
64 bytes from 10.255.0.97: icmp_seq=8 ttl=64 time=2.11 ms
64 bytes from 10.255.0.97: icmp_seq=9 ttl=64 time=2.01 ms
^C
--- 10.255.0.97 ping statistics ---
9 packets transmitted, 9 received, 0% packet loss, time 8009ms
rtt min/avg/max/mdev = 1.945/2.058/2.275/0.110 ms
```

```
ip route get to 10.255.0.97
10.255.0.97 dev tunnel24 src 10.255.0.98 uid 0
    cache expires 573sec mtu lock 552
```
We can see that the cached MTU was automatically lowered and locked to 552.
```
ping 10.255.0.97 -M do -s 1000
PING 10.255.0.97 (10.255.0.97) 1000(1028) bytes of data.
ping: local error: Message too long, mtu=552
ping: local error: Message too long, mtu=552
ping: local error: Message too long, mtu=552
^C
--- 10.255.0.97 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2023ms
```

```
ip route flush cache
```

```
ping 10.255.0.97 -M do -s 1000
PING 10.255.0.97 (10.255.0.97) 1000(1028) bytes of data.
1008 bytes from 10.255.0.97: icmp_seq=1 ttl=64 time=2.23 ms
1008 bytes from 10.255.0.97: icmp_seq=2 ttl=64 time=2.25 ms
1008 bytes from 10.255.0.97: icmp_seq=3 ttl=64 time=2.14 ms
ping: local error: Message too long, mtu=942
ping: local error: Message too long, mtu=942
ping: local error: Message too long, mtu=942
^C
--- 10.255.0.97 ping statistics ---
7 packets transmitted, 3 received, +3 errors, 57% packet loss, time 6033ms
rtt min/avg/max/mdev = 2.142/2.210/2.254/0.048 ms
```
Ping seems to work, but stops after a few packets.
```
ip route get to 10.255.0.97
10.255.0.97 dev tunnel24 src 10.255.0.98 uid 0
    cache expires 588sec mtu 942
```
MTU automatically set to 942
```
grep bgp /etc/frr/daemons
bgpd=yes
bgpd_options=" -A 127.0.0.1"
```

```
sed -i 's/bgpd=yes/bgpd=no/' /etc/frr/daemons
```

```
grep bgp /etc/frr/daemons
bgpd=no
bgpd_options=" -A 127.0.0.1"
```

```
systemctl restart frr
```

```
ip route get to 10.255.0.97
10.255.0.97 dev tunnel24 src 10.255.0.98 uid 0
    cache expires 436sec mtu 942
```

```
ip route flush cache
```

```
ip route get to 10.255.0.97
10.255.0.97 dev tunnel24 src 10.255.0.98 uid 0
    cache
```
```
ping 10.255.0.97 -M do -s 1000
PING 10.255.0.97 (10.255.0.97) 1000(1028) bytes of data.
1008 bytes from 10.255.0.97: icmp_seq=1 ttl=64 time=2.31 ms
1008 bytes from 10.255.0.97: icmp_seq=2 ttl=64 time=2.13 ms
1008 bytes from 10.255.0.97: icmp_seq=3 ttl=64 time=2.43 ms
1008 bytes from 10.255.0.97: icmp_seq=4 ttl=64 time=2.61 ms
1008 bytes from 10.255.0.97: icmp_seq=5 ttl=64 time=2.20 ms
1008 bytes from 10.255.0.97: icmp_seq=6 ttl=64 time=2.37 ms
1008 bytes from 10.255.0.97: icmp_seq=7 ttl=64 time=2.56 ms
1008 bytes from 10.255.0.97: icmp_seq=8 ttl=64 time=2.16 ms
1008 bytes from 10.255.0.97: icmp_seq=9 ttl=64 time=2.20 ms
1008 bytes from 10.255.0.97: icmp_seq=10 ttl=64 time=2.04 ms
1008 bytes from 10.255.0.97: icmp_seq=11 ttl=64 time=2.15 ms
1008 bytes from 10.255.0.97: icmp_seq=12 ttl=64 time=2.12 ms
1008 bytes from 10.255.0.97: icmp_seq=13 ttl=64 time=2.22 ms
^C
--- 10.255.0.97 ping statistics ---
13 packets transmitted, 13 received, 0% packet loss, time 12015ms
rtt min/avg/max/mdev = 2.044/2.273/2.613/0.179 ms
```

```
ip route get to 10.255.0.97
10.255.0.97 dev tunnel24 src 10.255.0.98 uid 0
    cache expires 582sec mtu 1398
```
The MTU seems normal again: 1398 bytes.
```
sed -i 's/bgpd=no/bgpd=yes/' /etc/frr/daemons && grep bgpd /etc/frr/daemons && systemctl restart frr
bgpd=yes
bgpd_options=" -A 127.0.0.1"
```

```
ip route flush cache
```

```
ip route get to 10.255.0.97
10.255.0.97 dev tunnel24 src 10.255.0.98 uid 0
    cache
```

```
ping 10.255.0.97 -M do -s 1000
PING 10.255.0.97 (10.255.0.97) 1000(1028) bytes of data.
ping: local error: Message too long, mtu=552
ping: local error: Message too long, mtu=552
^C
--- 10.255.0.97 ping statistics ---
3 packets transmitted, 0 received, +2 errors, 100% packet loss, time 2040ms
```

```
ip route get to 10.255.0.97
10.255.0.97 dev tunnel24 src 10.255.0.98 uid 0
    cache expires 593sec mtu lock 552
```
Sometimes, the host even receives its own forged "Frag needed and DF set (mtu = 0)" packet for icmp_seq=1:

```
ping -M do -s 501 10.255.0.97
PING 10.255.0.97 (10.255.0.97) 501(529) bytes of data.
From 10.255.0.98 icmp_seq=1 Frag needed and DF set (mtu = 0)
^C
--- 10.255.0.97 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
```
To sum up, the MTU issue appears when the bgpd daemon is enabled. Any idea?

Thanks for your help
Hi,
Does anyone have an idea about this? We want to put this setup into production, but this problem prevents us from doing so.

Thanks for the help ;)
Hello,
I ran tests with Quagga and it works in my case. Thanks to everyone who helped.
Yeah, still no idea what the problem is here. The only thing we can think of is that your BGP updates might be triggering (kernel) PMTUD, which then adjusts your link MTU. FRR generally produces larger BGP updates than Quagga, as a consequence of improved advertisement efficiency, so that might explain why it works for you under Quagga. I'm not aware of any mechanism in BGP that could directly (i.e., as a protocol mechanism) change interface MTU, though.
You could try disabling PMTUD and see if you notice a difference.
https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
ip_no_pmtu_disc - INTEGER
Disable Path MTU Discovery. If enabled in mode 1 and a
fragmentation-required ICMP is received, the PMTU to this
destination will be set to min_pmtu (see below). You will need
to raise min_pmtu to the smallest interface MTU on your system
manually if you want to avoid locally generated fragments.
In mode 2 incoming Path MTU Discovery messages will be
discarded. Outgoing frames are handled the same as in mode 1,
implicitly setting IP_PMTUDISC_DONT on every created socket.
Mode 3 is a hardened pmtu discover mode. The kernel will only
accept fragmentation-needed errors if the underlying protocol
can verify them besides a plain socket lookup. Current
protocols for which pmtu events will be honored are TCP, SCTP
and DCCP as they verify e.g. the sequence number or the
association. This mode should not be enabled globally but is
only intended to secure e.g. name servers in namespaces where
TCP path mtu must still work but path MTU information of other
protocols should be discarded. If enabled globally this mode
could break other protocols.
Possible values: 0-3
Default: FALSE
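Putting the excerpt above into practice would look roughly like this (mode 1 plus a raised `min_pmtu`, as the documentation suggests; the 1476 value is this setup's GRE tunnel MTU and is only an example):

```shell
# Mode 1: on a fragmentation-required ICMP, set the cached PMTU to
# min_pmtu instead of trusting the advertised value.
sysctl -w net.ipv4.ip_no_pmtu_disc=1

# Raise min_pmtu to the smallest interface MTU on the system to avoid
# locally generated fragments (1476 = the GRE tunnel MTU here).
sysctl -w net.ipv4.route.min_pmtu=1476

# Drop any route-cache entries that were already clamped.
ip route flush cache
```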
Describe the bug
First of all, I'll explain my topology. I have 3 sites with 2 routers per site (in fact we have 9 sites, but only 3 are needed to understand the problem). The orange router is the main one per site (the default gateway for servers), and the grey one is the backup router (via the VRRP protocol). On the figure, each green link is an IPsec VPN tunnel mounted with StrongSwan. Inside those tunnels, a GRE interface mounts a GRE tunnel between the nodes on each side of the IPsec VPN (tunnel and GRE IPs are in black on the figure). Each node has a loopback address (IPs in blue on the figure). Of course, each node also has a LAN IP (IPs in green on the figure). Each node also has a BGP configuration to announce the routable networks.

I have a weird issue with MTU when 2 servers need to communicate through this infrastructure. When I ping server 10.55.0.69 from 172.16.97.2 (the black servers on the figure), everything works correctly and I can see traffic entering and leaving the routers.
On the orange router on site 8 (closest to the destination server) I took this capture (everything is OK):

Now I'm sending the same ping, with the don't-fragment flag and 1300 bytes, from the 172.16.97.2 server.
Network capture on orange router on site 7
Network capture on orange router on site 2
Get route and MTU on orange router on Site 7
Get route and MTU on orange router on Site 2
Let's try to reduce the ping size to 800 bytes (because of the `cache expires 329sec mtu 894` in the previous command)
Network capture on orange router on site 7
Network capture on orange router on site 2
Strangely, the MTU decreased from 894 to 606 bytes.

Also, during another test I saw something weird (it seems the MTU was dynamically updated ...):
Here is FRR configuration of orange router on SITE 2:
Here is FRR configuration of orange router on SITE 7:
My question is: why is the MTU dynamically decreased like that? I'm aware there is overhead from GRE encapsulation, but that doesn't explain why it decreases so much. I hope I've provided enough information to troubleshoot my issue :)
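For what it's worth, the expected overhead can be sanity-checked with a bit of arithmetic. The 24-byte GRE figure and the 1500-byte underlay MTU are standard assumptions; the IPsec overhead is inferred from the outputs in this thread and depends on the cipher/MAC in use:

```shell
link_mtu=1500                 # assumed Ethernet MTU on the underlay
gre_overhead=$((20 + 4))      # outer IPv4 header + basic GRE header
gre_mtu=$((link_mtu - gre_overhead))
echo "GRE tunnel MTU: $gre_mtu"   # matches the mtu 1476 shown by 'ip a'

observed_pmtu=1398            # cached route MTU seen when pings succeed
echo "implied IPsec overhead: $((gre_mtu - observed_pmtu)) bytes"
```

Neither 552 nor 942 corresponds to any plausible encapsulation overhead, which supports the idea that something (apparently correlated with bgpd running) is injecting bogus PMTU updates, rather than real tunnel overhead being the cause.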
Thanks in advance
(put "x" in "[ ]" if you already tried the following)
- [x] Did you check if this is a duplicate issue?
- [ ] Did you test it on the latest FRRouting/frr master branch?
Versions