Cannot send packets in one direction (other works) #78

Open mirabilos opened 2 years ago

mirabilos commented 2 years ago

Downstream bugtracker link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1016129

I have a virtual machine on a host-only network. The configuration is thus:

host$ ip a show dev virbr1
23: virbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:76:0b:a6 brd ff:ff:ff:ff:ff:ff
    inet brd scope global virbr1
       valid_lft forever preferred_lft forever
    inet6 fec0::1/64 scope site 
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe76:ba6/64 scope link 
       valid_lft forever preferred_lft forever
guest$ ip a show dev eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc jhtb state UP group default qlen 1000
    link/ether 52:54:00:b7:47:b9 brd ff:ff:ff:ff:ff:ff
    inet brd scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fec0::2/64 scope site 
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:feb7:47b9/64 scope link 
       valid_lft forever preferred_lft forever

I can do this:

guest$ sudo tcp6 -i eth1 -y 1000 -d fec0::1 -a 22 -P 1200

But I cannot send packets in the other direction:

host$ sudo tcp6 -i virbr1 -y 1000 -d fec0::2 -a 22 -P 1200
Error while performing Neighbor Discovery for the Destination Address
Error while learning Souce Address and Next Hop
host$ sudo tcp6 -i virbr1 -y 1000 -s fec0::1 -d fec0::2 -a 22 -P 1200 -S 52:54:00:76:0b:a6 -D 52:54:00:b7:47:b9
Error while performing Neighbor Discovery for the Destination Address
Error while learning Souce Address and Next Hop
host$ sudo tcp6 -i virbr1 -y 1000 -s fe80::5054:ff:fe76:ba6 -d fe80::5054:ff:feb7:47b9 -a 22 -P 1200 -S 52:54:00:76:0b:a6 -D 52:54:00:b7:47:b9
Error while performing Neighbor Discovery for the Destination Address
Error while learning Souce Address and Next Hop

Misspelt error messages aside, why is this, even if I pass the source address? Neighbour discovery should work because…

guest$ ping6 -c 1 fec0::1
PING fec0::1(fec0::1) 56 data bytes
64 bytes from fec0::1: icmp_seq=1 ttl=64 time=0.182 ms

--- fec0::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.182/0.182/0.182/0.000 ms
host$ ping6 -c 1 fec0::2
PING fec0::2(fec0::2) 56 data bytes
64 bytes from fec0::2: icmp_seq=1 ttl=64 time=0.384 ms

--- fec0::2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.384/0.384/0.384/0.000 ms

It also doesn’t work when using link-local addresses, see the last attempt above.

mirabilos commented 2 years ago

It stopped working in the other direction as well, after I upgraded the VM from buster to bullseye. I suspect this upgrade to be at fault.

Downgrading just the ipv6toolkit binary package does not fix the problem.

mirabilos commented 2 years ago

Booting with the buster kernel, Linux 4.19.0-21-amd64 (4.19.249-2), does not magically fix this either.

mirabilos commented 2 years ago

Incidentally… on the host system running bullseye, switching to a buster chroot does work!

host:~ $ sudo tcp6 -i virbr1 -d fec0::2 -y 500 -a 22 -P 600; echo = $?
Error while performing Neighbor Discovery for the Destination Address
Error while learning Souce Address and Next Hop
= 1
host:~ $ schroot -prc buster
(buster-i386)host:~ $ sudo tcp6 -i virbr1 -d fec0::2 -y 500 -a 22 -P 600; echo = $?
= 0

mirabilos commented 2 years ago

Cc libpcap maintainers; context is #1016129 on debbugs.

From diffing around between a buster and a bullseye system, I could track this bug down:

libpcap0.8:
  1.10.0-2 500
    500 http://deb.debian.org/debian bullseye/main amd64 Packages
  1.10.0-2~bpo10+1 100
    100 http://deb.debian.org/debian buster-backports/main amd64 Packages
  1.8.1-6+deb10u1 500
    500 http://deb.debian.org/debian buster/main amd64 Packages

Installing 1.10.0-2~bpo10+1 on buster breaks ipv6toolkit.

Installing 1.10.0-2~bpo10+1 on bullseye does not fix ipv6toolkit; installing 1.8.1-6+deb10u1 on bullseye (which temporarily breaks tcpdump) does fix it.

So there’s either something in the newer libpcap that breaks ipv6toolkit, or something in ipv6toolkit that’s not yet compatible with newer libpcap.

guyharris commented 2 years ago

What happens if you run with -vv which, it appears, will cause the programs to report errors from libpcap calls?

(And why are those not ALWAYS reported? "Error while performing Neighbor Discovery for the Destination Address" and "Error while learning Souce Address and Next Hop" amount to "Something bad happened, but I'm not going to tell you what it was", which aren't very helpful if you want to try to make something bad not happen.)

mirabilos commented 2 years ago

Guy Harris dixit:

What happens if you run with -vv which, it appears, will cause the programs to report errors from libpcap calls?

Not more, unfortunately:

$ sudo tcp6 -i eth1 -d fec0::1 -y 500 -a 13 -P 600 -v -v -v -v -v
Error while performing Neighbor Discovery for the Destination Address
Error while learning Souce Address and Next Hop

Would an strace (3.6 MB) be useful, or do you prefer testing this on your own Debian system/VM?

mcr commented 2 years ago

On the bullseye kernel, a libpcap 1.8 works? I'm curious if on a buster kernel, a libpcap 1.10 works? It might also matter which kernel capture mechanism is compiled in.

mirabilos commented 2 years ago

Michael Richardson dixit:

On the bullseye kernel, a libpcap 1.8 works?


I'm curious if on a buster kernel, a libpcap 1.10 works?


guyharris commented 2 years ago

Not more, unfortunately:

So either 1) running with four -vs doesn't cause idata->verbose_f in ipv6_to_ether() to be set to a value > 1 or 2) foundaddr isn't set to "true" at the end of ipv6_to_ether().

If it's 1), that would imply unlikely brokenness, so I'll assume it's 2), and, therefore, that no libpcap call reported an error.

That means that the Neighbor Advertisement packet wasn't seen, either because 1) it wasn't sent, 2) it wasn't received, 3) it was received but didn't make it to the PF_PACKET socket or was dropped before the filtering, or 4) it was received, made it to the PF_PACKET socket, but didn't pass the capture filter.

Did you run tcpdump or Wireshark while running tcp6, to see what traffic was sent by and received on the machine running tcp6?

mirabilos commented 2 years ago

Guy Harris dixit:

Did you run tcpdump or Wireshark while running tcp6, to see what traffic was sent by and received on the machine running tcp6?

I didn’t, but I can easily do that. I’m using…

sudo tcp6 -i eth1 -d fec0::1 -y 500 -a 13 -P 600

… as the test command, which I know used to work. I've tried quite a few variants, so I want to be clear about which one I'm using.

Note the 23:43:58 ones came in just as I pressed ^C and definitely past the runtime of the tcp6 command. The link “should” be otherwise idle, but things like Avahi are bound to create background traffic.

$ sudo tcpdump -evvlns 2000 -Xi eth1
tcpdump: listening on eth1, link-type EN10MB (Ethernet), snapshot length 2000 bytes
23:43:53.682817 52:54:00:b7:47:b9 > 33:33:ff:00:00:01, ethertype IPv6 (0x86dd), length 86: (hlim 255, next-header ICMPv6 (58) payload length: 32) fe80::5054:ff:feb7:47b9 > ff02::1:ff00:1: [icmp6 sum ok] ICMP6, neighbor solicitation, length 32, who has fec0::1
          source link-address option (1), length 8 (1): 52:54:00:b7:47:b9
            0x0000:  5254 00b7 47b9
        0x0000:  6000 0000 0020 3aff fe80 0000 0000 0000  `.....:.........
        0x0010:  5054 00ff feb7 47b9 ff02 0000 0000 0000  PT....G.........
        0x0020:  0000 0001 ff00 0001 8700 49d2 0000 0000  ..........I.....
        0x0030:  fec0 0000 0000 0000 0000 0000 0000 0001  ................
        0x0040:  0101 5254 00b7 47b9                      ..RT..G.
23:43:53.683167 52:54:00:76:0b:a6 > 52:54:00:b7:47:b9, ethertype IPv6 (0x86dd), length 86: (hlim 255, next-header ICMPv6 (58) payload length: 32) fec0::1 > fe80::5054:ff:feb7:47b9: [icmp6 sum ok] ICMP6, neighbor advertisement, length 32, tgt is fec0::1, Flags [router, solicited, override]
          destination link-address option (2), length 8 (1): 52:54:00:76:0b:a6
            0x0000:  5254 0076 0ba6
        0x0000:  6000 0000 0020 3aff fec0 0000 0000 0000  `.....:.........
        0x0010:  0000 0000 0000 0001 fe80 0000 0000 0000  ................
        0x0020:  5054 00ff feb7 47b9 8800 a369 e000 0000  PT....G....i....
        0x0030:  fec0 0000 0000 0000 0000 0000 0000 0001  ................
        0x0040:  0201 5254 0076 0ba6                      ..RT.v..
23:43:54.682914 52:54:00:b7:47:b9 > 33:33:ff:00:00:01, ethertype IPv6 (0x86dd), length 86: (hlim 255, next-header ICMPv6 (58) payload length: 32) fe80::5054:ff:feb7:47b9 > ff02::1:ff00:1: [icmp6 sum ok] ICMP6, neighbor solicitation, length 32, who has fec0::1
          source link-address option (1), length 8 (1): 52:54:00:b7:47:b9
            0x0000:  5254 00b7 47b9
        0x0000:  6000 0000 0020 3aff fe80 0000 0000 0000  `.....:.........
        0x0010:  5054 00ff feb7 47b9 ff02 0000 0000 0000  PT....G.........
        0x0020:  0000 0001 ff00 0001 8700 49d2 0000 0000  ..........I.....
        0x0030:  fec0 0000 0000 0000 0000 0000 0000 0001  ................
        0x0040:  0101 5254 00b7 47b9                      ..RT..G.
23:43:54.683161 52:54:00:76:0b:a6 > 52:54:00:b7:47:b9, ethertype IPv6 (0x86dd), length 86: (hlim 255, next-header ICMPv6 (58) payload length: 32) fec0::1 > fe80::5054:ff:feb7:47b9: [icmp6 sum ok] ICMP6, neighbor advertisement, length 32, tgt is fec0::1, Flags [router, solicited, override]
          destination link-address option (2), length 8 (1): 52:54:00:76:0b:a6
            0x0000:  5254 0076 0ba6
        0x0000:  6000 0000 0020 3aff fec0 0000 0000 0000  `.....:.........
        0x0010:  0000 0000 0000 0001 fe80 0000 0000 0000  ................
        0x0020:  5054 00ff feb7 47b9 8800 a369 e000 0000  PT....G....i....
        0x0030:  fec0 0000 0000 0000 0000 0000 0000 0001  ................
        0x0040:  0201 5254 0076 0ba6                      ..RT.v..
23:43:55.682967 52:54:00:b7:47:b9 > 33:33:ff:00:00:01, ethertype IPv6 (0x86dd), length 86: (hlim 255, next-header ICMPv6 (58) payload length: 32) fe80::5054:ff:feb7:47b9 > ff02::1:ff00:1: [icmp6 sum ok] ICMP6, neighbor solicitation, length 32, who has fec0::1
          source link-address option (1), length 8 (1): 52:54:00:b7:47:b9
            0x0000:  5254 00b7 47b9
        0x0000:  6000 0000 0020 3aff fe80 0000 0000 0000  `.....:.........
        0x0010:  5054 00ff feb7 47b9 ff02 0000 0000 0000  PT....G.........
        0x0020:  0000 0001 ff00 0001 8700 49d2 0000 0000  ..........I.....
        0x0030:  fec0 0000 0000 0000 0000 0000 0000 0001  ................
        0x0040:  0101 5254 00b7 47b9                      ..RT..G.
23:43:55.683294 52:54:00:76:0b:a6 > 52:54:00:b7:47:b9, ethertype IPv6 (0x86dd), length 86: (hlim 255, next-header ICMPv6 (58) payload length: 32) fec0::1 > fe80::5054:ff:feb7:47b9: [icmp6 sum ok] ICMP6, neighbor advertisement, length 32, tgt is fec0::1, Flags [router, solicited, override]
          destination link-address option (2), length 8 (1): 52:54:00:76:0b:a6
            0x0000:  5254 0076 0ba6
        0x0000:  6000 0000 0020 3aff fec0 0000 0000 0000  `.....:.........
        0x0010:  0000 0000 0000 0001 fe80 0000 0000 0000  ................
        0x0020:  5054 00ff feb7 47b9 8800 a369 e000 0000  PT....G....i....
        0x0030:  fec0 0000 0000 0000 0000 0000 0000 0001  ................
        0x0040:  0201 5254 0076 0ba6                      ..RT.v..
23:43:58.805988 52:54:00:76:0b:a6 > 52:54:00:b7:47:b9, ethertype IPv6 (0x86dd), length 86: (hlim 255, next-header ICMPv6 (58) payload length: 32) fe80::5054:ff:fe76:ba6 > fe80::5054:ff:feb7:47b9: [icmp6 sum ok] ICMP6, neighbor solicitation, length 32, who has fe80::5054:ff:feb7:47b9
          source link-address option (1), length 8 (1): 52:54:00:76:0b:a6
            0x0000:  5254 0076 0ba6
        0x0000:  6000 0000 0020 3aff fe80 0000 0000 0000  `.....:.........
        0x0010:  5054 00ff fe76 0ba6 fe80 0000 0000 0000  PT...v..........
        0x0020:  5054 00ff feb7 47b9 8700 92b7 0000 0000  PT....G.........
        0x0030:  fe80 0000 0000 0000 5054 00ff feb7 47b9  ........PT....G.
        0x0040:  0101 5254 0076 0ba6                      ..RT.v..
23:43:58.806029 52:54:00:b7:47:b9 > 52:54:00:76:0b:a6, ethertype IPv6 (0x86dd), length 78: (hlim 255, next-header ICMPv6 (58) payload length: 24) fe80::5054:ff:feb7:47b9 > fe80::5054:ff:fe76:ba6: [icmp6 sum ok] ICMP6, neighbor advertisement, length 24, tgt is fe80::5054:ff:feb7:47b9, Flags [solicited]
        0x0000:  6000 0000 0018 3aff fe80 0000 0000 0000  `.....:.........
        0x0010:  5054 00ff feb7 47b9 fe80 0000 0000 0000  PT....G.........
        0x0020:  5054 00ff fe76 0ba6 8800 b130 4000 0000  PT...v.....0@...
        0x0030:  fe80 0000 0000 0000 5054 00ff feb7 47b9  ........PT....G.
^C
8 packets captured
8 packets received by filter
0 packets dropped by kernel

guyharris commented 2 years ago

OK, try that again, but with tcpdump running with a filter of "icmp6 and ip6[7]==255 and ip6[40]==136 and ip6[41]==0".

mirabilos commented 2 years ago

@guyharris here:

$ sudo tcpdump -evvlns 2000 -Xi eth1 "icmp6 and ip6[7]==255 and ip6[40]==136 and ip6[41]==>
tcpdump: listening on eth1, link-type EN10MB (Ethernet), snapshot length 2000 bytes
00:09:25.590900 52:54:00:76:0b:a6 > 52:54:00:b7:47:b9, ethertype IPv6 (0x86dd), length 86: (hlim 255, next-header ICMPv6 (58) payload length: 32) fec0::1 > fe80::5054:ff:feb7:47b9: [icmp6 sum ok] ICMP6, neighbor advertisement, length 32, tgt is fec0::1, Flags [router, solicited, override]
          destination link-address option (2), length 8 (1): 52:54:00:76:0b:a6
            0x0000:  5254 0076 0ba6
        0x0000:  6000 0000 0020 3aff fec0 0000 0000 0000  `.....:.........
        0x0010:  0000 0000 0000 0001 fe80 0000 0000 0000  ................
        0x0020:  5054 00ff feb7 47b9 8800 a369 e000 0000  PT....G....i....
        0x0030:  fec0 0000 0000 0000 0000 0000 0000 0001  ................
        0x0040:  0201 5254 0076 0ba6                      ..RT.v..
00:09:26.591051 52:54:00:76:0b:a6 > 52:54:00:b7:47:b9, ethertype IPv6 (0x86dd), length 86: (hlim 255, next-header ICMPv6 (58) payload length: 32) fec0::1 > fe80::5054:ff:feb7:47b9: [icmp6 sum ok] ICMP6, neighbor advertisement, length 32, tgt is fec0::1, Flags [router, solicited, override]
          destination link-address option (2), length 8 (1): 52:54:00:76:0b:a6
            0x0000:  5254 0076 0ba6
        0x0000:  6000 0000 0020 3aff fec0 0000 0000 0000  `.....:.........
        0x0010:  0000 0000 0000 0001 fe80 0000 0000 0000  ................
        0x0020:  5054 00ff feb7 47b9 8800 a369 e000 0000  PT....G....i....
        0x0030:  fec0 0000 0000 0000 0000 0000 0000 0001  ................
        0x0040:  0201 5254 0076 0ba6                      ..RT.v..
00:09:27.591131 52:54:00:76:0b:a6 > 52:54:00:b7:47:b9, ethertype IPv6 (0x86dd), length 86: (hlim 255, next-header ICMPv6 (58) payload length: 32) fec0::1 > fe80::5054:ff:feb7:47b9: [icmp6 sum ok] ICMP6, neighbor advertisement, length 32, tgt is fec0::1, Flags [router, solicited, override]
          destination link-address option (2), length 8 (1): 52:54:00:76:0b:a6
            0x0000:  5254 0076 0ba6
        0x0000:  6000 0000 0020 3aff fec0 0000 0000 0000  `.....:.........
        0x0010:  0000 0000 0000 0001 fe80 0000 0000 0000  ................
        0x0020:  5054 00ff feb7 47b9 8800 a369 e000 0000  PT....G....i....
        0x0030:  fec0 0000 0000 0000 0000 0000 0000 0001  ................
        0x0040:  0201 5254 0076 0ba6                      ..RT.v..
3 packets captured
3 packets received by filter
0 packets dropped by kernel
mcr commented 2 years ago

It sounds like, now that we have the capture, we can build a regression test that shows the difference between 1.8 and 1.10. Or are there kernel issues I'm missing?

guyharris commented 2 years ago

tcpdump with the filter - which is also the filter used by ipv6toolkit when looking for a Neighbor Advertisement - saw the Neighbor Advertisements, so the filter doesn't seem to be the problem.
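For readers unfamiliar with the byte-offset syntax, here is a minimal C sketch of what that filter expression tests. The function name is hypothetical and not taken from ipv6toolkit, and "icmp6" is simplified to "Next Header is 58", so this is an illustration of the offsets rather than a replacement for the BPF filter:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring the capture filter
 *   icmp6 and ip6[7]==255 and ip6[40]==136 and ip6[41]==0
 * `pkt` points at the start of the IPv6 header (no Ethernet header).
 * Assumption: the ICMPv6 header directly follows the fixed 40-byte
 * IPv6 header, i.e. there are no extension headers. */
int is_neighbor_advertisement(const uint8_t *pkt, size_t len)
{
    assert(pkt != NULL);
    if (len < 42)            /* 40-byte IPv6 header + ICMPv6 type and code */
        return 0;
    return pkt[6] == 58      /* Next Header: ICMPv6 */
        && pkt[7] == 255     /* Hop Limit: 255, as in the filter */
        && pkt[40] == 136    /* ICMPv6 type: Neighbor Advertisement */
        && pkt[41] == 0;     /* ICMPv6 code */
}
```

Applied to the advertisements in the capture above (Next Header 0x3a, Hop Limit 0xff, then 0x88 0x00 at offset 0x28), all four conditions hold, which is consistent with tcpdump showing them.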

Presumably the version of ipv6toolkit used here has commit fgont/ipv6toolkit@03b0fdd42cf36c0070472afbb9b81a9ca62e1109 from 2020, so it's not providing a timeout of 0 to pcap_open_live(). (If it doesn't have that commit, that's a bug that could cause the code not to work with libpcap 1.10 on Linux, as 1.10 has a commit to fix the behavior of a timeout of 0 to match the documentation.)

mirabilos commented 2 years ago

Gotcha. It doesn’t:


I’ll test locally if that fixes the issue; if so, it’s easy to backport, and I’ll take that to the maintainer.

Updating ipv6toolkit in distros hits quite the snag in that ipv6toolkit upstream doesn’t publish released versions any more, apparently :/

Thank you for the debugging! Much appreciated.

guyharris commented 2 years ago

By the way, if you're willing to require libpcap 1.5 or later:

The various packet batching mechanisms are oriented towards packet capture, where immediate delivery isn't a priority but reducing per-packet overhead is, so they accumulate a collection of packets and deliver them with one wakeup per collection (and, for non-memory-mapped capture, one copy per collection) rather than one per packet.

What you're doing is implementing a tiny bit of ICMPv6 and directly sending packets on, and receiving packets from, a network interface; for that, you want immediate delivery and don't expect to get a huge number of packets, so reducing per-packet overhead isn't as important.

You could use the pcap_create()/pcap_activate() API: create a handle with pcap_create(), set immediate mode with pcap_set_immediate_mode(), set any other parameters that are relevant (note that the timeout is not relevant in immediate mode, so you needn't set it), and activate the handle with pcap_activate().

mcr commented 2 years ago

Updating ipv6toolkit in distros hits quite the snag in that ipv6toolkit upstream doesn’t publish released versions any more, apparently :/

@fgont is seen regularly at IETF, so I'll bug him this week.

fgont commented 2 years ago

Apologies for the delay in getting back to you guys (shame on me!). I'll take a look at this tomorrow.

mirabilos commented 2 years ago

Upstream discussion managed to find the precise fix for pcap 1.10 compatibility that’s missing in Debian’s version: https://github.com/fgont/ipv6toolkit/issues/78#issuecomment-1197453179

debdiff (±version) attached. Would you prefer for me to NMU, do a maintainer-agreed regular upload (as -2), or handle this yourself, Octavio?

Tagging bullseye and buster because this must be fixed in these releases as well:

• the bullseye package, as-is, is broken, so this is an RC fix there
• buster “as-is” works but if libpcap0.8 is upgraded, either via buster-backports or by mixing buster and bullseye packages, it’ll break (and we cannot retrofit a matching Breaks to libpcap0.8 in bullseye any more now it’s released) so buster’s will either need the patch applied or a versioned depends on libpcap0.8 (<< 1.10)

Will you communicate with the SRM or do you wish for me to handle that?

Thanks in advance, //mirabilos
--
[17:15:07] Lukas Degener: A little Asterix Latin for software engineers: veni, vidi, fixi(t) ;-)

diff -Nru ipv6toolkit-2.0+ds.1/debian/changelog ipv6toolkit-2.0+ds.1/debian/changelog
--- ipv6toolkit-2.0+ds.1/debian/changelog 2020-08-05 06:21:55.000000000 +0200
+++ ipv6toolkit-2.0+ds.1/debian/changelog 2022-07-31 21:43:23.000000000 +0200
@@ -1,3 +1,10 @@
+ipv6toolkit (2.0+ds.1-1.1~~) UNRELEASED; urgency=medium
+

mirabilos commented 2 years ago

On 31/07/22 14:57, Thorsten Glaser wrote:

debdiff (±version) attached. Would you prefer for me to NMU, do a maintainer-agreed regular upload (as -2), or handle this yourself, Octavio?

I can handle it. By the way, did the fix work for you?

• the bullseye package, as-is, is broken, so this is an RC fix there
• buster “as-is” works but if libpcap0.8 is upgraded, either via buster-backports or by mixing buster and bullseye packages, it’ll break (and we cannot retrofit a matching Breaks to libpcap0.8 in bullseye any more now it’s released) so buster’s will either need the patch applied or a versioned depends on libpcap0.8 (<< 1.10)

The proposed fix should work with both libpcap versions, so that should be the way to go. I'll take a look at that too.

Will you communicate with the SRM or do you wish for me to handle that?

I may need help with that. I'll get back to you.

Thanks, Octavio.

mirabilos commented 2 years ago

Octavio Alvarez dixit:

On 31/07/22 14:57, Thorsten Glaser wrote:

debdiff (±version) attached. Would you prefer for me to NMU, do a maintainer-agreed regular upload (as -2), or handle this yourself, Octavio?

I can handle it. By the way, did the fix work for you?

Yes, the debdiff I attached works.

Will you communicate with the SRM or do you wish for me to handle that?

I may need help with that. I'll get back to you.


