SIDN / spin

SPIN Core Software
https://spin.sidnlabs.nl
GNU General Public License v2.0

spin-pcap-reader crash #92

Open frankvandenhurk opened 2 years ago

frankvandenhurk commented 2 years ago
# /mnt/data/spin/spin-pcap-reader -E <IP> -i <interface> -s 9000
spin-pcap-reader: caplen 9000 != len 10278,
spin-pcap-reader: Truncated IP packet: 1278 bytes missing
spin-pcap-reader: caplen 9000 != len 11738,
spin-pcap-reader: Truncated IP packet: 2738 bytes missing
spin-pcap-reader: caplen 9000 != len 17578,
spin-pcap-reader: Truncated IP packet: 8578 bytes missing
00: 02 04 05 b4
00:
00: 00 0e 01 00 00 01 00 00 00 00
10: 00 01 06 67 6f 6f 67 6c 65 03
20: 63 6f 6d 00 00 01 00 01 01 00
30: 29 10 00 00 00 00 00 00 0c 00
40: 0a 00 08 12 1b ad eb 9c e7 7e
50: 55
cschutijser commented 2 years ago

Those lengths/MTUs don't make sense to me at all, so there must be some other problem.

You're running spin-pcap-reader on Ubiquiti equipment, right? Are you cross-compiling? If so, can you give some details? Perhaps something fishy is going on there, and based on those details I may be able to find out why this is happening. I don't have much experience with cross-compilation, but I'm always happy to learn and have a brief look.

frankvandenhurk commented 2 years ago

I started trying to cross-compile, but I gave up, bought a Raspberry Pi 4 and installed the 64-bit version of Raspberry Pi OS. My UniFi Dream Machine Pro has an ARM 64-bit Cortex-A57 and my Raspberry Pi 4 has an ARM 64-bit Cortex-A72. Besides the compiled spin-pcap-reader, I had to copy four libraries to the UniFi UDM Pro:

libcrypto.so.1.1 libldns.so.3 libpcap.so.0.8 libssl.so.1.1

cschutijser commented 2 years ago

My first idea to debug this further would be to run tcpdump on the machine that runs spin-pcap-reader, if possible. Remember again to specify the -s flag. Also add the filter expression 'len > 1518' so that only packets greater than the MTU we expect are captured, e.g. tcpdump -s <snaplen> -i <interface> 'len > 1518'.

If tcpdump doesn't log any packets while spin-pcap-reader does, perhaps spin-pcap-reader is doing something wrong. If they both show packets, I'd really like to know what packets you see (of course you don't have to share the exact contents with me but some idea of what they are would be nice).
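When comparing a tcpdump capture with what spin-pcap-reader reports, it helps to list the captured and original length of each packet. The following is a minimal sketch. It assumes the capture was written with tcpdump -w in the classic pcap format, not pcapng. The helpers iter_pcap_lengths and summarize are hypothetical, and the 1518 threshold mirrors the 'len > 1518' filter above.

```python
import struct
from collections.abc import Iterator

# Largest frame we expect on the wire; mirrors the tcpdump filter 'len > 1518'.
MAX_FRAME = 1518

# Classic pcap magic numbers as they appear in the first 4 bytes of the file.
LITTLE_ENDIAN_MAGICS = (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1")  # usec, nsec
BIG_ENDIAN_MAGICS = (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d")


def iter_pcap_lengths(data: bytes) -> Iterator[tuple[int, int]]:
    """Yield (caplen, orig_len) for every record in a classic pcap file."""
    if len(data) < 24:
        raise ValueError("too short for a pcap global header")
    if data[:4] in LITTLE_ENDIAN_MAGICS:
        endian = "<"
    elif data[:4] in BIG_ENDIAN_MAGICS:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled here)")
    # Per-record header: ts_sec, ts_usec/ts_nsec, incl_len (caplen), orig_len.
    record = struct.Struct(endian + "IIII")
    offset = 24  # skip the 24-byte global header
    while offset + record.size <= len(data):
        _, _, caplen, orig_len = record.unpack_from(data, offset)
        yield caplen, orig_len
        offset += record.size + caplen  # jump over the captured packet bytes


def summarize(data: bytes) -> tuple[int, int, int]:
    """Return (records, frames larger than MAX_FRAME, frames cut short by the snaplen)."""
    records = oversized = truncated = 0
    for caplen, orig_len in iter_pcap_lengths(data):
        records += 1
        if orig_len > MAX_FRAME:
            oversized += 1
        if caplen < orig_len:
            truncated += 1
    return records, oversized, truncated
```

For example, summarize(open("capture.pcap", "rb").read()) returns three counts: the number of records, how many frames exceed 1518 bytes on the wire, and how many were cut short by the snapshot length. The file name here is a placeholder.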

frankvandenhurk commented 2 years ago

The UniFi UDM Pro has different kinds of interfaces. When I look at the ifconfig output, I can see:

VLAN 101 is my client VLAN. I've checked eth8.100, switch0, switch0.101 and br101, and they all show a lot of packets with len > 1518 in tcpdump.

If I start "tcpdump -f 'len > 1518' -i br101 -w /mnt/data/tcpdump/bigpackets/br101.pcap", browse a website and open the capture in Wireshark, I get this:

[Screenshot: br101.pcap opened in Wireshark]

(I removed the source column; all packets have the internal IP of my workstation as source.) Does this help?

cschutijser commented 2 years ago

Hey, I had a look at this again and (I think) figured out what's going on here. This is probably caused by segmentation offloading. As you can see on Wireshark's page about it, the way it manifests itself there is quite similar. A colleague found that the UniFi UDM Pro uses a Qualcomm chipset, and also found some hints that Qualcomm chips indeed have this feature, which is another indication that segmentation offloading might be the cause.
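Roughly speaking, with TCP segmentation offload (TSO), or GSO as its software counterpart, the stack passes one large TCP "super-packet" down and the cut into MSS-sized segments can happen after the point where the capture taps the traffic. Generic receive offload (GRO) does the reverse on the receive path and merges incoming segments before the tap. Depending on where the capture point sits relative to the offload, the capture then shows frames whose len is far above the MTU. The sketch below only models the segmentation arithmetic. tso_segments is a hypothetical helper, and real offload engines also rewrite IP IDs, lengths and checksums for every segment.

```python
def tso_segments(payload_len: int, mss: int, first_seq: int) -> list[tuple[int, int]]:
    """Split one large TCP payload into (sequence number, length) pairs of at most
    `mss` bytes each, the way segmentation offload cuts a super-packet into frames."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = []
    offset = 0
    while offset < payload_len:
        length = min(mss, payload_len - offset)
        # TCP sequence numbers wrap around at 2**32.
        segments.append(((first_seq + offset) % 2**32, length))
        offset += length
    return segments


# A 4000-byte payload with the common 1460-byte MSS becomes three wire segments.
for seq, length in tso_segments(4000, 1460, first_seq=1000):
    print(f"seq={seq} len={length}")
```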

As shown on the Wireshark web page, we can use ethtool to show the supported offload mechanisms on a network interface. Can you run ethtool -k <interface> (-k is the same as --show-offload) to see whether segmentation offloading is indeed supported and enabled? If you could paste the output, that would be nice.
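Each relevant line of ethtool -k output has the form name: on or name: off, optionally followed by [fixed] or [requested on]. Below is a minimal Python sketch for checking the offload features discussed here. parse_ethtool_features and enabled_offloads are hypothetical helpers. The choice of features to check is an assumption; it includes generic-receive-offload, which coalesces received segments.

```python
# Offload features that can make a capture show frames larger than the MTU.
OFFLOADS = (
    "tcp-segmentation-offload",
    "generic-segmentation-offload",
    "generic-receive-offload",
    "large-receive-offload",
)


def parse_ethtool_features(text: str) -> dict[str, tuple[bool, bool]]:
    """Map each feature in `ethtool -k` output to (enabled, fixed)."""
    features = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "Features for <iface>:" banner.
        if not line or line.startswith("Features for") or ":" not in line:
            continue
        name, _, state = line.partition(":")
        state = state.strip()
        # "off [requested on]" is still off: only the first word is the current state.
        enabled = bool(state) and state.split()[0] == "on"
        features[name] = (enabled, "[fixed]" in state)
    return features


def enabled_offloads(text: str) -> list[str]:
    """Return the entries of OFFLOADS that are currently on."""
    features = parse_ethtool_features(text)
    return [name for name in OFFLOADS if features.get(name, (False, False))[0]]
```

On the router itself, the text could come from subprocess.run(["ethtool", "-k", "eth8"], capture_output=True, text=True).stdout, assuming Python is available there.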

If segmentation offloading is indeed enabled, we at least have an answer as to why this is happening. The next step is to think about how to address it. The only reason we need to capture more than just the packet headers is that we want to analyse the contents of DNS packets. I've looked at some data on the size of DNS packets: in those measurements, DNS packets (almost) never seem to exceed 2000 bytes. Even then, the question is whether spin-pcap-reader would actually see a 2000-byte DNS packet or just a fragment of it. I'll think about this some more.

frankvandenhurk commented 2 years ago

ethtool output eth8 (switch0 is the same)

# ethtool -k eth8
Features for eth8:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: on
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: on
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: on
        tx-tcp-mangleid-segmentation: off
        tx-tcp6-segmentation: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: on
receive-hashing: on
highdma: on
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]

ethtool output eth8.100 (switch0.101 is the same)

# ethtool -k eth8.100
Features for eth8.100:
rx-checksumming: off [fixed]
tx-checksumming: on
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: on
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [requested on]
        tx-checksum-sctp: off [requested on]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [requested on]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: on
        tx-tcp-mangleid-segmentation: on
        tx-tcp6-segmentation: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: on [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [requested on]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: on
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
fcoe-mtu: off [requested on]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]

ethtool output br0 (br101 is the same)

# ethtool -k br0
Features for br0:
rx-checksumming: off [fixed]
tx-checksumming: off
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: off
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: off
        tx-scatter-gather: off
        tx-scatter-gather-fraglist: off [requested on]
tcp-segmentation-offload: off
        tx-tcp-segmentation: off [requested on]
        tx-tcp-ecn-segmentation: off [requested on]
        tx-tcp-mangleid-segmentation: off [requested on]
        tx-tcp6-segmentation: off [requested on]
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: on [fixed]
netns-local: on [fixed]
tx-gso-robust: off [requested on]
tx-fcoe-segmentation: off [requested on]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-sctp-segmentation: on
tx-esp-segmentation: on
tx-udp-segmentation: on
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: on
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
frankvandenhurk commented 2 years ago

Are you sure that large DNS packets are causing this? I haven't found a large DNS packet in my trials; most large packets are HTTPS.

cschutijser commented 2 years ago

Thanks for the output. The physical interface (eth8) indeed shows tcp-segmentation-offload: on, so that's very likely the reason we see packets that are bigger than the configured MTU. Good to know.

> Are you sure that large DNS packets are causing this? I haven't found a large DNS packet in my trials; most large packets are HTTPS.

The point I was trying to make (and I clearly didn't succeed :)) was that DNS packets are indeed not the cause of this. I was trying to explain why we need to analyse more than just the protocol headers of packets, and DNS is the reason for that. For most protocols (like IP, TCP and UDP), we only need information from the protocol headers and don't need to look at the application-layer data. So in those cases, we only need the first few bytes of each packet (let's say +/- 100 bytes).
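Header-only analysis needs so few bytes because the header lengths are encoded at fixed positions. The IPv4 header length sits in the low four bits of the first byte, counted in 32-bit words. The TCP header length sits in the upper four bits of byte 12 of the TCP header. The sketch below handles IPv4 only and assumes the buffer starts at the IP header, with no link-layer header. header_bytes_needed is a hypothetical helper. Options can stretch both the IPv4 and the TCP header to 60 bytes, so +/- 100 bytes is a typical figure rather than a hard upper bound.

```python
IPPROTO_TCP = 6
IPPROTO_UDP = 17
UDP_HEADER_LEN = 8


def header_bytes_needed(packet: bytes) -> int:
    """Bytes from the start of an IPv4 packet up to the end of its TCP/UDP header."""
    if packet[0] >> 4 != 4:
        raise ValueError("this sketch only handles IPv4")
    ip_len = (packet[0] & 0x0F) * 4  # IHL is counted in 32-bit words
    protocol = packet[9]
    if protocol == IPPROTO_TCP:
        # TCP data offset: upper 4 bits of byte 12 of the TCP header, in 32-bit words.
        return ip_len + (packet[ip_len + 12] >> 4) * 4
    if protocol == IPPROTO_UDP:
        return ip_len + UDP_HEADER_LEN
    return ip_len  # other protocols: only the IP header is counted here
```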

This is different for DNS. Since we want to know the hostnames of the hosts that we show in the SPIN interface, we need to capture DNS packets and inspect the entire contents of those packets (i.e., more than the +/- 100 bytes that I mentioned earlier).
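The hostnames live in the DNS payload, not in any protocol header. The hex dump in the original report appears to contain such a message: its first 28 bytes decode as a standard query (ID 0x000e) for google.com, type A, class IN. Below is a minimal sketch that decodes the first question of a DNS message using the RFC 1035 wire format. parse_dns_question is hypothetical, handles only uncompressed names, and is not spin's own DNS parser.

```python
import struct


def parse_dns_question(msg: bytes) -> tuple[int, str, int, int]:
    """Return (id, qname, qtype, qclass) for the first question of a DNS message."""
    # 12-byte header: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT (network byte order).
    msg_id, _flags, qdcount = struct.unpack_from("!HHH", msg, 0)
    if qdcount == 0:
        raise ValueError("message has no question")
    labels = []
    offset = 12
    while True:
        length = msg[offset]
        offset += 1
        if length == 0:  # the root label ends the name
            break
        if length & 0xC0:
            raise ValueError("compressed names are not handled in this sketch")
        labels.append(msg[offset:offset + length].decode("ascii"))
        offset += length
    qtype, qclass = struct.unpack_from("!HH", msg, offset)
    return msg_id, ".".join(labels), qtype, qclass


# First 28 bytes of the hex dump in the original report.
query = bytes.fromhex("000e 0100 0001 0000 0000 0001 06676f6f676c65 03636f6d 00 0001 0001")
print(parse_dns_question(query))  # (14, 'google.com', 1, 1)
```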

Does that make sense?

I've decided to lower the default capture length to 1232 bytes (commit d2616cce97dda489573a5bd53e566d34dac59a13). That way, we should capture most DNS packets. Not all of them, but that was already the case before. In other words, spin-pcap-reader will run into a lot of truncated packets, but that's more or less expected. Since truncated packets are not much cause for concern, I've also "silenced" spin-pcap-reader by hiding most messages behind a new -v flag (commit 9eba84cc6543cbb74c739144666081a172b563c2). So you won't see those messages about truncated packets anymore (unless you use -v).
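The sketch below gives a rough budget for how large a DNS message can be while still fitting completely in the capture. It assumes the capture length acts as a libpcap snapshot length, which counts from the start of the link-layer header. It also assumes an Ethernet link and IP headers without options or extension headers. max_dns_message_captured is a hypothetical helper.

```python
ETHERNET_HEADER = 14
VLAN_TAG = 4          # 802.1Q tag, if present in the captured frame
IPV4_HEADER = 20      # assumes no IPv4 options
IPV6_HEADER = 40      # assumes no IPv6 extension headers
UDP_HEADER = 8


def max_dns_message_captured(snaplen: int, ipv6: bool = False, vlan_tagged: bool = False) -> int:
    """Largest DNS-over-UDP message that fits entirely within `snaplen` captured bytes."""
    overhead = ETHERNET_HEADER + UDP_HEADER
    overhead += IPV6_HEADER if ipv6 else IPV4_HEADER
    if vlan_tagged:
        overhead += VLAN_TAG
    return max(0, snaplen - overhead)


for ipv6 in (False, True):
    family = "IPv6" if ipv6 else "IPv4"
    print(family, max_dns_message_captured(1232, ipv6=ipv6))
```

With 1232 bytes, this leaves 1190 bytes of DNS message over IPv4 and 1170 over IPv6, or 1186 and 1166 with an 802.1Q VLAN tag. For comparison, 1232 bytes is also the EDNS(0) UDP buffer size recommended by DNS Flag Day 2020. A maximum-size response under that recommendation would therefore lose its last 42 (IPv4) or 62 (IPv6) bytes in the capture.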

Does that fix the problem that made you open this issue originally? Looking at the title of this issue, you are mentioning a crash. I assume that's still happening? I've looked at that a bit more. On my system, spin-pcap-reader often exits without an error message when the server (i.e., spind) is down. Did you happen to restart spind by hand during that time (I assume that's not the case)? I still need to think a bit more about this.