genodelabs / genode

Genode OS Framework
https://genode.org/
Other
1.07k stars 254 forks source link

nic_router: bad checksum on udp forward #4636

Closed m-stein closed 5 months ago

m-stein commented 2 years ago

@skalk reported that he cannot fetch a file via TFTP within uboot on our iMX8 test board if the file is provided by his Sculpt 22.10 Debian 11 VM. However, the fetch works fine if the server is on a non-Sculpt machine or a machine with an older Sculpt version. We found that the UDP checksum of the initial TFTP request is broken when it arrives in the VM on Sculpt 22.10 while on Sculpt 22.04 the checksum is fine.

This leads to the conclusion that https://github.com/genodelabs/genode/commit/9a37ccfe291508c5199e10813117b16a5766ff44, although correct for most cases, has a corner case that is broken and that is triggered in this exact TFTP scenario.

m-stein commented 2 years ago

@chelmuth I've tried to test this one as good as possible and would recommend it for the release: 28024b7203 Revert "nic_router: incremental L4 checksum updates"

chelmuth commented 2 years ago

@m-stein To answer our open question from yesterday: Linux TCP/IP statistics can be accessed by the follwing command lines.

netstat --statistics
nstat -sz
nstat -sz |& grep -i err
chelmuth commented 6 months ago

@m-stein do you see any chance to address this issue after the Sculpt 24.04 release (or at least update with your current findings)?

m-stein commented 6 months ago

My plan is to finish the work at the File Vault next week. Should I prioritize this issue next?

chelmuth commented 6 months ago

Let's discuss the list of open issues offline for prioritization.

If you tackle this issue I'd appreciate a unit/stand-alone test of the implementation, which should be not much effort. Wireshark/pcap may be handy.

m-stein commented 6 months ago

I played around with tshark, tcpdump, trafgen, randpkt and the trace_recorder_pcapng test this morning. I managed to write a short test that detects bad checksums captured in a nic_router scenario and learned in principle how to generate random but valid traffic. Now I'm a bit uncertain in which direction to proceed. One way would be to set up a real-world scenario (with nic_router) that generates diverse but reproducible traffic that we capture. Alternatively, we could generate random, valid packets with lx tooling and write a test component that updates the packet checksums both completely and incrementally with our algorithms (no nic_router). I assume that reading pcap into session packets shouldn't be too complicated. Any opinions?

chelmuth commented 6 months ago

In my opinion, a unit test of the algorithms suffices as this was missing in the past and let to a broken component. The issue was unnoticed for a while and only discovered in @skalk's TFTP redirection scenario. My intention is to avoid repeating this sloppiness.

m-stein commented 6 months ago

@chelmuth Commit d96632ab29 implements the unit test.

chelmuth commented 5 months ago

@m-stein the run script reproducibly fails for SEED=2812065779 and SEED=991351508 like follows (current staging).

Error: number of checksums in bin/internet_checksum_dir/output.pcap (20001) differs from number of checked checksums (20000)
chelmuth commented 5 months ago

bin/internet_checksum_dir/tmp contains the following line (which is missing with other seeds).

[Checksum Status: Unverified]
chelmuth commented 5 months ago

The culprit seems to be the following packet in output.pcap.

Stream Control Transmission Protocol, Src Port: 22000 (22000), Dst Port: 38895 (38895)
    Source port: 22000
    Destination port: 38895
    Verification tag: 0x37b485bf
    Checksum: 0x838f40f9 [unverified]
    [Checksum Status: Unverified]

SCTP?

chelmuth commented 5 months ago

tshark -o ip.check_checksum:TRUE -o tcp.check_checksum:TRUE -o udp.check_checksum:TRUE -O ip,udp,tcp,icmp -r bin/input.pcap skips all non-listed protocols and works for the seeds above. @m-stein does this reflect your intentions?

m-stein commented 5 months ago

@chelmuth This looks good to me! Thanks a lot! I also tried to better tailor the output but could not find a way.

nfeske commented 5 months ago

What is the status of this issue?

chelmuth commented 5 months ago

The internet_checksum test attests that the current state works as intended. Currently, there are no concrete plans to revive incremental checksum update.

m-stein commented 5 months ago

So, may I close it?