Xilinx / embeddedsw

Xilinx Embedded Software (embeddedsw) Development

lwIP UDP fails to send message lengths of 4385 & 4386 #212

Open dnygren opened 2 years ago

dnygren commented 2 years ago

I’m using a very simple bare-metal Zynq lwIP-based UDP echo server (https://github.com/dnygren/zynq_echo_servers_udp/tree/master/C%2B%2B/zynq_echo_server_udp_cpp/src) to test lwIP's UDP functionality. It is built with Xilinx Vivado/SDK 2018.3, standalone 6.8, and lwIP 2.11 v1.7, and runs on a PicoZed board. I’ve seen the same result with earlier lwIP versions and on a ZedBoard.

I see rather bizarre behavior: messages of length 4385 and 4386 bytes that I send to the Zynq echo server are not echoed back. udp_sendto() appears to be called and to complete successfully. Wireshark reports "bogus, payload length" errors for these lengths.

Create files of length 4384, 4385, 4386, & 4387:

$ yes 0123456789 | head --bytes 4384 > 4384.txt
$ yes 0123456789 | head --bytes 4385 > 4385.txt
$ yes 0123456789 | head --bytes 4386 > 4386.txt
$ yes 0123456789 | head --bytes 4387 > 4387.txt

Verify lengths:

$ ls -la
total 60
drwxr-xr-x  7 user user 4096 Jul  1 09:18 .
drwxr-xr-x 24 user user 4096 Jul  1 09:10 ..
-rw-r--r--  1 user user 4384 Jul  1 09:16 4384.txt
-rw-r--r--  1 user user 4385 Jul  1 09:14 4385.txt
-rw-r--r--  1 user user 4386 Jul  1 09:15 4386.txt
-rw-r--r--  1 user user 4387 Jul  1 09:15 4387.txt
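The four `yes | head` commands above can also be run in a loop that checks each file's size, which makes it easy to extend the test to a wider range of lengths. This is a minimal bash sketch; the helper name `make_test_files` is hypothetical and simply wraps the same commands:

```shell
#!/usr/bin/env bash
# make_test_files FIRST LAST (hypothetical helper): create FIRST.txt .. LAST.txt,
# each holding exactly N bytes of repeated "0123456789\n", and return non-zero
# if any file does not have the expected size.
make_test_files() {
  local n size
  for ((n = $1; n <= $2; n++)); do
    yes 0123456789 | head --bytes "$n" > "$n.txt"
    size=$(wc -c < "$n.txt")
    if [ "$size" -ne "$n" ]; then
      echo "size mismatch: $n.txt is $size bytes" >&2
      return 1
    fi
  done
}

# Example: the four lengths from this report.
#   make_test_files 4384 4387
```

Because each line of the file is 11 bytes, the last (partial) line of 4384.txt is `012345` and that of 4387.txt is `012345678`, which is what `tail -1` prints in the echo tests.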

Point the PC at the Zynq board running the echo server:

$ IP_ADDR=10.100.93.70
$ PORT=7

File length 4384 echoes fine:

$ cat 4384.txt |  nc -u -w 2 $IP_ADDR $PORT | tail -1
012345
No echo of file lengths 4385 & 4386:
$ cat 4385.txt |  nc -u -w 2 $IP_ADDR $PORT | tail -1
(No output)
$ cat 4386.txt |  nc -u -w 2 $IP_ADDR $PORT | tail -1
(No output)
File length 4387 echoes fine:
$ cat 4387.txt |  nc -u -w 2 $IP_ADDR $PORT | tail -1
012345678
Using Wireshark with the filter “src host 10.100.93.70”, I get the following for the failing (no output) 4386 case:
Frame 3: 1498 bytes on wire (11984 bits), 1498 bytes captured (11984 bits) on interface enp1s0, id 0
Ethernet II, Src: Xilinx_00:01:02 (00:0a:35:00:01:02), Dst: Dell_90:91:89 (d8:9e:f3:90:91:89)
Internet Protocol Version 4, Src: 10.100.93.70, Dst: 10.100.92.51
User Datagram Protocol, Src Port: 7, Dst Port: 43035
    Source Port: 7
    Destination Port: 43035
    Length: 4394 (bogus, payload length 4392)
    [Checksum: [missing]]
    [Checksum Status: Not present]
    [Stream index: 0]
    [Timestamps]
    UDP payload (4384 bytes)
Echo
Also failing / no output 4385 case:
Frame 3: 1498 bytes on wire (11984 bits), 1498 bytes captured (11984 bits) on interface enp1s0, id 0
Ethernet II, Src: Xilinx_00:01:02 (00:0a:35:00:01:02), Dst: Dell_90:91:89 (d8:9e:f3:90:91:89)
Internet Protocol Version 4, Src: 10.100.93.70, Dst: 10.100.92.51
User Datagram Protocol, Src Port: 7, Dst Port: 53962
    Source Port: 7
    Destination Port: 53962
    Length: 4393 (bogus, payload length 4392)
    [Checksum: [missing]]
    [Checksum Status: Not present]
    [Stream index: 0]
    [Timestamps]
    UDP payload (4384 bytes)
Echo
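The "bogus" annotations can be checked by hand: the UDP Length field counts the 8-byte UDP header plus the payload, so an N-byte echo should carry Length N + 8. The bash sketch below does that arithmetic and compares it with the 4392 bytes of UDP data Wireshark actually reports in both failing captures (the 4392 figure comes from the captures; the helper names are hypothetical):

```shell
#!/usr/bin/env bash
# The UDP header is always 8 bytes; the Length field covers header + payload.
UDP_HDR=8

# udp_len N: value the UDP Length field should hold for an N-byte payload.
udp_len() { echo $(( $1 + UDP_HDR )); }

# missing_bytes N SEEN: bytes of the datagram absent from the capture,
# given SEEN bytes of UDP data that Wireshark actually reassembled.
missing_bytes() { echo $(( $(udp_len "$1") - $2 )); }

for n in 4384 4385 4386; do
  echo "payload $n: Length $(udp_len "$n"), missing $(missing_bytes "$n" 4392) byte(s)"
done
```

This prints Length 4392 with 0 bytes missing for 4384, Length 4393 with 1 missing for 4385, and Length 4394 with 2 missing for 4386, matching the Length fields in the captures: the failing datagrams arrive exactly 1 or 2 bytes short.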
For a 4384 passing case I get:
Frame 3: 1498 bytes on wire (11984 bits), 1498 bytes captured (11984 bits) on interface enp1s0, id 0
Ethernet II, Src: Xilinx_00:01:02 (00:0a:35:00:01:02), Dst: Dell_90:91:89 (d8:9e:f3:90:91:89)
Internet Protocol Version 4, Src: 10.100.93.70, Dst: 10.100.92.51
User Datagram Protocol, Src Port: 7, Dst Port: 60542
    Source Port: 7
    Destination Port: 60542
    Length: 4392
    [Checksum: [missing]]
    [Checksum Status: Not present]
    [Stream index: 0]
    [Timestamps]
    UDP payload (4384 bytes)
Echo
For a 4387 passing case **(note it is now frame 4)** I get:
Frame 4: 60 bytes on wire (480 bits), 60 bytes captured (480 bits) on interface enp1s0, id 0
Ethernet II, Src: Xilinx_00:01:02 (00:0a:35:00:01:02), Dst: Dell_90:91:89 (d8:9e:f3:90:91:89)
Internet Protocol Version 4, Src: 10.100.93.70, Dst: 10.100.92.51
User Datagram Protocol, Src Port: 7, Dst Port: 39468
    Source Port: 7
    Destination Port: 39468
    Length: 4395
    [Checksum: [missing]]
    [Checksum Status: Not present]
    [Stream index: 0]
    [Timestamps]
    UDP payload (4387 bytes)
Echo
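The frame sizes in these captures are consistent with IP fragmentation into 1464-byte chunks: a 1498-byte frame is 14 bytes of Ethernet header, 20 bytes of IPv4 header (assuming no options) and 1464 bytes of fragment data, a multiple of 8 as fragment offsets require. The 1464 figure is inferred from the captures, not read from the lwIP or driver configuration. Under that assumption, this bash sketch splits an N-byte echo (an N + 8 byte UDP datagram) into fragments and prints the on-wire size of each frame as Wireshark shows it, padding short frames to the 60-byte Ethernet minimum (64 bytes less the 4-byte FCS, which the capture does not include):

```shell
#!/usr/bin/env bash
# Assumed per-fragment IP payload, inferred from the 1498-byte frames:
# 1498 - 14 (Ethernet header) - 20 (IPv4 header, no options) = 1464.
FRAG_DATA=1464
ETH_HDR=14
IP_HDR=20
UDP_HDR=8
ETH_MIN=60   # minimum Ethernet frame as captured (without the 4-byte FCS)

# frame_sizes N: on-wire size of each frame carrying an N-byte UDP echo payload.
frame_sizes() {
  local left=$(( $1 + UDP_HDR )) chunk frame
  local -a sizes=()
  while (( left > 0 )); do
    chunk=$(( left < FRAG_DATA ? left : FRAG_DATA ))
    frame=$(( ETH_HDR + IP_HDR + chunk ))
    if (( frame < ETH_MIN )); then frame=$ETH_MIN; fi   # short frames get padded
    sizes+=("$frame")
    left=$(( left - chunk ))
  done
  echo "${sizes[*]}"
}

# last_fragment_data N: IP payload bytes carried by the final fragment.
last_fragment_data() {
  local r=$(( ($1 + UDP_HDR) % FRAG_DATA ))
  echo $(( r == 0 ? FRAG_DATA : r ))
}

for n in 4384 4385 4386 4387; do
  echo "$n: frames $(frame_sizes "$n"), last fragment $(last_fragment_data "$n") byte(s)"
done
```

Under this assumption the 4384 echo fits exactly in three full fragments, matching its 1498-byte Frame 3, while 4385, 4386 and 4387 each need a fourth fragment carrying 1, 2 and 3 bytes in a padded 60-byte frame, matching Frame 4 of the 4387 capture. In the failing captures Wireshark instead reports only 4392 bytes of UDP data at Frame 3, i.e. just the three full fragments, with the 1- or 2-byte tail absent. This is arithmetic on the captured sizes only; it does not show where those bytes are lost.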
If you have a bare-metal Zynq running an lwIP UDP echo server, can you check whether it fails with message lengths of 4385 & 4386? Is there some sort of frame boundary bug in the lwIP UDP library? I haven't made extensive checks of which message lengths fail and which pass, so there may be more failing lengths.
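Checking which lengths fail can be automated by sweeping a range and recording which echoes come back incomplete. This is a minimal bash sketch, assuming the same `nc` used above (one datagram per message, replies printed to stdout) and the `IP_ADDR`/`PORT` variables set earlier; the helper names `probe` and `sweep` are hypothetical:

```shell
#!/usr/bin/env bash
# probe N (hypothetical helper): send an N-byte test message to the echo server
# and print how many bytes came back (0 if nothing arrives before the timeout).
probe() {
  local f
  f=$(mktemp)
  yes 0123456789 | head --bytes "$1" > "$f"
  cat "$f" | nc -u -w 2 "$IP_ADDR" "$PORT" | wc -c
  rm -f "$f"
}

# sweep FIRST LAST (hypothetical helper): print, on one line, every length in
# [FIRST, LAST] whose echo was missing or incomplete.
sweep() {
  local n got
  local -a failed=()
  for ((n = $1; n <= $2; n++)); do
    got=$(probe "$n")
    if [ "$got" -ne "$n" ]; then failed+=("$n"); fi
  done
  echo "${failed[*]}"
}

# Usage (needs the board on the network):
#   IP_ADDR=10.100.93.70
#   PORT=7
#   sweep 4370 4400
```

Each probe waits up to the 2-second `nc` timeout, so a wide sweep takes a while; narrowing the range around suspected boundaries keeps it practical.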
dnygren commented 1 year ago

From a discussion on the lwip-users mailing list: https://lists.nongnu.org/archive/html/lwip-users/2023-02/msg00008.html

My best guess is a low-level driver bug that happens when packets are sent in a short timeframe, which big UDP packets do due to IP fragmentation taking place. Sending small packets at a low rate does not fix the problem, it just hides it.

So someone knowledgeable about lwIP thinks this is a Xilinx driver problem.