enjoy-digital / colorlite

Take control of your Colorlight FPGA board with LiteX/LiteEth :)
BSD 2-Clause "Simplified" License
94 stars 13 forks source link

Very unreliable packet handling when porting to a revision 8.0. #8

Open RGD2 opened 1 year ago

RGD2 commented 1 year ago

Note: might be related to #7

After cd build/gateware; openFPGALoader colorlight.bit (I have a ft2232H hooked up to the JTAG header as appropriate for a V8.0 board, this is just how I can configure the FPGA).

The Led on the board blinks at about 1Hz, and in this state litex_server --udp --udp-ip=192.168.1.20 appears to connect ok, but this appears EXTREMELY unhealthy:

ping 192.168.1.20
PING 192.168.1.20 (192.168.1.20) 56(84) bytes of data.
64 bytes from 192.168.1.20: icmp_seq=1 ttl=128 time=0.447 ms
64 bytes from 192.168.1.20: icmp_seq=2 ttl=128 time=0.280 ms
64 bytes from 192.168.1.20: icmp_seq=6 ttl=128 time=0.416 ms
64 bytes from 192.168.1.20: icmp_seq=18 ttl=128 time=0.372 ms
64 bytes from 192.168.1.20: icmp_seq=19 ttl=128 time=674 ms
64 bytes from 192.168.1.20: icmp_seq=20 ttl=128 time=0.130 ms
64 bytes from 192.168.1.20: icmp_seq=21 ttl=128 time=0.206 ms
64 bytes from 192.168.1.20: icmp_seq=22 ttl=128 time=0.185 ms
64 bytes from 192.168.1.20: icmp_seq=23 ttl=128 time=0.230 ms
64 bytes from 192.168.1.20: icmp_seq=24 ttl=128 time=0.229 ms
64 bytes from 192.168.1.20: icmp_seq=25 ttl=128 time=0.273 ms
64 bytes from 192.168.1.20: icmp_seq=26 ttl=128 time=0.187 ms
64 bytes from 192.168.1.20: icmp_seq=27 ttl=128 time=0.229 ms
64 bytes from 192.168.1.20: icmp_seq=28 ttl=128 time=0.205 ms
64 bytes from 192.168.1.20: icmp_seq=28 ttl=128 time=590 ms (DUP!)
64 bytes from 192.168.1.20: icmp_seq=29 ttl=128 time=0.201 ms
64 bytes from 192.168.1.20: icmp_seq=30 ttl=128 time=0.235 ms
64 bytes from 192.168.1.20: icmp_seq=50 ttl=128 time=0.295 ms
64 bytes from 192.168.1.20: icmp_seq=76 ttl=128 time=0.324 ms
64 bytes from 192.168.1.20: icmp_seq=77 ttl=128 time=0.248 ms
64 bytes from 192.168.1.20: icmp_seq=78 ttl=128 time=0.350 ms
64 bytes from 192.168.1.20: icmp_seq=79 ttl=128 time=0.262 ms
64 bytes from 192.168.1.20: icmp_seq=80 ttl=128 time=274 ms
64 bytes from 192.168.1.20: icmp_seq=81 ttl=128 time=0.142 ms
64 bytes from 192.168.1.20: icmp_seq=82 ttl=128 time=0.244 ms
64 bytes from 192.168.1.20: icmp_seq=83 ttl=128 time=0.298 ms
64 bytes from 192.168.1.20: icmp_seq=84 ttl=128 time=0.165 ms
64 bytes from 192.168.1.20: icmp_seq=85 ttl=128 time=0.228 ms
64 bytes from 192.168.1.20: icmp_seq=86 ttl=128 time=0.214 ms
64 bytes from 192.168.1.20: icmp_seq=87 ttl=128 time=0.225 ms
64 bytes from 192.168.1.20: icmp_seq=88 ttl=128 time=0.265 ms
64 bytes from 192.168.1.20: icmp_seq=89 ttl=128 time=0.180 ms
64 bytes from 192.168.1.20: icmp_seq=90 ttl=128 time=0.282 ms
64 bytes from 192.168.1.20: icmp_seq=91 ttl=128 time=0.269 ms
64 bytes from 192.168.1.20: icmp_seq=93 ttl=128 time=0.243 ms
64 bytes from 192.168.1.20: icmp_seq=101 ttl=128 time=0.350 ms
64 bytes from 192.168.1.20: icmp_seq=135 ttl=128 time=0.357 ms
64 bytes from 192.168.1.20: icmp_seq=177 ttl=128 time=0.228 ms
64 bytes from 192.168.1.20: icmp_seq=193 ttl=128 time=0.348 ms
64 bytes from 192.168.1.20: icmp_seq=194 ttl=128 time=0.238 ms
64 bytes from 192.168.1.20: icmp_seq=195 ttl=128 time=0.234 ms
64 bytes from 192.168.1.20: icmp_seq=196 ttl=128 time=0.257 ms
64 bytes from 192.168.1.20: icmp_seq=197 ttl=128 time=498 ms
64 bytes from 192.168.1.20: icmp_seq=198 ttl=128 time=0.181 ms
64 bytes from 192.168.1.20: icmp_seq=199 ttl=128 time=0.210 ms
64 bytes from 192.168.1.20: icmp_seq=200 ttl=128 time=0.166 ms
64 bytes from 192.168.1.20: icmp_seq=201 ttl=128 time=0.194 ms
64 bytes from 192.168.1.20: icmp_seq=202 ttl=128 time=0.215 ms
64 bytes from 192.168.1.20: icmp_seq=203 ttl=128 time=0.241 ms
64 bytes from 192.168.1.20: icmp_seq=203 ttl=128 time=486 ms (DUP!)
64 bytes from 192.168.1.20: icmp_seq=204 ttl=128 time=0.179 ms
64 bytes from 192.168.1.20: icmp_seq=205 ttl=128 time=0.170 ms
64 bytes from 192.168.1.20: icmp_seq=206 ttl=128 time=0.208 ms
64 bytes from 192.168.1.20: icmp_seq=207 ttl=128 time=0.226 ms
64 bytes from 192.168.1.20: icmp_seq=208 ttl=128 time=0.262 ms
64 bytes from 192.168.1.20: icmp_seq=209 ttl=128 time=0.259 ms
64 bytes from 192.168.1.20: icmp_seq=210 ttl=128 time=0.197 ms
64 bytes from 192.168.1.20: icmp_seq=211 ttl=128 time=0.217 ms
64 bytes from 192.168.1.20: icmp_seq=212 ttl=128 time=0.230 ms
64 bytes from 192.168.1.20: icmp_seq=213 ttl=128 time=0.254 ms
64 bytes from 192.168.1.20: icmp_seq=214 ttl=128 time=0.230 ms
64 bytes from 192.168.1.20: icmp_seq=215 ttl=128 time=130 ms
64 bytes from 192.168.1.20: icmp_seq=216 ttl=128 time=0.189 ms
64 bytes from 192.168.1.20: icmp_seq=217 ttl=128 time=0.231 ms
64 bytes from 192.168.1.20: icmp_seq=218 ttl=128 time=0.258 ms
64 bytes from 192.168.1.20: icmp_seq=219 ttl=128 time=0.162 ms
64 bytes from 192.168.1.20: icmp_seq=220 ttl=128 time=0.189 ms
64 bytes from 192.168.1.20: icmp_seq=221 ttl=128 time=0.209 ms
64 bytes from 192.168.1.20: icmp_seq=222 ttl=128 time=0.238 ms
64 bytes from 192.168.1.20: icmp_seq=223 ttl=128 time=0.269 ms
64 bytes from 192.168.1.20: icmp_seq=224 ttl=128 time=0.165 ms
^C
--- 192.168.1.20 ping statistics ---
224 packets transmitted, 69 received, +2 duplicates, 69.1964% packet loss, time 228154ms
rtt min/avg/max/mdev = 0.130/37.564/673.749/134.149 ms

Shouldn't this core have no trouble keeping this up without packet loss at least? It's not a high bar.

This is without litex_server running, just pinging the board, which is at the other end of a network cable running to a dedicated GbE port (no switches etc) configured with a static IP.

Does it look like this for anyone else?

enjoy-digital commented 1 year ago

As you noted, this project is supporting the 7.0 which is, as you guessed, not exhibiting this behavior:

ping 192.168.1.20
PING 192.168.1.20 (192.168.1.20) 56(84) bytes of data.
64 bytes from 192.168.1.20: icmp_seq=1 ttl=128 time=0.066 ms
64 bytes from 192.168.1.20: icmp_seq=2 ttl=128 time=0.039 ms
64 bytes from 192.168.1.20: icmp_seq=3 ttl=128 time=0.059 ms
64 bytes from 192.168.1.20: icmp_seq=4 ttl=128 time=0.037 ms
64 bytes from 192.168.1.20: icmp_seq=5 ttl=128 time=0.055 ms
64 bytes from 192.168.1.20: icmp_seq=6 ttl=128 time=0.054 ms
[...]
64 bytes from 192.168.1.20: icmp_seq=48 ttl=128 time=0.055 ms
64 bytes from 192.168.1.20: icmp_seq=49 ttl=128 time=0.042 ms
64 bytes from 192.168.1.20: icmp_seq=50 ttl=128 time=0.053 ms
64 bytes from 192.168.1.20: icmp_seq=51 ttl=128 time=0.090 ms
64 bytes from 192.168.1.20: icmp_seq=52 ttl=128 time=0.036 ms
64 bytes from 192.168.1.20: icmp_seq=53 ttl=128 time=0.053 ms
64 bytes from 192.168.1.20: icmp_seq=54 ttl=128 time=0.032 ms
^C
--- 192.168.1.20 ping statistics ---
54 packets transmitted, 54 received, 0% packet loss, time 54249ms
rtt min/avg/max/mdev = 0.032/0.052/0.090/0.012 ms

Given this, I would then appreciate a bit less of arrogance:

this appears EXTREMELY unhealthy:

Shouldn't this core have no trouble keeping this up without packet loss at least? It's not a high bar.

I'm not sure how adding this is useful and/or will encourage developers to help? So be careful please.

I don't have the 8.0 revision, but the two first things I would do would be to check the tx_delayand try to adjust the value and also check the reported timings and try to improve them by eventually disabling some features.

As you can imagine, the aim of this project is to allow developers to use Ethernet on FPGA for very cheap, which of course introduce some constraints on the core that hasn't been designed for this initially (initially targeting 7-series FPGA). With more time/help, it would probably be possible to improve portability between hardware, but the current solution limits the scope to one board that has been tested due to these constraints.

RGD2 commented 1 year ago

Interesting note, doesn't appear to drop/duplicate packets when ping flooded:

 $ sudo ping -f -c 50000 192.168.1.20
PING 192.168.1.20 (192.168.1.20) 56(84) bytes of data.

--- 192.168.1.20 ping statistics ---
50000 packets transmitted, 50000 received, 0% packet loss, time 7760ms
rtt min/avg/max/mdev = 0.041/0.114/0.272/0.036 ms, ipg/ewma 0.155/0.102 ms

Same behaviour still observed with a 'normal' ~1Hz ping:


$ ping 192.168.1.20
PING 192.168.1.20 (192.168.1.20) 56(84) bytes of data.
64 bytes from 192.168.1.20: icmp_seq=1 ttl=128 time=0.303 ms
64 bytes from 192.168.1.20: icmp_seq=2 ttl=128 time=0.326 ms
^C
--- 192.168.1.20 ping statistics ---
7 packets transmitted, 2 received, 71.4286% packet loss, time 6143ms
rtt min/avg/max/mdev = 0.303/0.314/0.326/0.011 ms
Hoernchen commented 1 year ago

I can confirm this weird issue with my rev 8.2:

 ~ ping 192.168.2.20
PING 192.168.2.20 (192.168.2.20) 56(84) bytes of data.
64 bytes from 192.168.2.20: icmp_seq=86 ttl=128 time=0.151 ms
64 bytes from 192.168.2.20: icmp_seq=118 ttl=128 time=0.108 ms
64 bytes from 192.168.2.20: icmp_seq=119 ttl=128 time=0.218 ms
64 bytes from 192.168.2.20: icmp_seq=125 ttl=128 time=0.113 ms
64 bytes from 192.168.2.20: icmp_seq=126 ttl=128 time=0.179 ms
64 bytes from 192.168.2.20: icmp_seq=141 ttl=128 time=66.4 ms
64 bytes from 192.168.2.20: icmp_seq=142 ttl=128 time=0.136 ms
^C
--- 192.168.2.20 ping statistics ---
171 packets transmitted, 7 received, 95.9064% packet loss, time 174051ms
rtt min/avg/max/mdev = 0.108/9.617/66.420/23.189 ms

but:

sudo ping -f 192.168.2.20
PING 192.168.2.20 (192.168.2.20) 56(84) bytes of data.
.............................................................................................................................................................................................................................................................
.............................................................................................................................................................................................................................................................
............................................................................................................................................................^C
--- 192.168.2.20 ping statistics ---
49370 packets transmitted, 48708 received, 1.3409% packet loss, time 18906ms
rtt min/avg/max/mdev = 0.056/0.082/3.488/0.021 ms, ipg/ewma 0.382/0.082 ms