enjoy-digital / litex

Build your hardware, easily!

Ethernet connection stops if two packets are sent too close (ECP5 5A75B) #1268

Closed faeboli closed 2 years ago

faeboli commented 2 years ago

Hello, I'm having a problem with Etherbone: the connection stops working if two packets are sent too close to each other, and the only way to restore it is to power cycle the board. The packets need to be very close (a few tens of us apart) for this to happen, but on my Raspberry Pi 4 the problem appears after a few minutes if I send 1 packet per ms to the board. I was able to reproduce the problem on my Ubuntu 20.04 LTS host machine with the default board target file:
- LiteX updated this evening.
- Board: 5A75B v8.0.
- colorlight_5a_75x.py edited with increased buffer depth on line 166: self.add_etherbone(phy=self.ethphy, ip_address=eth_ip,buffer_depth=1060)

build command: ./colorlight_5a_75x.py --revision=8.0 --uart-name=crossover --eth-ip="192.168.2.50" --with-etherbone --csr-csv=csr.csv --build

To reproduce the problem I've used the Python code attached here. The data is a packet that asks to read back the contents of 45 contiguous addresses starting from 0x00000000. If I comment out the delay, the board stops working and needs a power cycle. This situation can happen, for example, if the host machine sends ARP messages close to my UDP Etherbone packets. Is there something I'm doing wrong? Or is there some workaround? If anybody can test, please let me know whether the problem can be reproduced. Thanks


#!/usr/bin/env python3

import time
import socket

s = socket.socket(family=socket.AF_INET, type=socket.SOCK_DGRAM)
data=b'No\x10D\x00\x00\x00\x00\x00\x0f\x00-\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x08\x00\x00\x00\x0c\x00\x00\x00\x10\x00\x00\x00\x14\x00\x00\x00\x18\x00\x00\x00\x1c\x00\x00\x00 \x00\x00\x00$\x00\x00\x00(\x00\x00\x00,\x00\x00\x000\x00\x00\x004\x00\x00\x008\x00\x00\x00<\x00\x00\x00@\x00\x00\x00D\x00\x00\x00H\x00\x00\x00L\x00\x00\x00P\x00\x00\x00T\x00\x00\x00X\x00\x00\x00\\\x00\x00\x00`\x00\x00\x00d\x00\x00\x00h\x00\x00\x00l\x00\x00\x00p\x00\x00\x00t\x00\x00\x00x\x00\x00\x00|\x00\x00\x00\x80\x00\x00\x00\x84\x00\x00\x00\x88\x00\x00\x00\x8c\x00\x00\x00\x90\x00\x00\x00\x94\x00\x00\x00\x98\x00\x00\x00\x9c\x00\x00\x00\xa0\x00\x00\x00\xa4\x00\x00\x00\xa8\x00\x00\x00\xac\x00\x00\x00\xb0'   
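# Send 9 identical read requests back to back.
for i in range(1, 10):
    s.sendto(data, ("192.168.2.50", 1234))
    print(data)
    # Delay between packets; with this line commented out the board stops responding.
    # time.sleep(0.001)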
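
For readers who want to vary the number of registers, the request above can also be generated instead of typed by hand. This is only a sketch that reproduces the byte layout of the packets posted in this thread: `build_read_packet` is a hypothetical helper (not part of LiteX or LiteEth), and the meaning attached to each field in the comments is an interpretation of those bytes, not something confirmed here.

```python
import struct

def build_read_packet(addresses, base_ret_addr=0):
    """Build a read request with the same byte layout as the packets in this thread."""
    if not 1 <= len(addresses) <= 255:
        raise ValueError("read count must fit in one byte (1..255)")
    # 8-byte header as seen in the thread: b'No', 0x10, 0x44 (b'D'), 4 zero bytes.
    header = struct.pack(">HBB4x", 0x4E6F, 0x10, 0x44)
    # 4 bytes: 0x00, 0x0f, 0x00, then the number of reads
    # (assumed meaning: flags, byte enable, write count, read count).
    record = struct.pack(">BBBB", 0x00, 0x0F, 0x00, len(addresses))
    # One 32-bit word that is zero in every posted packet (assumed: return address),
    # followed by one big-endian 32-bit word per address to read.
    body = struct.pack(">I", base_ret_addr)
    body += b"".join(struct.pack(">I", a) for a in addresses)
    return header + record + body

# 45 contiguous word addresses starting from 0x00000000, as in the script above.
packet = build_read_packet(range(0, 45 * 4, 4))
```

Passing `[0]`, `[0, 4]` or `[0, 4, 8, 12]` should give the shorter 1, 2 and 4 register requests used later in this thread.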
enjoy-digital commented 2 years ago

Hi @faeboli,

buffer_depth=1060 is higher than what can be supported by Etherbone. I just added an assertion in LiteEth with https://github.com/enjoy-digital/liteeth/commit/bc9162d578bec79003c5ce9bbf44cb14c9a25a0d to avoid future misconfiguration. The Colorlight probably already has trouble meeting timings, and such a high buffer_depth value will not help.

Could you check whether the issue also happens with the default value of 16 and with a maximum number of reads of 4, 8, or 16? If so, could you try increasing sys_clk_freq?

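It could also be interesting to put a LiteScope instance in your design (over UART or JTAG) and see what happens internally:

* https://github.com/enjoy-digital/litex/wiki/Use-Host-Bridge-to-control-debug-a-SoC

* https://github.com/enjoy-digital/litex/wiki/Use-LiteScope-To-Debug-A-SoC

Now that JTAGBone is supported on ECP5, it's pretty easy/convenient to use it by just adding self.add_jtagbone() to your SoC.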

faeboli commented 2 years ago

Thank you for your answer! Update:

import time
import socket

s = socket.socket(family=socket.AF_INET, type=socket.SOCK_DGRAM)

# Keep only one of the following assignments active; the last one wins.
data=b'No\x10D\x00\x00\x00\x00\x00\x0f\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00' #read 1 register
data=b'No\x10D\x00\x00\x00\x00\x00\x0f\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x04' #read 2 registers
data=b'No\x10D\x00\x00\x00\x00\x00\x0f\x00\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x08\x00\x00\x00\x0c' #read 4 registers

for i in range(1,3) :
    s.sendto(data,("192.168.2.50", 1234))
    print(data)
    time.sleep(0.001)


Results are similar, i.e. if the delay is commented out, LiteX panics when 2 packets are sent less than about 50us apart.
The failure behavior changes with packet length:
- A 4-register readback fails by locking out the board
- 2- and 1-register readbacks fail by flooding the connection with loads of packets
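
One way to tell these two failure modes apart from the host is to count the replies that come back after a burst. The following is a rough sketch, not something used in this thread: `classify`, `drain_replies` and `burst_and_check` are hypothetical names, and binding the local socket to port 1234 rests on the assumption that the board sends its replies to that port number; adjust `local_port` if your setup differs.

```python
import socket

def classify(sent, received):
    """Compare the number of requests sent with the number of replies seen."""
    if received == 0:
        return "no replies (possible lock-up)"
    if received > sent:
        return "more replies than requests (possible flood)"
    if received < sent:
        return "some replies missing"
    return "ok"

def drain_replies(sock, idle_timeout=0.2, max_packets=1000):
    """Count datagrams until the socket stays idle for idle_timeout seconds."""
    sock.settimeout(idle_timeout)
    count = 0
    while count < max_packets:  # cap so that a flood cannot loop forever
        try:
            sock.recvfrom(2048)
        except socket.timeout:
            break
        count += 1
    return count

def burst_and_check(data, target=("192.168.2.50", 1234), count=2, local_port=1234):
    """Send `count` packets back to back, then classify what comes back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.bind(("", local_port))  # assumption: replies arrive on this port
        for _ in range(count):
            s.sendto(data, target)
        return classify(count, drain_replies(s))
```

With the board connected, `print(burst_and_check(data))` would report which case was observed for the currently selected `data`.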
faeboli commented 2 years ago

I have now configured the JTAG bridge and I'd like to use LiteScope to understand more and try to help, but I'm a little lost about what to measure and where. Do you have a suggestion of possible signals to acquire? Thanks

enjoy-digital commented 2 years ago

@faeboli: Sorry for the delay, I'll try to reproduce here and if so, should be able to fix directly. BTW I just found this: https://forum.linuxcnc.org/27-driver-boards/44422-colorcnc-colorlight-5a-75e-5a-75b-as-fpga-controller-board Is it related to this project?

faeboli commented 2 years ago

@enjoy-digital Thank you for your support, hope you can reproduce the problem. To answer your question, yes it's related to that work, I've posted in that forum as "muvideo"

enjoy-digital commented 2 years ago

@faeboli: OK thanks. Funny thing is that I also played with Linux-CNC in the past and also thought about using LiteEth + FPGA for such purposes (but sadly don't have time for all projects...). So that's great seeing this project and also a motivation for me to provide more support now that things are more concrete and related to something I'm also interested in :)

enjoy-digital commented 2 years ago

@faeboli: The issue should be fixed with https://github.com/enjoy-digital/litex/commit/edd98c23cbd8b213636d938fe0a2435375f96a80 (I've been able to reproduce it and no longer see it with this fix). Can you do a test (and set buffer_depth to 255, which is the maximum supported value)?

faeboli commented 2 years ago

@enjoy-digital Hi, wonderful! I just ran some tests and the problem seems solved in my testbench. I'll also test the fix on LinuxCNC to confirm that the connection remains stable on a Raspberry Pi with PREEMPT_RT, where it was first noticed. This will take some time, but I'm positive that the problem will be solved.

Thanks, Fabio

Update: confirmed the fix for my setup in LinuxCNC as well.

enjoy-digital commented 2 years ago

Great, thanks for the feedback @faeboli. Please ask if you run into any issue in the future: as I said, I find this project very interesting and am willing to help. I now also better understand the request in https://github.com/enjoy-digital/liteeth/issues/103, but I still haven't been able to think about it.

Florent

faeboli commented 2 years ago

Thank you for your support. There is a bunch of smart guys on the LinuxCNC forum who are actively working on LiteX for LinuxCNC. In my opinion, the availability of cheap FPGA boards with gigabit Ethernet, together with LinuxCNC on a Raspberry Pi, is a game changer for low-cost CNC builds. The possibility to daisy chain several boards will help to build robust and dependable control hardware. LiteX is enabling all this potential to come together.

Fabio.

enjoy-digital commented 2 years ago

That's great to see. I'll have to think about the best way to enable daisy-chaining in LiteEth. And if features are missing or issues are found during your efforts to use it with LinuxCNC, feel free to ask on GitHub issues or join the #litex channel on libera.chat.

romanetz commented 2 years ago

Hi @faeboli, @enjoy-digital

> buffer_depth=1060 is higher than what could be supported by Etherbone, I just added an assertion in LiteEth with enjoy-digital/liteeth@bc9162d to avoid future misconfiguration. The colorlight already probably has trouble meeting timings and such a high buffer_depth value will not help.
>
> Could you see if with the default value of 16 and with a maximum number of read of 4, 8, 16 the issue also happens? If so, could you try increasing the sys_clk_freq?
>
> This could also be interesting to put a LiteScope instance in your design (over UART or JTAG) and see what happens internally:
>
> * https://github.com/enjoy-digital/litex/wiki/Use-Host-Bridge-to-control-debug-a-SoC
>
> * https://github.com/enjoy-digital/litex/wiki/Use-LiteScope-To-Debug-A-SoC
>
> Now that JTAGBone is supported in ECP5, it's pretty easy/convenient to use it with just a self.add_jtagbone() to your SoC.

The intention of increasing the buffer size up to 1060 was to increase the number of Wishbone registers that can be exchanged in a single packet.

enjoy-digital commented 2 years ago

Hi @romanetz,

buffer_depth is expressed in Wishbone words and a maximum of 255 is supported by the Etherbone protocol, so 1060 was not a valid value. A check has been added to LiteEth to prevent misconfiguration: https://github.com/enjoy-digital/liteeth/commit/bc9162d578bec79003c5ce9bbf44cb14c9a25a0d.
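
Given that limit, a host that needs more than 255 registers would have to spread the reads over several packets. A minimal sketch of the splitting step, assuming the limit applies per packet as stated above; `split_reads` and `MAX_READS_PER_PACKET` are hypothetical names, not part of LiteEth or LiteX:

```python
MAX_READS_PER_PACKET = 255  # maximum quoted in this thread for the Etherbone protocol

def split_reads(addresses, max_reads=MAX_READS_PER_PACKET):
    """Split a sequence of addresses into chunks of at most max_reads entries."""
    if not 1 <= max_reads <= MAX_READS_PER_PACKET:
        raise ValueError("max_reads must be between 1 and 255")
    addresses = list(addresses)
    return [addresses[i:i + max_reads]
            for i in range(0, len(addresses), max_reads)]

# 1060 contiguous word addresses, the count originally intended for one packet.
chunks = split_reads(range(0, 1060 * 4, 4))
print([len(c) for c in chunks])  # [255, 255, 255, 255, 40]
```

Presumably `max_reads` should also stay within the buffer_depth configured on the board, so a smaller value may be needed when the default depth is kept.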