antmicro / rowhammer-tester

https://antmicro.github.io/rowhammer-tester/
Apache License 2.0

ZCU 104 litex_server connection issue #160

Closed: tanvirarafin closed this issue 1 year ago

tanvirarafin commented 1 year ago

Hi,

I am having some issues with the litex_server connection on the ZCU 104 board. I am using the latest prebuilt image posted here and I was having the same connection issues discussed in #74.

Following the discussion in issue #74, I recompiled the Etherbone server with gcc-aarch64-linux-gnu and the debug option enabled. The make srv command now works, and the server sends and receives some data. However, when I run the leds.py test, it fails.

The output looks like this:

Before running the test (only the server running), host:

make srv
litex_server --udp --udp-ip 192.168.100.50 --udp-port 1234
[CommUDP] ip: 192.168.100.50 / port: 1234 / tcp port: 1234

zcu104:

Received 12 byte packet
4e 6f 11 44 00 00 00 00 
00 00 00 00 
Sending 8 byte response
4e 6f 16 44 00 00 00 00

After running the test:

host (terminal 1):

python leds.py 
Using generated target files in: ../../build/zcu104
Didn't get a reponse from the board. Check connection?

host (terminal 2):

litex_server --udp --udp-ip 192.168.100.50 --udp-port 1234
[CommUDP] ip: 192.168.100.50 / port: 1234 / tcp port: 1234
Connected with 127.0.0.1:58214
Disconnect
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/abc/rowhammer-tester/third_party/litex/litex/tools/litex_server.py", line 153, in _serve_thread
    reads += self.comm.read(addr, length, burst)
  File "/home/abc/rowhammer-tester/third_party/litex/litex/tools/remote/comm_udp.py", line 127, in read
    raise socket.timeout
socket.timeout

ZCU104:

Received 12 byte packet
4e 6f 11 44 00 00 00 00 
00 00 00 00 
Sending 8 byte response
4e 6f 16 44 00 00 00 00 
Received 12 byte packet
4e 6f 11 44 00 00 00 00 
00 00 00 00 
Sending 8 byte response
4e 6f 16 44 00 00 00 00 
Received 20 byte packet
4e 6f 10 44 00 00 00 00 
00 0f 00 01 00 00 00 01 
f0 00 58 00 
0xf0005800 => 0x00000052
Sending 20 byte response
4e 6f 14 44 00 00 00 00 
00 0f 01 00 01 00 00 00 
00 00 00 52 
Received 20 byte packet
4e 6f 10 44 00 00 00 00 
00 0f 00 01 00 00 00 02 
f0 00 58 00 
0xf0005800 => 0x00000052
Sending 20 byte response
4e 6f 14 44 00 00 00 00 
00 0f 01 00 02 00 00 00 
00 00 00 52 
Received 20 byte packet
4e 6f 10 44 00 00 00 00 
00 0f 00 01 00 00 00 03 
f0 00 58 00 
0xf0005800 => 0x00000052
Sending 20 byte response
4e 6f 14 44 00 00 00 00 
00 0f 01 00 03 00 00 00 
00 00 00 52 
Received 20 byte packet
4e 6f 10 44 00 00 00 00 
00 0f 00 01 00 00 00 04 
f0 00 58 00 

....

Any suggestions as to why this might happen? Is there an updated ZCU104 image that has the fixed Etherbone server? Thanks 🙏
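For anyone reading the hex dumps above, here is a minimal Python sketch that splits one of the 20-byte packets into fields. The layout it assumes (an 8-byte packet header starting with the magic 0x4e6f, then a 4-byte record header with a write count and a read count, a 4-byte base address and one 32-bit word) is my understanding of how LiteX frames Etherbone, not something stated in this thread. parse_single_word_packet is a hypothetical helper, not part of litex.

```python
import struct
from typing import NamedTuple


class EtherbonePacket(NamedTuple):
    magic: int      # expected to be 0x4e6f
    wcount: int     # number of words written by this record
    rcount: int     # number of words read by this record
    base_addr: int  # base return address (request) or base write address (response)
    word: int       # address to read (request) or data read back (response)


def parse_single_word_packet(hex_dump: str) -> EtherbonePacket:
    """Decode a 20-byte packet carrying exactly one record with one word.

    Assumed layout: bytes 0-7 packet header, bytes 8-11 record header,
    bytes 12-15 base address, bytes 16-19 one word, all big-endian.
    """
    raw = bytes.fromhex(hex_dump)  # whitespace between bytes is ignored
    if len(raw) != 20:
        raise ValueError(f"expected 20 bytes, got {len(raw)}")
    (magic,) = struct.unpack_from(">H", raw, 0)
    _flags, _byte_enable, wcount, rcount = struct.unpack_from(">4B", raw, 8)
    base_addr, word = struct.unpack_from(">II", raw, 12)
    return EtherbonePacket(magic, wcount, rcount, base_addr, word)


# Second read request and its response, copied from the ZCU104 log above.
request = parse_single_word_packet(
    "4e 6f 10 44 00 00 00 00 00 0f 00 01 00 00 00 02 f0 00 58 00"
)
response = parse_single_word_packet(
    "4e 6f 14 44 00 00 00 00 00 0f 01 00 02 00 00 00 00 00 00 52"
)
print(hex(request.base_addr), hex(response.base_addr))  # prints: 0x2 0x2000000
```

Decoded this way, the request reads one word at 0xf0005800 and the response carries 0x52, matching the "0xf0005800 => 0x00000052" line in the log, while the base address field differs between the two packets.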

michalsieron commented 1 year ago

Hi @tanvirarafin

It turns out that the endianness of base_ret_addr was being swapped when it shouldn't have been. That is, we didn't convert it from network to host byte order on receive, but we still converted it from host to network byte order on send.

Together with the changes from https://github.com/enjoy-digital/litex/pull/1504, this caused the issue: because the reply ID didn't match, the packets were dropped and no response was received.
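The asymmetric conversion can be reproduced in a few lines. This is only a sketch in Python, not the server's C code: both function names are hypothetical, struct's "<I" format stands in for a little-endian CPU loading the field without ntohl, and the "fixed" variant shows just one way to make the two conversions symmetric, which may differ from the actual patch. The sample field is the base_ret_addr of the first read request in the ZCU104 log.

```python
import struct


def echo_base_ret_addr_buggy(wire: bytes) -> bytes:
    # Receive side: the 4 wire bytes are loaded as-is by a little-endian
    # host, i.e. the network-to-host conversion (ntohl) is skipped.
    (host_value,) = struct.unpack("<I", wire)
    # Send side: host-to-network conversion (htonl) is still applied,
    # so the bytes leave in reversed order.
    return struct.pack(">I", host_value)


def echo_base_ret_addr_fixed(wire: bytes) -> bytes:
    # Receive side: interpret the wire bytes as big-endian (ntohl).
    (host_value,) = struct.unpack(">I", wire)
    # Send side: write them back as big-endian (htonl); round trip is exact.
    return struct.pack(">I", host_value)


request_field = bytes.fromhex("00000001")
buggy = echo_base_ret_addr_buggy(request_field)
fixed = echo_base_ret_addr_fixed(request_field)

print(buggy.hex(" "))  # prints: 01 00 00 00
print(fixed.hex(" "))  # prints: 00 00 00 01
```

The buggy result matches the "01 00 00 00" seen in the logged response, so a host that compares the returned value against the ID it sent would not recognise the reply.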

We will release a new prebuilt image for the ZCU104 soon. Until then, you can rebuild the server with the following patch: https://github.com/antmicro/rowhammer-tester/pull/166/commits/f061c80100130aeb828b7dd8b7b924abfe5ff50b and use it to replace the binary on the SD card at /bin/zcu104_etherbone.

If you want to replace it over SSH, you may first need to run killall zcu104_etherbone to stop the currently running server, then scp the new binary and reboot the ZCU104.

Because a bug with the US(+) FPGAs was introduced during a recent cleanup, you may need to wait for https://github.com/antmicro/rowhammer-tester/pull/166 to be merged if you want to build and use a new bitstream.

tanvirarafin commented 1 year ago

It's working now :).