Closed sqazi closed 3 years ago
I'll take a look at that and try to reproduce the issue. If I understand correctly, you are running e.g. rowhammer.py/hw_rowhammer.py in a loop with different parameters? Have you observed any regularity, for example a similar loop count each time? Does it always hang in the same phase of the *rowhammer.py scripts?
I did some tests with loops of 50 and 100 runs and could not reproduce the issue. How many runs did it take before you observed the problem?
Ok, so I tried a simpler experiment. I created a patch that loops infinitely on a single command:
-    def run(self, row_pairs, pattern_generator, read_count, row_progress=16, verify_initial=True):
+    def __run(self, row_pairs, pattern_generator, read_count, row_progress=16, verify_initial=True):
         divisor, mask = 0, 0
         if self.data_inversion:
             divisor, mask = self.data_inversion
@@ -135,6 +135,13 @@ class HwRowHammer(RowHammer):
         self.display_errors(errors)
         return
 
+    def run(self, row_pairs, pattern_generator, read_count, row_progress=16, verify_initial=True):
+        k=0
+        while True:
+            print('k={}'.format(k))
+            self.__run(row_pairs, pattern_generator, read_count, row_progress, verify_initial)
+            k+=1
I ran:
python hw_rowhammer.py --nrows 10 --read_count 475136 --pattern 01_per_row --row-pairs const --const-rows-pair 0 2 --payload-executor --data-inversion 2 0b10 --no-refresh
And it died at iteration 218:
k=218
Preparing ... WARNING: only single word patterns supported, using: 0xffffffff
Filling memory with data ... Progress: [========================================] 16777216 / 16777216
Verifying written memory ... Progress: [========================================] 16777216 / 16777216 (Errors: 0) OK
Disabling refresh ...
Running row hammer attacks ...
Generating payload:
  tRAS = 5
  tRP = 3
  tREFI = 977
  tRFC = 45
  Repeatable unit: 116
  Repetitions: 58
  Payload size = 1.84KB / 1024.00KB
  Payload per-row toggle count = 475.136K x2 rows
  Payload refreshes (if enabled) = 8190
  Expected execution time = 7970879 cycles = 63.767 ms
Transferring the payload ...
Executing ... Time taken: 67.581 ms
Reenabling refresh ...
Verifying attacked memory ...
Traceback (most recent call last): ] 3150 / 16777216 (Errors: 203)
File "hw_rowhammer.py", line 148, in
Server doesn't show a disconnect:
$ make srv
litex_server --udp --udp-ip 192.168.1.10 --udp-port 1234
LiteX remote server
[CommUDP] ip: 192.168.1.10 / port: 1234 / tcp port: 1234
Connected with 127.0.0.1:40628
Disconnect
Connected with 127.0.0.1:40632
Disconnect
Connected with 127.0.0.1:43306
Disconnect
Connected with 127.0.0.1:43308
I've replicated the test you described; it has run for 6385 iterations so far and is still running without hanging, so I cannot reproduce the issue this way. Maybe this is specific to your network configuration? The fact that restarting litex_server fixes the problem for you means that the problem is in neither the ZCU104 firmware nor the FPGA.
For reference, I was using third_party/litex at 42d62991 and litex-rowhammer-tester at 3a72ceb, so there should be no changes that could affect this. I built the bitstream with python rowhammer_tester/targets/zcu104.py --payload-size 0x100000 --build.
You are right. It was my network setup. I was using wifi on the host side, which is not such a good idea with UDP. If I use the wired network, it appears to work fine.
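To illustrate why a lossy link can look like a hang: UDP gives no delivery guarantee, so a client that blocks on a reply with no timeout waits forever once a single datagram is dropped. Below is a minimal sketch of a timeout-and-resend wrapper around a plain UDP socket. It is not how litex_server or its client handle this; `request_with_retry`, `lossy_echo_once` and `demo` are hypothetical names made up for the illustration, and the "lossy link" is simulated on loopback by a server that discards the first few datagrams.

```python
import socket
import threading


def request_with_retry(sock, payload, addr, timeout_s=0.2, retries=5):
    """Send payload and wait for a reply, resending whenever the wait times out.

    Returns (reply_bytes, attempts_used). Raises TimeoutError if every
    attempt times out. (Hypothetical helper, not part of LiteX.)
    """
    sock.settimeout(timeout_s)
    for attempt in range(1, retries + 1):
        sock.sendto(payload, addr)
        try:
            reply, _ = sock.recvfrom(2048)
            return reply, attempt
        except socket.timeout:
            continue  # request or reply was lost; send it again
    raise TimeoutError("no reply after {} attempts".format(retries))


def lossy_echo_once(sock, drop_first):
    """Simulated flaky link: ignore the first drop_first datagrams, echo the next one."""
    sock.settimeout(2.0)  # do not block forever if the client gives up
    seen = 0
    while True:
        try:
            data, peer = sock.recvfrom(2048)
        except socket.timeout:
            return
        seen += 1
        if seen > drop_first:
            sock.sendto(data, peer)
            return


def demo(drop_first=2):
    """Run one request against the simulated lossy server on loopback."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    worker = threading.Thread(target=lossy_echo_once, args=(server, drop_first))
    worker.start()
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        return request_with_retry(client, b"ping", server.getsockname())
    finally:
        client.close()
        worker.join()
        server.close()
```

Calling `demo(drop_first=2)` should succeed on the third attempt. Without the `settimeout` call, the first dropped datagram would leave `recvfrom` blocked indefinitely, which matches a script that stops making progress while the server shows no disconnect.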
When doing long running experiments that use python scripts from a bash loop (i.e. frequently creating/destroying connections), the server stops working after some number of iterations. Everything goes back to normal if the server is restarted.
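For anyone trying to reproduce this kind of intermittent hang, a small watchdog loop makes it easier to report the iteration at which a run stalls. The following is a minimal sketch, assuming the script under test is started as a separate process each time (as in the bash loop described above); `run_until_hang` is a hypothetical helper, not part of the rowhammer tester.

```python
import subprocess
import sys


def run_until_hang(cmd, max_iters, timeout_s):
    """Run cmd up to max_iters times, each as a fresh process.

    Returns (iteration, reason) for the first run that times out or exits
    with a non-zero status, or None if every run completes successfully.
    """
    for k in range(max_iters):
        try:
            # subprocess.run kills the child when the timeout expires
            result = subprocess.run(cmd, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            return k, "timeout"
        if result.returncode != 0:
            return k, "exit {}".format(result.returncode)
    return None
```

For example, `run_until_hang([sys.executable, "hw_rowhammer.py", "--nrows", "10", "--no-refresh"], 1000, 120)` would stop at the first run that takes longer than two minutes. The argument list here is abbreviated from the command quoted earlier in the thread, and the timeout should be chosen well above the normal run time.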