antmicro / rowhammer-tester

https://antmicro.github.io/rowhammer-tester/
Apache License 2.0

Frequency: Memory Controller vs. Payload Executor #191

Open biecho opened 6 days ago

biecho commented 6 days ago

When working with DDR5, I noticed a discrepancy in refresh behavior. Enabling refresh and waiting for 32 ms results in approximately 8192 REFs. However, using the payload executor to explicitly issue 8192 REFs completes in about 16 ms. Could the payload executor be running at a different frequency than the memory controller (MC)?

The payload code looks like this:

        refresh_loop = [
            encoder.I(
                OpCode.PRE,
                timeslice=timing.tRP,
                address=encoder.address(col=1 << 10, rank=0),
            ),  # precharge all
            encoder.I(OpCode.REF, timeslice=timing.tRFC),
            encoder.I(OpCode.NOOP, timeslice=(timing.tREFI - timing.tRFC - timing.tRP)),
            encoder.I(OpCode.LOOP, jump=3, count=refresh_count - 1),
        ]
        payload.extend(refresh_loop)

The output in verbose mode is:

  Payload size =  0.02KB / 32.00KB
 Expected execution time = 6382347 cycles = 31.912 ms
OpCode.NOOP     0x309
OpCode.PRE      0x4     0x10000
OpCode.REF      0x3c    0x0
OpCode.NOOP     0x2cb
OpCode.LOOP     0x1fff  0x3
OpCode.NOOP     0x0

Transferring the payload ...

Executing ...

Total elapsed time: 16.345 ms

biecho commented 5 days ago

Could the issue be related to the fact that some commands in DDR5 require 2 clock cycles, while others only require 1? Is the payload executor fully functional with DDR5? From the code in rowhammer_tester/gateware/payload_executor.py, it appears that much of the logic relies on signals like cas, ras, and we, which are specific to DDR4 where each command is issued within a single cycle.

mtdudek commented 14 hours ago

I've reproduced your issue and found that the instruction encoder.I(OpCode.LOOP, jump=3, count=refresh_count - 1) was incorrectly encoded.

OpCode.LOOP has only 12 bits to store the counter, and refresh_loop tried to store a 13-bit value (8191 = 0x1fff). The value was silently truncated, so the loop count was 4095 instead of 8191. That halves the number of REFs, which matches the roughly halved execution time.
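The arithmetic behind the truncation can be sketched in plain Python. This is only an illustration: it assumes the truncation keeps the low 12 bits of the count and that a LOOP with count N runs its body N + 1 times (which is how the payload above uses `refresh_count - 1`). The names `LOOP_COUNT_BITS` and `truncated_count` are hypothetical and not part of rowhammer-tester.

```python
# Assumption: the LOOP count field is 12 bits wide, as stated in the comment above.
LOOP_COUNT_BITS = 12
LOOP_COUNT_MASK = (1 << LOOP_COUNT_BITS) - 1  # 0xfff


def truncated_count(count: int) -> int:
    """Return the value left after keeping only the low 12 bits of `count`."""
    return count & LOOP_COUNT_MASK


requested = 8192 - 1                 # refresh_count - 1 from the payload (0x1fff)
stored = truncated_count(requested)  # value that fits in the 12-bit field

# Assumption: a LOOP with count N executes the loop body N + 1 times in total.
requested_refs = requested + 1
actual_refs = stored + 1

# Scale the reported estimate (31.912 ms for 8192 REFs) to the truncated loop.
expected_ms = 31.912
scaled_ms = expected_ms * actual_refs / requested_refs

print(f"stored count = {stored:#x}, REFs issued = {actual_refs}")
print(f"estimated time with truncated count = {scaled_ms:.3f} ms")
```

The scaled estimate ignores the instructions outside the loop, so it only approximates the measured 16.345 ms.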

I also tried adding a second loop body with 4095 iterations, so that there were 8192 refreshes in total. The execution time was then around 32 ms, as expected.
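One way to generalize this workaround is to split the desired number of body executions into several LOOP counts that each fit the 12-bit field. The helper below is a hypothetical sketch, not part of rowhammer-tester. It assumes a LOOP with count N runs the body N + 1 times and that each returned count would be used for its own copy of the loop body.

```python
def split_loop_counts(total_runs: int, count_bits: int = 12) -> list:
    """Split `total_runs` body executions into LOOP counts that fit `count_bits`.

    Assumption: a LOOP with count N executes its body N + 1 times, so one
    loop can cover at most 2**count_bits executions.
    """
    if total_runs < 1:
        raise ValueError("total_runs must be at least 1")

    max_runs_per_loop = 1 << count_bits
    counts = []
    remaining = total_runs
    while remaining > 0:
        runs = min(remaining, max_runs_per_loop)
        counts.append(runs - 1)  # count field holds (executions - 1)
        remaining -= runs
    return counts


# 8192 refreshes need two loop bodies, each with a count of 4095.
print(split_loop_counts(8192))
```

Each entry in the returned list corresponds to one copy of the PRE / REF / NOOP / LOOP sequence in the payload.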

I'll add assertions to the payload execution encoding to remove this silent truncation issue.
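As a rough idea of what such a check could look like, here is a minimal sketch of a range check on a bit field. It is not the actual fix: `checked_field` is a hypothetical helper, and the 12-bit width is taken from the comment above.

```python
def checked_field(value: int, width: int, name: str) -> int:
    """Return `value` unchanged if it fits in `width` bits, otherwise raise."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{name}={value} does not fit in {width} bits")
    return value


# A count of 4095 fits in 12 bits and passes through unchanged.
ok = checked_field(4095, 12, "count")

# A count of 8191 needs 13 bits, so the check fails loudly instead of truncating.
try:
    checked_field(8191, 12, "count")
    overflow_detected = False
except ValueError as exc:
    overflow_detected = True
    print(exc)
```

With a check like this in the encoding path, the original payload would have failed at encoding time rather than silently issuing half the REFs.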

As for the cas, ras, and we signals, there is a translation layer in the DFII module (third_party/litedram/litedram/dfii.py). It converts pre-DDR5 commands to the DDR5 standard and sends them to the DDR5 PHY.