antmicro / rowhammer-tester

https://antmicro.github.io/rowhammer-tester/
Apache License 2.0

Failing to induce memory errors with Row Hammer #3

Closed jedrzejboczar closed 4 years ago

jedrzejboczar commented 4 years ago

I prepared a basic setup for running a Row Hammer attack using EtherBone to fill/check the memory and a simple DMA reader module. I did several tests on the Arty A7 board but was not able to induce any errors in the memory.

Testing procedure

The attack procedure is as follows:

  1. DRAM initialization, leveling and a memory test (mem.py)
  2. For each row, fill the memory corresponding to that row (all columns) with a given pattern (repeated 32-bit word).
  3. Read all the written memory and verify that the data is correct.
  4. For each pair of rows, perform a row hammer attack: set address0=address(row=row0, col=512) and address1=address(row=row1, col=512), then run alternating DMA reads from these two addresses.
  5. Read all the written memory once again and check whether any errors were induced.

The pattern, the number of rows, the number of row toggles per pair of rows and the choice of row pairs varied between test cases.

Test cases

The initial tests were run on 1024 rows, attacked in pairs (2n, 2n+1), so (0,1), (2,3), (4,5), ... The pattern used for filling the rows was just 0x55555555, so alternating 1s and 0s within a single row. For each pair there were ~10M row toggles.

Later I did some tests with a similar number of rows and the following modifications (added one by one):

Finally I ran longer tests on all 16384 rows of the DRAM chip with ~1.5M row toggles per pair of rows (each run took slightly over 1h):

  1. Filling even/odd rows with all 0s/1s; attacking pairs of rows (0, n).
  2. Filling even/odd rows with all 0s/1s; attacking random pairs of rows (rand(), rand()).
  3. Filling each row with a single random 32-bit word, repeated for all the memory corresponding to that row. Attacking pairs (rand(), rand()).
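The row-pair choices and the even/odd fill described above can be sketched roughly as follows. This is only an illustration: the function names are hypothetical and are not taken from the tester scripts, and the random variant here draws two distinct rows per pair, which the description above does not specify.

```python
import random


def sequential_pairs(nrows):
    """Pairs (2n, 2n+1): (0, 1), (2, 3), (4, 5), ..."""
    return [(row, row + 1) for row in range(0, nrows - 1, 2)]


def anchored_pairs(nrows, anchor=0):
    """Pairs (anchor, n) for every row n other than the anchor."""
    return [(anchor, row) for row in range(nrows) if row != anchor]


def random_pairs(nrows, count, seed=None):
    """`count` pairs of two distinct, randomly chosen rows (assumption: distinct)."""
    rng = random.Random(seed)
    return [tuple(rng.sample(range(nrows), 2)) for _ in range(count)]


def even_odd_fill(row):
    """32-bit fill word: all 0s for even rows, all 1s for odd rows."""
    return 0x00000000 if row % 2 == 0 else 0xFFFFFFFF


print(sequential_pairs(6))
print(anchored_pairs(4))
print(random_pairs(16384, 3, seed=0))
```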

None of the tests produced any detectable errors.

I also verified that the reading procedure actually identifies errors by flipping a single bit manually.

Verification in simulation

To verify the correctness of our system I ran it with a simulation model of DRAM in Verilator. I was able to see correct command sequences in waveform dumps. Then I configured the model to log each DFI command that is sent, printing time/bank/row/column/etc. I prepared a wrapper script that parses all this information and calculates the performance of our DMA reads.

In our setup we run at DDR3-800, with a refresh period of tREF=64ms and a refresh command interval of tREFI=64ms/8192 (about 7.8us). The wrapper script counts the number of row toggles (i.e. ACT commands) between two REF commands. In my tests the median was 93 row toggles between two consecutive REF commands. This gives roughly 764k toggles per tREF, i.e. before any particular row gets refreshed. That corresponds to 11.9M toggles per second, which is in the range of rates used in "4. Real System Demonstration" of the paper (11.6M, 11.7M, 12.3M and 6.1M).
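As a quick sanity check, the toggle-rate arithmetic can be reproduced with a few lines of Python. The inputs are the figures quoted above; the variable names are just for illustration. With exactly 93 toggles per tREFI the per-tREF count comes out slightly below the ~764k quoted, presumably because the median was rounded.

```python
T_REF_S = 64e-3              # refresh period tREF (64 ms)
REFS_PER_TREF = 8192         # REF commands issued per tREF
ACTS_PER_TREFI = 93          # median ACT commands seen between two REF commands

t_refi_s = T_REF_S / REFS_PER_TREF                 # refresh command interval
toggles_per_tref = ACTS_PER_TREFI * REFS_PER_TREF  # toggles before a row is refreshed
toggles_per_second = toggles_per_tref / T_REF_S

print(f"tREFI            = {t_refi_s * 1e6:.4f} us")
print(f"toggles per tREF = {toggles_per_tref}")
print(f"toggles per sec  = {toggles_per_second / 1e6:.2f} M")
```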

Side notes

We are currently doing the tests on a single bank. The Row Hammer module could be extended to support attacking all banks at once, which should slightly increase the performance.

During the tests I was able to measure the speed at which we can currently write/read the DRAM memory over EtherBone. From my tests reading 1MB takes ~22 seconds, so reading the whole memory (256MB) should take ~1.5h. Writing is several times faster.

As for the time taken by the above tests on 16384 rows, we were using only 1 bank out of 8, so reading should take 0.1875h, which matches what I measured. Each attack with ~1.5M toggles took ~0.16 seconds (1.5M/11.9M ~= 0.13s, so there is a bit of overhead). Multiplying 0.16s*16384 = 0.728h. So adding everything together (with ~1min of writing) we have around 1h 10min.
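The run-time estimate can be redone in a few lines. The inputs are the rounded figures from this comment; counting two read passes (the verification before the attack and the check after it) is my assumption, and with it the total lands close to the stated 1h 10min.

```python
FULL_READ_H = 1.5          # rounded time to read the whole 256MB over EtherBone
BANKS = 8                  # only one of these banks is tested
ROWS = 16384               # number of attacks, one per row as in the estimate above
ATTACK_S = 0.16            # measured time per attack with ~1.5M toggles
WRITE_H = 1 / 60           # ~1 min of writing
READ_PASSES = 2            # assumption: one read before and one after the attacks

bank_read_h = FULL_READ_H / BANKS
attack_h = ATTACK_S * ROWS / 3600
total_h = WRITE_H + READ_PASSES * bank_read_h + attack_h

print(f"read one bank : {bank_read_h:.4f} h")
print(f"all attacks   : {attack_h:.3f} h")
print(f"total         : {total_h:.2f} h ({total_h * 60:.0f} min)")
```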


Are there any obvious mistakes in the testing methodology used? Do we need better performance or different access patterns? Or is it possible that the DRAM chip we use (MT41K128M16) is not vulnerable?

yoongu commented 4 years ago

Thanks for the update! Sounds like you guys are making good progress.

I'd start with just a single aggressor: issuing ACT + PRE to the same row over and over again. It seems like you can toggle auto-precharge, so this can be done using ACT + READ (w/ auto-precharge). The intervening READ doesn't matter at all.

Can you guys also control the refresh interval, or even turn it off? I'd set it to a very large value (e.g., 64ms -> 1sec) and repeat the experiment. Here you'll get retention errors as well, but those should be randomly distributed whereas the rowhammer errors should be concentrated around the row you're activating.

It's hard to recommend a particular data pattern, but solid (all ones or all zeros) or striped (ones/zeros in even/odd, ones/zeros in odd/even) are good bets.

The first thousand rows on a single bank sounds like a good sample population. Just make sure that you read out the entire bank when checking for errors because we don't yet know which rows are adjacent to what.

jedrzejboczar commented 4 years ago

Yes, we can easily change the refresh interval or disable the refresh entirely. I'll run the tests with your suggestion and let you know when I have some results.

yoongu commented 4 years ago

Thanks Jedrzej. One question -- why is EtherBone so slow? 1MB/22s = 45KB/s. I was expecting a bandwidth that's two orders of magnitude larger. If reading out the module takes an hour, that's going to really slow down experiments. Will this get faster for future implementations and/or for higher-speed FPGA boards?

mithro commented 4 years ago

Hi @yoongu, I've split the wishbone bridge question into a separate GitHub issue #4 so we can keep this issue's discussion focused on replicating Row Hammer itself. I will answer your question there.

esshiu commented 4 years ago

Comments from Salman,

Salman Qazi, 9:04 AM Couple of things right off the bat. First, traditional row hammer, when there are no DRAM mitigations, requires attacking rows with a gap of 1. In other words, you don't hammer rows 0 and 1, you hammer rows 0 and 2. Secondly, for data pattern, the optimal pattern is to have the victim row (in this case row 1) be the inverse pattern of the aggressor rows (0 and 2).

Salman Qazi, 9:08 AM (this is for double-sided hammer, which is somewhat more effective than single sided hammer)

I read the later part of the bug, and I saw that you guys did figure out that alternating pattern for the rows. Awesome.
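A minimal sketch of the double-sided layout Salman describes, assuming logical row numbers are physically adjacent (not established at this point in the thread) and that the victim is not the first row. The function names are hypothetical, not part of the tester.

```python
def invert32(word):
    """Bitwise inverse of a 32-bit word."""
    return ~word & 0xFFFFFFFF


def double_sided_plan(victim_row, aggressor_pattern=0x55555555):
    """Return the two aggressor rows around the victim and the fill word per row.

    The aggressors (victim - 1, victim + 1) share one pattern and the victim
    row holds the inverse of that pattern.
    """
    aggressors = (victim_row - 1, victim_row + 1)
    fill = {row: aggressor_pattern for row in aggressors}
    fill[victim_row] = invert32(aggressor_pattern)
    return aggressors, fill


aggressors, fill = double_sided_plan(victim_row=1)
print(aggressors)
print({row: hex(word) for row, word in sorted(fill.items())})
```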

yoongu commented 4 years ago

Since we don't have row adjacency information, it'll be hard to stage a double-sided attack. There could be unintentional address swizzling that's happening on the FPGA board.

Disabling refreshes and pummeling a single row a million times should do the trick. In DDR3, it is known that several hundred thousand activations will flip a bit. Not for all rows, but for a significant portion of them.

jedrzejboczar commented 4 years ago

I performed some attacks with refresh disabled and was able to see the expected behavior. In the tests we were attacking only a single pair of rows 1000M times. The pattern was 010101... in each row.

Results for the pair (32, 34): test_512_32_34_1000M_alt01inrow

Results for the pair (99, 200) (plot zoomed in): test_1024_99_200_1000M_alt01inrow

I believe that these results show that the rows are placed sequentially (at least the tested ones) as errors happen in the adjacent row numbers.

jedrzejboczar commented 4 years ago

> Since we don't have row adjacency information, it'll be hard to stage a double-sided attack. There could be unintentional address swizzling that's happening on the FPGA board.
>
> Disabling refreshes and pummeling a single row a million times should do the trick. In DDR3, it is known that several hundred thousand activations will flip a bit. Not for all rows, but for a significant portion of them.

In the LiteDRAM controller there is no address/data swizzling, so that shouldn't be an issue. The only thing the controller does is map the address space bits as ROW_BANK_COL; it makes no other changes.
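To illustrate what a ROW_BANK_COL layout means, here is a rough sketch of packing and unpacking such an address. The bit widths are assumptions: 14 row bits and 3 bank bits follow from the 16384 rows and 8 banks mentioned earlier in this thread, 10 column bits is assumed for this chip, and the address is counted in column words rather than bytes. This is not LiteDRAM code.

```python
ROW_BITS, BANK_BITS, COL_BITS = 14, 3, 10   # assumed widths, see note above


def encode(row, bank, col):
    """Pack (row, bank, col) with row in the top bits and column in the low bits."""
    return (row << (BANK_BITS + COL_BITS)) | (bank << COL_BITS) | col


def decode(address):
    """Inverse of encode(): split an address back into (row, bank, col)."""
    col = address & ((1 << COL_BITS) - 1)
    bank = (address >> COL_BITS) & ((1 << BANK_BITS) - 1)
    row = (address >> (BANK_BITS + COL_BITS)) & ((1 << ROW_BITS) - 1)
    return row, bank, col


address0 = encode(row=54, bank=0, col=512)
address1 = encode(row=133, bank=0, col=512)
print(hex(address0), decode(address0))
print(hex(address1), decode(address1))
```

With this layout, consecutive rows of the same bank are a fixed stride of 2**(BANK_BITS + COL_BITS) words apart.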

yoongu commented 4 years ago

Woot! This is great news, thanks Jedrzej.

Regarding swizzling, have you guys checked the Arty board traces too? Actually, if you could link/upload the board schematic that'd be great.

kgugala commented 4 years ago

@yoongu you can find the schematic here https://reference.digilentinc.com/_media/reference/programmable-logic/arty/arty_sch.pdf

kgugala commented 4 years ago

the lines don't seem to be swizzled

image

yoongu commented 4 years ago

Could you also double-check that the bus-numbers in the RTL match those from the FPGA pin-out? For example, the ADDR[0] should map to R2, and so on and so forth.

image

sqazi commented 4 years ago

What is the smallest number of hammers required to get an above-noise level of bit flips?

jedrzejboczar commented 4 years ago

> What is the smallest number of hammers required to get an above-noise level of bit flips?

I just did a quick test with only 5M row toggles, on a single pair of rows (54, 133) and with refresh disabled only during the attack. This resulted in 3 errors for row 55, 1 error each for rows 53, 132 and 134, and no errors for any other row (I was testing only the first 512 rows).
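One simple way to tell hammer-induced flips from background noise (such as retention errors) is to check whether the failing rows sit next to an aggressor. A small sketch using the counts from this test; the helper name is hypothetical and it assumes logical adjacency matches physical adjacency.

```python
def split_by_adjacency(row_errors, aggressors, max_distance=1):
    """Split {row: error_count} into rows near an aggressor and rows far from all."""
    near, far = {}, {}
    for row, count in row_errors.items():
        distance = min(abs(row - aggressor) for aggressor in aggressors)
        (near if distance <= max_distance else far)[row] = count
    return near, far


# Error counts reported above for the 5M-toggle test on rows (54, 133).
errors = {53: 1, 55: 3, 132: 1, 134: 1}
near, far = split_by_adjacency(errors, aggressors=(54, 133))
print("adjacent to an aggressor:", near)
print("elsewhere:", far)
```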

jedrzejboczar commented 4 years ago

@yoongu I verified that the pins are mapped correctly. Just for reference, the RTL mappings for Arty are defined here https://github.com/enjoy-digital/litex/blob/517a49253f8c0aff48924aa957acb8c0ab6e9766/litex/boards/platforms/arty.py#L101-L104 and the pin names go from LSB to MSB.

I think we can now set up the environment on your side and try to reproduce the results. The current version on the master branch should work without issues. The setup and usage instructions are in the README https://github.com/antmicro/litex-rowhammer-tester/#installing-dependencies. If anything is unclear just let me know here or open an issue.

If you have access to the Arty A7 board you can try to reproduce the results which I posted earlier. If not it would be good to try setting up the simulation, which will be helpful when testing Payload Execution. The simulator will print all the commands registered on PHY's DFI interface.

sqazi commented 4 years ago

Almost there:

    (venv) sqazi@sqazi-glaptop:~/fpga/litex-rowhammer-tester/scripts$ python ./rowhammer.py --nrows 512 --read_count 10e6 --pattern 01_in_row --row-pairs const --const-rows-pair 54 133 --plot --no-refresh
    Preparing ...
    Filling memory with data ... ................................
    Disabling refresh ...
    Running row hammer attacks ...
    Iter 0 / 1 Rows = ( 54, 133), Count = 10.07M / 10.00M
    Reenabling refresh ...
    Verifying attacked memory ... ................................
    row_errors for row= 53: 3
    row_errors for row= 55: 3
    row_errors for row= 132: 42
    row_errors for row= 134: 28
    Traceback (most recent call last):
      File "./rowhammer.py", line 290, in <module>
        row_hammer.run(row_pairs=row_pairs, read_count=args.read_count, pattern_generator=pattern)
      File "./rowhammer.py", line 198, in run
        self.display_errors(errors)
      File "./rowhammer.py", line 151, in display_errors
        from matplotlib import pyplot as plt
    ModuleNotFoundError: No module named 'matplotlib'

I tried installing python3_matplotlib (outside the environment) but it didn't help. Any suggestions?

sqazi commented 4 years ago

Never mind, I got it. See attached figure.

    $ python ./rowhammer.py --nrows 512 --read_count 10e6 --pattern 01_in_row --row-pairs const --const-rows-pair 54 133 --plot --no-refresh
    Preparing ...
    Filling memory with data ... ................................
    Disabling refresh ...
    Running row hammer attacks ...
    Iter 0 / 1 Rows = ( 54, 133), Count = 10.03M / 10.00M
    Reenabling refresh ...
    Verifying attacked memory ... ................................
    row_errors for row= 53: 3
    row_errors for row= 55: 3
    row_errors for row= 132: 41
    row_errors for row= 134: 28
    Matplotlib is building the font cache; this may take a moment.
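For anyone who wants to post-process runs without plotting, the `row_errors` lines in the output above can be collected with a short regex. This is only a sketch: the exact whitespace of the real script output is an assumption (the pattern tolerates any amount), and `parse_row_errors` is a hypothetical helper, not part of rowhammer.py.

```python
import re

# Matches e.g. "row_errors for row= 132: 41", tolerating variable spacing.
ROW_ERRORS_RE = re.compile(r"row_errors for row=\s*(\d+):\s*(\d+)")


def parse_row_errors(text):
    """Return {row: error_count} for every row_errors entry found in text."""
    return {int(row): int(count) for row, count in ROW_ERRORS_RE.findall(text)}


sample = (
    "row_errors for row= 53: 3 row_errors for row= 55: 3 "
    "row_errors for row= 132: 41 row_errors for row= 134: 28"
)
print(parse_row_errors(sample))
```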

sqazi commented 4 years ago

Regarding clarity, it wasn't clear to me that the virtual environment had to be entered to run the scripts, and I spent a bunch of time fumbling there.

yoongu commented 4 years ago

Looks like we're able to reproduce it on our side. Thanks @jedrzejboczar and @sqazi. My Arty is currently on the way.

sqazi commented 4 years ago

Another nice-to-have would be to isolate the IP address and MAC address so that they are easy to modify. Currently, I am changing the IP address in two places, in the Makefile and in arty.py, to work with my network setup.

yoongu commented 4 years ago

@sqazi Feel free to file issues.

mithro commented 4 years ago

@sqazi -- That reminds me I have some very old code I never finished which adds DHCP support to LiteEth...

jedrzejboczar commented 4 years ago

@sqazi Thanks for the feedback. I updated the README to be more explicit about the virtual environment and made the IP/MAC configurable via Makefile variables.

sqazi commented 4 years ago

Thanks Jędrzej
