Closed jedrzejboczar closed 4 years ago
Thanks for the update! Sounds like you guys are making good progress.
I'd start with just a single aggressor: issuing ACT + PRE to the same row over and over again. It seems like you can toggle auto-precharge, so this can be done using ACT + READ (w/ auto-precharge). The intervening READ doesn't matter at all.
Can you guys also control the refresh interval, or even turn it off? I'd set it to a very large value (e.g., 64ms -> 1sec) and repeat the experiment. Here you'll get retention errors as well, but those should be randomly distributed whereas the rowhammer errors should be concentrated around the row you're activating.
It's hard to recommend a particular data pattern, but solid (all ones or all zeros) or striped (ones/zeros in even/odd, ones/zeros in odd/even) are good bets.
The first thousand rows on a single bank sounds like a good sample population. Just make sure that you read out the entire bank when checking for errors because we don't yet know which rows are adjacent to what.
Yes, we can easily change the refresh interval or disable the refresh entirely. I'll run the tests with your suggestion and let you know when I have some results.
Thanks Jedrzej. One question -- why is EtherBone so slow? 1MB/22s = 45KB/s. I was expecting a bandwidth that's two orders of magnitude larger. If reading out the module takes an hour, that's going to really slow down experiments. Will this get faster for future implementations and/or for higher-speed FPGA boards?
Hi @yoongu, I've split the wishbone bridge question into a separate GitHub issue #4 so we can keep the discussion in this issue to replicating Row Hammer itself. Will answer your question there.
Comments from Salman,
Salman Qazi, 9:04 AM Couple of things right off the bat. First, traditional row hammer, when there are no DRAM mitigations, requires attacking rows with a gap of 1. In other words, you don't hammer rows 0 and 1, you hammer rows 0 and 2. Secondly, for data pattern, the optimal pattern is to have the victim row (in this case row 1) be the inverse pattern of the aggressor rows (0 and 2).
Salman Qazi, 9:08 AM (this is for double-sided hammer, which is somewhat more effective than single sided hammer)
I read the later part of the bug, and I saw that you guys did figure out that alternating pattern for the rows. Awesome.
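A minimal sketch of the double-sided layout described above, assuming logical row numbers are physically adjacent and that rows are filled with 32-bit words (as with the 0x55555555 pattern). The function name `double_sided_layout` is hypothetical and is not part of the tester scripts.

```python
WORD_MASK = 0xFFFFFFFF  # assumption: rows are filled with 32-bit words


def double_sided_layout(victim_row, aggressor_pattern=0x55555555):
    """Return (aggressor_rows, fill) for a double-sided hammer.

    The aggressors are the rows on either side of the victim (a gap of 1
    between them), and the victim holds the bitwise inverse of the
    aggressor pattern.
    """
    if victim_row < 1:
        raise ValueError("victim row needs a neighbour on each side")
    aggressors = (victim_row - 1, victim_row + 1)
    victim_pattern = ~aggressor_pattern & WORD_MASK
    fill = {
        aggressors[0]: aggressor_pattern,
        victim_row: victim_pattern,
        aggressors[1]: aggressor_pattern,
    }
    return aggressors, fill


# Hammer rows 0 and 2, with row 1 as the victim.
aggressors, fill = double_sided_layout(victim_row=1)
```

If the board or the chip remaps rows internally, the `victim_row - 1` / `victim_row + 1` assumption no longer holds and the pairs would have to come from measured adjacency instead.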
Since we don't have row adjacency information, it'll be hard to stage a double-sided attack. There could be unintentional address swizzling that's happening on the FPGA board.
Disabling refreshes and pummeling a single row a million times should do the trick. In DDR3, it is known that several hundred thousand activations will flip a bit. Not for all rows, but for a significant portion of them.
I performed some attacks with refresh disabled and was able to see the expected behavior. In the tests we were attacking only a single pair of rows 1000M times. The pattern was 010101... in each row.
Results for the pair (32, 34):
Results for the pair (99, 200) (plot zoomed in):
I believe that these results show that the rows are placed sequentially (at least the tested ones) as errors happen in the adjacent row numbers.
In the LiteDRAM controller there is no address/data swizzling, so that shouldn't be an issue. The only thing that the controller does is to map the address space bits as ROW_BANK_COL, but no additional changes are being done.
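To make the ROW_BANK_COL mapping concrete, here is a rough sketch of packing and unpacking such an address. The bit widths are assumptions for illustration (14 row bits for 16384 rows, 3 bank bits for 8 banks, 10 column bits), the address is counted in column words rather than bytes, and `encode`/`decode` are hypothetical helpers, not LiteDRAM code.

```python
# Assumed geometry, for illustration only.
ROW_BITS, BANK_BITS, COL_BITS = 14, 3, 10


def encode(row, bank, col):
    """Pack (row, bank, col) as ROW_BANK_COL, row in the top bits."""
    if not 0 <= row < (1 << ROW_BITS):
        raise ValueError("row out of range")
    if not 0 <= bank < (1 << BANK_BITS):
        raise ValueError("bank out of range")
    if not 0 <= col < (1 << COL_BITS):
        raise ValueError("col out of range")
    return (row << (BANK_BITS + COL_BITS)) | (bank << COL_BITS) | col


def decode(addr):
    """Unpack a ROW_BANK_COL address into (row, bank, col)."""
    col = addr & ((1 << COL_BITS) - 1)
    bank = (addr >> COL_BITS) & ((1 << BANK_BITS) - 1)
    row = addr >> (BANK_BITS + COL_BITS)
    return row, bank, col


addr = encode(row=54, bank=0, col=512)
row_stride = encode(1, 0, 0) - encode(0, 0, 0)
```

With this layout, consecutive row numbers in the same bank are `row_stride` words apart, and no bits are permuted, which is what "no swizzling" amounts to on the controller side.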
Woot! This is great news, thanks Jedrzej.
Regarding swizzling, have you guys checked the Arty board traces too? Actually, if you could link/upload the board schematic that'd be great.
@yoongu you can find the schematic here https://reference.digilentinc.com/_media/reference/programmable-logic/arty/arty_sch.pdf
the lines don't seem to be swizzled
Could you also double-check that the bus-numbers in the RTL match those from the FPGA pin-out? For example, the ADDR[0] should map to R2, and so on and so forth.
What is the smallest number of hammers required to get an above-noise level of bit flips?
I just did a quick test with only 5M row toggles, on a single pair of rows (54, 133) and with refresh disabled only during the attack. This resulted in 3 errors for row 55, 1 error for rows (53, 132, 134) and no errors for all other rows (was testing only the first 512 rows).
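One way to check that the flips are concentrated around the hammered rows is to split the per-row error counts into "next to an aggressor" and "elsewhere". This is only a sketch: `split_errors` is a hypothetical helper, and it assumes that neighbouring row numbers are physically adjacent.

```python
def split_errors(row_errors, aggressors, distance=1):
    """Split {row: error_count} into rows near an aggressor and all others.

    A row is "near" if it lies within `distance` rows of any aggressor,
    excluding the aggressor rows themselves.
    """
    near, far = {}, {}
    for row, count in row_errors.items():
        if count == 0:
            continue
        is_near = any(0 < abs(row - a) <= distance for a in aggressors)
        (near if is_near else far)[row] = count
    return near, far


# Counts from the quick 5M-toggle test on the pair (54, 133).
row_errors = {53: 1, 55: 3, 132: 1, 134: 1}
near, far = split_errors(row_errors, aggressors=(54, 133))
```

Retention errors from disabled refresh would be expected to show up in `far`, while Row Hammer flips should land in `near`.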
@yoongu I verified that the pins are mapped correctly. Just for reference, the RTL mappings for Arty are defined at https://github.com/enjoy-digital/litex/blob/517a49253f8c0aff48924aa957acb8c0ab6e9766/litex/boards/platforms/arty.py#L101-L104 and the pin names go from LSB to MSB.
I think we can now set up the environment on your side and try to reproduce the results. The current version on the master branch should work without issues. The setup and usage instructions are in the README (https://github.com/antmicro/litex-rowhammer-tester/#installing-dependencies). If anything is unclear just let me know here or open an issue.
If you have access to the Arty A7 board you can try to reproduce the results which I posted earlier. If not it would be good to try setting up the simulation, which will be helpful when testing Payload Execution. The simulator will print all the commands registered on PHY's DFI interface.
Almost there:
(venv) sqazi@sqazi-glaptop:~/fpga/litex-rowhammer-tester/scripts$ python ./rowhammer.py --nrows 512 --read_count 10e6 --pattern 01_in_row --row-pairs const --const-rows-pair 54 133 --plot --no-refresh
Preparing ...
Filling memory with data ...
................................
Disabling refresh ...
Running row hammer attacks ... Iter 0 / 1 Rows = ( 54, 133), Count = 10.07M / 10.00M
Reenabling refresh ...
Verifying attacked memory ...
................................
row_errors for row= 53: 3
row_errors for row= 55: 3
row_errors for row= 132: 42
row_errors for row= 134: 28
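Traceback (most recent call last):
  File "./rowhammer.py", line 290, in <module>
    row_hammer.run(row_pairs=row_pairs, read_count=args.read_count, pattern_generator=pattern)
  File "./rowhammer.py", line 198, in run
    self.display_errors(errors)
  File "./rowhammer.py", line 151, in display_errors
    from matplotlib import pyplot as plt
ModuleNotFoundError: No module named 'matplotlib'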
I tried installing python3_matplotlib (outside the environment) but it didn't help. Any suggestions?
Nevermind, I got it. See attached figure.
$ python ./rowhammer.py --nrows 512 --read_count 10e6 --pattern 01_in_row --row-pairs const --const-rows-pair 54 133 --plot --no-refresh
Preparing ...
Filling memory with data ...
................................
Disabling refresh ...
Running row hammer attacks ... Iter 0 / 1 Rows = ( 54, 133), Count = 10.03M / 10.00M
Reenabling refresh ...
Verifying attacked memory ...
................................
row_errors for row= 53: 3
row_errors for row= 55: 3
row_errors for row= 132: 41
row_errors for row= 134: 28
Matplotlib is building the font cache; this may take a moment.
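For comparing runs like the two above without the plot, the `row_errors` lines can be pulled out of the captured output with a small regex. This is a sketch that assumes the output format stays exactly as printed here; `parse_row_errors` is a hypothetical helper, not part of `rowhammer.py`.

```python
import re

# Matches e.g. "row_errors for row= 53: 3" (the row number is space-padded).
ROW_ERRORS_RE = re.compile(r"row_errors for row=\s*(\d+):\s*(\d+)")


def parse_row_errors(output):
    """Return {row: error_count} for every row_errors entry in the text."""
    return {int(row): int(count) for row, count in ROW_ERRORS_RE.findall(output)}


sample = (
    "row_errors for row= 53: 3\n"
    "row_errors for row= 55: 3\n"
    "row_errors for row= 132: 41\n"
    "row_errors for row= 134: 28\n"
)
errors = parse_row_errors(sample)
```

The pattern does not depend on line breaks, so it also works on output where several entries ended up on one line.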
Regarding clarity, I would say that it wasn't clear to me that the environment had to be entered to run the scripts, and I spent a bunch of time fumbling there.
Looks like we're able to reproduce it on our side. Thanks @jedrzejboczar and @sqazi. My Arty is currently on the way.
Another nice-to-have would be to somehow isolate IP address/ MAC address so that they are easy to modify. Currently, I am changing the IP address in two places: in the Makefile and in arty.py to work with my network setup.
@sqazi Feel free to file issues.
@sqazi -- That reminds me I have some very old code I never finished which adds DHCP support to LiteEth...
@sqazi Thanks for feedback. I updated the README to be more explicit about the virtual environment and made IP/MAC configurable via Makefile variables.
Thanks Jędrzej
I prepared a basic setup for running a Row Hammer attack using EtherBone to fill/check the memory and a simple DMA reader module. I did several tests on the Arty A7 board but was not able to induce any errors in the memory.
Testing procedure
The attack procedure is as follows:
address0=address(row=row0, col=512) and address1=address(row=row1, col=512), and run alternating DMA reads from these addresses.
The pattern, the number of rows, the number of row toggles per row pair and the choice of row pairs were changed between test cases.
Test cases
The initial tests were being run on 1024 rows, attacked in pairs (2n, 2n+1), so (0,1), (2,3), (4,5), ... The pattern used for filling the rows was just 0x55555555, so alternating 1s and 0s in a single row. For each pair there were ~10M row toggles.
Later I did some tests with similar number of rows with the following modifications (added one by one):
Finally I ran longer tests on all 16384 rows of the DRAM chip and ~1.5M row toggles per rows pair (took slightly over 1h per run):
I could not identify any errors in any of the tests.
I also verified that the reading procedure actually identifies errors by flipping a single bit manually.
Verification in simulation
To verify the correctness of our system I have run it with a simulation model of DRAM in Verilator. I was able to see correct command sequences in waveform dumps. Then I configured the model to log each DFI command that is being sent, printing time/bank/row/column/etc. I prepared a wrapper script that parses all the information and calculates the performance of our DMA reads.
In our setup we run at DDR3-800, with a refresh period of tREF=64ms and a refresh command interval of tREFI=64ms/8192. The wrapper script counts the number of row toggles (i.e. ACT commands) between two REF commands. From my tests we have a median of 93 row toggles between two subsequent REF commands. This gives 764k toggles per tREF, i.e. before that particular row is refreshed. It follows that we have 11.9M toggles per second, which is in the range of frequencies used in "4. Real System Demonstration" of the paper (11.6M, 11.7M, 12.3M and 6.1M).
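The arithmetic behind these figures can be re-derived in a few lines. This is only a sketch using the numbers quoted above; the constant names are my own. Taking the median as exactly 93 gives roughly 762k toggles per tREF, slightly below the 764k quoted, presumably because the measured median was not exactly 93.

```python
T_REF_S = 64e-3               # refresh period tREF
REF_COMMANDS_PER_TREF = 8192  # REF commands issued per tREF
TOGGLES_PER_TREFI = 93        # median ACT count between two REF commands

# Interval between two REF commands (tREFI), in seconds.
t_refi_s = T_REF_S / REF_COMMANDS_PER_TREF

# Row toggles that fit into one full refresh period.
toggles_per_tref = TOGGLES_PER_TREFI * REF_COMMANDS_PER_TREF

# Sustained toggle rate.
toggles_per_second = toggles_per_tref / T_REF_S

print(f"tREFI        = {t_refi_s * 1e6:.4f} us")
print(f"toggles/tREF = {toggles_per_tref / 1e3:.0f}k")
print(f"toggles/s    = {toggles_per_second / 1e6:.1f}M")
```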
Side notes
We are currently doing the tests on a single bank. The Row Hammer module could be extended to support attacking all banks at once, which should slightly increase the performance.
During the tests I was able to measure the speed at which we can currently write/read the DRAM memory over EtherBone. From my tests reading 1MB takes ~22 seconds, so reading the whole memory (256MB) should take ~1.5h. Writing is several times faster.
As for the time taken by the above tests on 16384 rows: we were using only 1 bank out of 8, so reading should take 0.1875h, which matches what I measured. Each attack with ~1.5M toggles took ~0.16 seconds (1.5M/11.9M ~= 0.13s, so there is a bit of overhead). Multiplying, 0.16s*16384 = 0.728h. Adding everything together (with ~1min of writing) we have around 1h 10min.
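A rough re-computation of this time budget, as a sketch from the figures quoted in this post (the constant names are my own). Using 22s/MB directly rather than the rounded 1.5h gives a slightly longer readout, and the three itemized terms add up to a bit under an hour, so the measured ~1h 10min presumably includes overhead that is not broken out here.

```python
SECONDS_PER_MB = 22.0      # measured EtherBone read speed
MODULE_MB = 256            # whole memory
BANKS = 8                  # only one bank is read back
ROWS = 16384               # one attack per row, as in the estimate above
SECONDS_PER_ATTACK = 0.16  # measured time for ~1.5M toggles
WRITE_SECONDS = 60.0       # ~1 min for filling the memory

# Reading back a single bank over EtherBone, in hours.
read_bank_h = SECONDS_PER_MB * MODULE_MB / BANKS / 3600

# All attacks back to back, in hours.
attack_h = SECONDS_PER_ATTACK * ROWS / 3600

total_h = read_bank_h + attack_h + WRITE_SECONDS / 3600

print(f"read one bank: {read_bank_h:.3f} h")
print(f"attacks:       {attack_h:.3f} h")
print(f"total:         {total_h:.3f} h")
```

The readout term scales linearly with the number of banks checked, so a faster bridge matters most once whole-module readouts are needed.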
Are there any obvious mistakes in the testing methodology used? Do we need better performance or different access patterns? Or is it possible that the DRAM chip we use (MT41K128M16) is not vulnerable?