HyperRAM: 16-bit write/read not stable

sy2002 commented 4 years ago

Start at address 0x00333333 and fill 1MB of data with 16-bit values starting from 0x0000 and then count them up by one at every address, so:

Address    ==> Value
0x00333333 ==> 0x0000
0x00333331 ==> 0x0001
0x00333332 ==> 0x0002
[...]
0x00343332 ==> 0xFFFF
0x00343333 ==> 0x000
[...]

After filling 1MB like this, the code starts to read everything back. If something goes wrong, then the address where this happens, the actual value and the expected value is printed.

STEP 1: Run from a freshly programmed FPGA (means: HRAM driver and wrapper fresh)

The very first 255 words of the 16-bit part work. But then word number 256 and 257 have a problem. And from there on, every 256 words of the 16-bit part has errors:

16-bit linear test: Fill 1MB from 0x0333333 with 16-bit values, then test:
16-bit linear test: FAILED
Read Errors: 0x0198 (decimal: 408)
Address   Actual  Expected
00333433: 0000    0100
00333434: 0001    0101
00333533: 0100    0200
00333534: 0101    0201
00333633: 0200    0300
00333634: 0201    0301
00333733: 0300    0400
00333734: 0301    0401
00333833: 0400    0500
00333834: 0401    0501
00333933: 0500    0600
00333934: 0501    0601
00333A33: 0600    0700
00333A34: 0601    0701
00333B33: 0700    0800
00333B34: 0701    0801
00333C33: 0800    0900
00333C34: 0801    0901
00333D33: 0900    0A00
00333D34: 0901    0A01
00333E33: 0A00    0B00
00333E34: 0A01    0B01
[ ... output shortened, but the pattern continues every 256 bytes ...]
0033FC33: C800    C900
0033FC34: C801    C901
0033FD33: C900    CA00
0033FD34: C901    CA01
0033FE33: CA00    CB00
0033FE34: CA01    CB01
0033FF33: CB00    CC00
0033FF34: CB01    CC01

STEP 2: Run the program again (without re-programming the FPGA)

Similar result, but there are two more errors (410 errors instead of 408) and those two additional errors are at the very beginning.

16-bit linear test: Fill 1MB from 0x0333333 with 16-bit values, then test:
16-bit linear test: FAILED
Read Errors: 0x019A (decimal: 410)
Address   Actual  Expected
00333333: CC00    0000
00333334: CC01    0001
00333433: 0000    0100
00333434: 0001    0101
00333533: 0100    0200
00333534: 0101    0201
00333633: 0200    0300
00333634: 0201    0301
00333733: 0300    0400
00333734: 0301    0401
00333833: 0400    0500
00333834: 0401    0501
[ ... output shortened, but the pattern continues every 256 bytes ...]
0033FE33: CA00    CB00
0033FE34: CA01    CB01
0033FF33: CB00    CC00
0033FF34: CB01    CC01

gardners commented 4 years ago

So this looks to me like the same problem in both cases: Writing to the first word of a 256-word page is wrapping and ending up in the wrong place.

First step would be to put this write pattern into the HyperRAM 16-bit access test bench in VHDL, and see if it shows up under simulation.

gardners commented 4 years ago

USed the above to test, and while I could not reproduce the EXACT problem, I did see something perhaps similar enough:

src/vhdl/test_hyperram16.vhdl:1866:15:@245274ns:(report note): DISPATCHER: Writing to $8666C5C <- $02FB
src/vhdl/test_hyperram16.vhdl:1866:15:@245286ns:(report note): DISPATCHER: Writing to $8666C5E <- $02FC
src/vhdl/test_hyperram16.vhdl:1866:15:@245298ns:(report note): DISPATCHER: Writing to $8666C60 <- $02FD
src/vhdl/test_hyperram16.vhdl:1866:15:@245310ns:(report note): DISPATCHER: Writing to $8666C62 <- $02FE
src/vhdl/test_hyperram16.vhdl:1866:15:@245322ns:(report note): DISPATCHER: Writing to $8666C64 <- $02FF
src/vhdl/test_hyperram16.vhdl:1861:15:@245526ns:(report note): DISPATCHER: Reading from $8666666, expecting to see $0000
src/vhdl/test_hyperram16.vhdl:1818:13:@246138ns:(report note): DISPATCHER: Read correct value $0000 after 612ns.
src/vhdl/test_hyperram16.vhdl:1861:15:@246162ns:(report note): DISPATCHER: Reading from $8666668, expecting to see $0001
src/vhdl/test_hyperram16.vhdl:1821:13:@246354ns:(report note): DISPATCHER: ERROR: Expected $0001, but saw $FFFF after 192ns.
src/vhdl/test_hyperram16.vhdl:1861:15:@246462ns:(report note): DISPATCHER: Reading from $866666A, expecting to see $0002
src/vhdl/test_hyperram16.vhdl:1821:13:@246486ns:(report note): DISPATCHER: ERROR: Expected $0002, but saw $FFFF after 24ns.
src/vhdl/test_hyperram16.vhdl:1861:15:@246498ns:(report note): DISPATCHER: Reading from $866666C, expecting to see $0003
src/vhdl/test_hyperram16.vhdl:1821:13:@246522ns:(report note): DISPATCHER: ERROR: Expected $0003, but saw $FFFF after 24ns.
src/vhdl/test_hyperram16.vhdl:1861:15:@246534ns:(report note): DISPATCHER: Reading from $866666E, expecting to see $0004
src/vhdl/test_hyperram16.vhdl:1821:13:@246558ns:(report note): DISPATCHER: ERROR: Expected $0004, but saw $FFFF after 24ns.
src/vhdl/test_hyperram16.vhdl:1861:15:@246570ns:(report note): DISPATCHER: Reading from $8666670, expecting to see $0005
src/vhdl/test_hyperram16.vhdl:1818:13:@246786ns:(report note): DISPATCHER: Read correct value $0005 after 216ns.
src/vhdl/test_hyperram16.vhdl:1861:15:@246894ns:(report note): DISPATCHER: Reading from $8666672, expecting to see $0006
src/vhdl/test_hyperram16.vhdl:1818:13:@246918ns:(report note): DISPATCHER: Read correct value $0006 after 24ns.

So here we are seeing four words reading back junk, and it happens even if I do this write-read loop a 2nd time. The write doesn't seem to happen, or writes the wrong data.

gardners commented 4 years ago

Inserting a write to somewhere else just before this long batch of writes fixes the first word, but the following 3 words are still wrong.

gardners commented 4 years ago

Correction: The following 4 words are wrong, identically to without this. It's one whole 8-byte cache row that is wrong, it seems.

gardners commented 4 years ago

Ok, I think the problem is that when a read pre-fetch is aborted due to a conflicting read, that it doesn't set the correct address for the new read to the HyperRAM.

There is a separate performance issue that I could also look at, where under certain circumstances writes to successive words don't get merged into a single block write. But that will only result in slower writing, not bugs like you are seeing.

sy2002 commented 4 years ago

Thank you for analysis Paul. Sounds good!

gardners commented 4 years ago

Actually, the problem is that when we abort a read pre-fetch, we somehow end up sending the first byte of the command bytes to the hyperram twice, e.g., A0 A0 00 66 66 00 instead of A0 00 66 66 00 04. Hilarity then ensues...

gardners commented 4 years ago

We could disable the cancelling of read fetches. This would make linear reads a little faster on average, at the cost of an increase in maximum latency. But better to find and fix the root cause first...

sy2002 commented 4 years ago

Yes, root cause sounds much better. Take your time. I am on a vacation now with my family and not back before September 7th :-)

gardners commented 4 years ago

Problem found and believed fixed:

When aborting a prefetch read operation, the clock phase shift has to be set back to that for commands. This causes an extra clock edge to be produced when the controller setups the CA address data, causing the first byte of that to get latched twice.

Please try it now, and let me know the result.

sy2002 commented 4 years ago

Thank you Paul for this ultra fast fix! After being back at a desk with MEGA65 hardware (September, 7th) I will immediatelly test :-)

sy2002 commented 4 years ago

Unfortunatelly, this did not fix it:

I tested it and the results are exactly the same as the ones described above in https://github.com/MEGA65/mega65-core/issues/280#issue-683859002

Even the behaviour "newly configured FPGA" (step 1) vs. "running the test twice" (step 2).

gardners commented 4 years ago

Hmm.. It is quite strange that it is behaving exactly the same after the fix. You did grab the hyperram source files from the head of 138-hdmi-audio-27mhz, not a different branch, right?

sy2002 commented 4 years ago

Yes, I did. I even did a file compare and synthesized twice, because I also at first thought that I made a mistake.

sy2002 commented 4 years ago

You are right, it seems I mixed up the 138-hdmi-audio-27mhz branch with the 138-hdmi-audio branch (without the 27mhz at the end):

I have tested the wrong file.

Retesting now.

sy2002 commented 4 years ago

OK did double check now, with the 100% correct file.

Problem is still here as described in https://github.com/MEGA65/mega65-core/issues/280#issue-683859002

(I always had the right file, already in my initial test, ignore my last comment - sorry for the confusion)

sy2002 commented 4 years ago

There is Progress:

Using the newest file

wget https://raw.githubusercontent.com/MEGA65/mega65-core/138-hdmi-audio-27mhz/src/vhdl/hyperram.vhdl

which does nothing else differently than setting this signal to zero:

  signal block_read_enable : std_logic := '0'; -- disable 32 byte read block fetching

leads to the following result:

Run 1: Freshly configured FPGA

Freshly configured FPGA means: I am programming a "cold started" FPGA with a fresh bitstream and then load my test program:

The linear write/read test works like a charm! 👍

Run 2 .. n: Running the test program again and again

"Again and again" means: Re-running the test program again and again after the successful run 1. The 1MB write/read-back works mostly fine, but the very first two writes/reads are failing:

8-bit test: PASSED

16-bit linear test: Fill 1MB from 0x0333333 with 16-bit values, then test:
16-bit linear test: FAILED
Read Errors: 0002
Address   Actual  Expected
00333333: CC00    0000
00333334: CC01    0001

gardners commented 4 years ago

How small a fill can you do, where repeating it still stuffs up? I am keen to reduce this to the absolute minimum case.

gardners commented 4 years ago

Meanwhile, maybe try disabling all of the other cache things, and see if that works, and then we can reenable each in turn. Send me the repository link again together with the test procedure you are using, and maybe I can try doing that from here.

sy2002 commented 4 years ago

OK - I will try the disabling of the other cache things and prepare the testbed, so that you can use it for example to attach Xilinx ILA / Chipscope. I will try to contact you on Skype later.

sy2002 commented 4 years ago

@gardners about this question:

How small a fill can you do, where repeating it still stuffs up? I am keen to reduce this to the absolute minimum case.

I did the experiments and here is the result:

I can only reproduce it, when a wrap around happens on the high word:

I am starting at 0x0333333 (which means 0x0666666, because in 16bit mode I am multiplying the address by 2, before handing it over to your controller)
When the high word wraps around from 0x33 to 0x34, then the problem happens. That means after 0xCCCD (52429 decimal) words written. In contrast to the 1 MB test, this is a much smaller number :-)
I assume the number 52429 has no meaning. But the fact that the high word wraps from 0x33 to 0x34 (meaning from 0x66 to 68) might mean something to you?

So for summing up: My original test ran from 0x333333 to 0x433333, which equals 1 MB. My new test runs from 0x333333 to 0x340000 which equals 52429.

About your second question: Yes, you can play around with it by yourself. I created a new branch in the QNICE repository called dev-hram. You can synthesize using hw/xilinx/MEGA65/Vivado - but I assume I need to give you some mini introduction, before you might be able to attach ILA / Chipscope to it. I will contact you on Skype.

sy2002 commented 4 years ago

I found out some more interessting stuff:

As described above: When doing 1 MB from 0x333333 to 0x433333, then in the first run it works, in subsequent it does not and I do get two errors.
When going from 0x333333 to 0x340000, which is only 52429 words, then the behavior is identically to the 1 MB case: two errors, but only on subsequent runs. In the first run it works.
Now, fasten your seatbelt: But when I go from 0x33E000 to 0x340010, then it already fails at the first run and I do get 31 errors already on the first run. Here they are

16-bit linear test: FAILED
Read Errors: 001F
Address   Actual  Expected
0033E100: 0000    0100
0033E200: 0100    0200
0033E300: 0200    0300
0033E400: 0300    0400
0033E500: 0400    0500
0033E600: 0500    0600
0033E700: 0600    0700
0033E800: 0700    0800
0033E900: 0800    0900
0033EA00: 0900    0A00
0033EB00: 0A00    0B00
0033EC00: 0B00    0C00
0033ED00: 0C00    0D00
0033EE00: 0D00    0E00
0033EF00: 0E00    0F00
0033F000: 0F00    1000
0033F100: 1000    1100
0033F200: 1100    1200
0033F300: 1200    1300
0033F400: 1300    1400
0033F500: 1400    1500
0033F600: 1500    1600
0033F700: 1600    1700
0033F800: 1700    1800
0033F900: 1800    1900
0033FA00: 1900    1A00
0033FB00: 1A00    1B00
0033FC00: 1B00    1C00
0033FD00: 1C00    1D00
0033FE00: 1D00    1E00
0033FF00: 1E00    1F00

All these tests were performed with the latest hyperram.vhdl. It did not yet deactivate any more cache flags.

sy2002 commented 4 years ago

Update on my last comment: As written there, with the address range 0x33E000 to 0x340010 I am getting 31 errors on a freshly programmed FPGA. What I forget to mention is: If I then re-run the test program, I get 32 errors. Everything identical to my last comment but one more at the very first address:

0033E000: 1F00    0000

sy2002 commented 4 years ago

Tested the newest hyperram.vhd. The error pattern is identical to the one described in https://github.com/MEGA65/mega65-core/issues/280#issuecomment-699504233

lydon42 commented 3 months ago

@sy2002 @gardners graveyard shift here... shouldn't this be fixed by the new HyperRAM stuff? This is a 4 year old issue...

gardners commented 3 months ago

@lydon42 I recommend closing it as done, and if someone sees a problem with it in future, then we create a new issue based on what they encounter

MEGA65 / mega65-core