Closed: sy2002 closed this issue 3 months ago
So this looks to me like the same problem in both cases: Writing to the first word of a 256-word page is wrapping and ending up in the wrong place.
First step would be to put this write pattern into the HyperRAM 16-bit access test bench in VHDL, and see if it shows up under simulation.
Used the above to test, and while I could not reproduce the EXACT problem, I did see something perhaps similar enough:
src/vhdl/test_hyperram16.vhdl:1866:15:@245274ns:(report note): DISPATCHER: Writing to $8666C5C <- $02FB
src/vhdl/test_hyperram16.vhdl:1866:15:@245286ns:(report note): DISPATCHER: Writing to $8666C5E <- $02FC
src/vhdl/test_hyperram16.vhdl:1866:15:@245298ns:(report note): DISPATCHER: Writing to $8666C60 <- $02FD
src/vhdl/test_hyperram16.vhdl:1866:15:@245310ns:(report note): DISPATCHER: Writing to $8666C62 <- $02FE
src/vhdl/test_hyperram16.vhdl:1866:15:@245322ns:(report note): DISPATCHER: Writing to $8666C64 <- $02FF
src/vhdl/test_hyperram16.vhdl:1861:15:@245526ns:(report note): DISPATCHER: Reading from $8666666, expecting to see $0000
src/vhdl/test_hyperram16.vhdl:1818:13:@246138ns:(report note): DISPATCHER: Read correct value $0000 after 612ns.
src/vhdl/test_hyperram16.vhdl:1861:15:@246162ns:(report note): DISPATCHER: Reading from $8666668, expecting to see $0001
src/vhdl/test_hyperram16.vhdl:1821:13:@246354ns:(report note): DISPATCHER: ERROR: Expected $0001, but saw $FFFF after 192ns.
src/vhdl/test_hyperram16.vhdl:1861:15:@246462ns:(report note): DISPATCHER: Reading from $866666A, expecting to see $0002
src/vhdl/test_hyperram16.vhdl:1821:13:@246486ns:(report note): DISPATCHER: ERROR: Expected $0002, but saw $FFFF after 24ns.
src/vhdl/test_hyperram16.vhdl:1861:15:@246498ns:(report note): DISPATCHER: Reading from $866666C, expecting to see $0003
src/vhdl/test_hyperram16.vhdl:1821:13:@246522ns:(report note): DISPATCHER: ERROR: Expected $0003, but saw $FFFF after 24ns.
src/vhdl/test_hyperram16.vhdl:1861:15:@246534ns:(report note): DISPATCHER: Reading from $866666E, expecting to see $0004
src/vhdl/test_hyperram16.vhdl:1821:13:@246558ns:(report note): DISPATCHER: ERROR: Expected $0004, but saw $FFFF after 24ns.
src/vhdl/test_hyperram16.vhdl:1861:15:@246570ns:(report note): DISPATCHER: Reading from $8666670, expecting to see $0005
src/vhdl/test_hyperram16.vhdl:1818:13:@246786ns:(report note): DISPATCHER: Read correct value $0005 after 216ns.
src/vhdl/test_hyperram16.vhdl:1861:15:@246894ns:(report note): DISPATCHER: Reading from $8666672, expecting to see $0006
src/vhdl/test_hyperram16.vhdl:1818:13:@246918ns:(report note): DISPATCHER: Read correct value $0006 after 24ns.
So here we are seeing four words reading back junk, and it happens even if I run this write-read loop a second time. Either the write doesn't happen, or it writes the wrong data.
Inserting a write to somewhere else just before this long batch of writes fixes the first word, but the following 3 words are still wrong.
Correction: the following 4 words are wrong, identically to the case without this extra write. It seems one whole 8-byte cache row is wrong.
Ok, I think the problem is that when a read pre-fetch is aborted due to a conflicting read, it doesn't set the correct address for the new read to the HyperRAM.
There is a separate performance issue that I could also look at, where under certain circumstances writes to successive words don't get merged into a single block write. But that will only result in slower writing, not bugs like you are seeing.
Thank you for the analysis, Paul. Sounds good!
Actually, the problem is that when we abort a read pre-fetch, we somehow end up sending the first byte of the command bytes to the hyperram twice, e.g., A0 A0 00 66 66 00 instead of A0 00 66 66 00 04. Hilarity then ensues...
We could disable the cancelling of read fetches. This would make linear reads a little faster on average, at the cost of an increase in maximum latency. But better to find and fix the root cause first...
Yes, root cause sounds much better. Take your time. I am on a vacation now with my family and not back before September 7th :-)
Problem found and believed fixed:
When aborting a prefetch read operation, the clock phase shift has to be set back to the one used for commands. If it isn't, an extra clock edge is produced when the controller sets up the CA address data, causing the first byte of it to get latched twice.
Please try it now, and let me know the result.
Thank you Paul for this ultra fast fix! After being back at a desk with MEGA65 hardware (September 7th) I will immediately test :-)
Unfortunately, this did not fix it:
I tested it and the results are exactly the same as the ones described above in https://github.com/MEGA65/mega65-core/issues/280#issue-683859002
Even the behaviour is the same: "newly configured FPGA" (step 1) vs. "running the test twice" (step 2).
Hmm.. It is quite strange that it is behaving exactly the same after the fix. You did grab the hyperram source files from the head of 138-hdmi-audio-27mhz, not a different branch, right?
Yes, I did. I even did a file compare and synthesized twice, because I also at first thought that I made a mistake.
You are right, it seems I mixed up the 138-hdmi-audio-27mhz branch with the 138-hdmi-audio branch (without the 27mhz at the end):
I have tested the wrong file.
Retesting now.
OK, I double-checked now, with the 100% correct file.
Problem is still here as described in https://github.com/MEGA65/mega65-core/issues/280#issue-683859002
(I always had the right file, already in my initial test, ignore my last comment - sorry for the confusion)
Using the newest file
wget https://raw.githubusercontent.com/MEGA65/mega65-core/138-hdmi-audio-27mhz/src/vhdl/hyperram.vhdl
which differs only in setting this signal to zero:
signal block_read_enable : std_logic := '0'; -- disable 32 byte read block fetching
leads to the following result:
Freshly configured FPGA means: I program a "cold started" FPGA with a fresh bitstream and then load my test program:
The linear write/read test works like a charm! 👍
"Again and again" means: Re-running the test program again and again after the successful run 1. The 1MB write/read-back works mostly fine, but the very first two writes/reads are failing:
8-bit test: PASSED
16-bit linear test: Fill 1MB from 0x0333333 with 16-bit values, then test:
16-bit linear test: FAILED
Read Errors: 0002
Address Actual Expected
00333333: CC00 0000
00333334: CC01 0001
How small a fill can you do, where repeating it still stuffs up? I am keen to reduce this to the absolute minimum case.
Meanwhile, maybe try disabling all of the other cache things, and see if that works, and then we can reenable each in turn. Send me the repository link again together with the test procedure you are using, and maybe I can try doing that from here.
OK - I will try disabling the other cache things and prepare the testbed, so that you can use it, for example, to attach a Xilinx ILA / Chipscope. I will try to contact you on Skype later.
@gardners about this question:
How small a fill can you do, where repeating it still stuffs up? I am keen to reduce this to the absolute minimum case.
I did the experiments and here is the result:
I can only reproduce it, when a wrap around happens on the high word:
So, to sum up: my original test ran from 0x333333 to 0x433333, which equals 1 MB. My new test runs from 0x333333 to 0x340000, which equals 52429 words.
About your second question: Yes, you can play around with it by yourself. I created a new branch in the QNICE repository called dev-hram. You can synthesize using hw/xilinx/MEGA65/Vivado - but I assume I need to give you some mini introduction before you might be able to attach ILA / Chipscope to it. I will contact you on Skype.
I found out some more interesting stuff:
As described above: When doing 1 MB from 0x333333 to 0x433333, then in the first run it works, in subsequent it does not and I do get two errors.
When going from 0x333333 to 0x340000, which is only 52429 words, the behavior is identical to the 1 MB case: two errors, but only on subsequent runs. The first run works.
Now, fasten your seatbelt: when I go from 0x33E000 to 0x340010, it already fails on the first run, with 31 errors. Here they are:
16-bit linear test: FAILED
Read Errors: 001F
Address Actual Expected
0033E100: 0000 0100
0033E200: 0100 0200
0033E300: 0200 0300
0033E400: 0300 0400
0033E500: 0400 0500
0033E600: 0500 0600
0033E700: 0600 0700
0033E800: 0700 0800
0033E900: 0800 0900
0033EA00: 0900 0A00
0033EB00: 0A00 0B00
0033EC00: 0B00 0C00
0033ED00: 0C00 0D00
0033EE00: 0D00 0E00
0033EF00: 0E00 0F00
0033F000: 0F00 1000
0033F100: 1000 1100
0033F200: 1100 1200
0033F300: 1200 1300
0033F400: 1300 1400
0033F500: 1400 1500
0033F600: 1500 1600
0033F700: 1600 1700
0033F800: 1700 1800
0033F900: 1800 1900
0033FA00: 1900 1A00
0033FB00: 1A00 1B00
0033FC00: 1B00 1C00
0033FD00: 1C00 1D00
0033FE00: 1D00 1E00
0033FF00: 1E00 1F00
All these tests were performed with the latest hyperram.vhdl. I did not yet deactivate any more cache flags.
Update on my last comment: As written there, with the address range 0x33E000 to 0x340010 I am getting 31 errors on a freshly programmed FPGA. What I forgot to mention is: if I then re-run the test program, I get 32 errors. Everything is identical to my last comment, but there is one more error at the very first address:
0033E000: 1F00 0000
Tested the newest hyperram.vhdl.
The error pattern is identical to the one described in https://github.com/MEGA65/mega65-core/issues/280#issuecomment-699504233
@sy2002 @gardners graveyard shift here... shouldn't this be fixed by the new HyperRAM stuff? This is a 4 year old issue...
@lydon42 I recommend closing it as done, and if someone sees a problem with it in future, then we create a new issue based on what they encounter
Start at address 0x00333333 and fill 1MB of data with 16-bit values, starting from 0x0000 and counting up by one at every address.
After filling 1MB like this, the code starts to read everything back. If something goes wrong, then the address where this happens, the actual value and the expected value is printed.
STEP 1: Run from a freshly programmed FPGA (means: HRAM driver and wrapper fresh)
The very first 255 words of the 16-bit part work. But then words number 256 and 257 have a problem. And from there on, there are errors every 256 words of the 16-bit part:
STEP 2: Run the program again (without re-programming the FPGA)
Similar result, but there are two more errors (410 instead of 408), and those two additional errors are at the very beginning.