step23 - Incomplete program read from flash memory

fm4dd commented 8 months ago

In this step we need a different flash offset than the original 128K that BrunoLevy used in his tutorial, because the Gatemate toolchain creates a larger FPGA bitstream file that exceeds 128K:

fm@nuc7fpga:~/fpga/projects/git/gatemate-riscv/step23$ ls -l SOC_00.cfg.bit
-rwxrwxrwx 1 root root 316128 Oct 15 15:48 SOC_00.cfg.bit
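
A quick size check of the two candidate offsets. This is a sketch that assumes the bitstream is written starting at flash offset 0 and that its flash footprint equals the 316128-byte file size from the ls output above. The helper name overlaps is hypothetical.

```python
KiB = 1024
BITSTREAM_SIZE = 316128  # size of SOC_00.cfg.bit from the ls -l output

def overlaps(program_offset: int, bitstream_size: int = BITSTREAM_SIZE) -> bool:
    """True if a bitstream written from flash offset 0 would reach program_offset."""
    return bitstream_size > program_offset

for kib in (128, 512):
    offset = kib * KiB
    print(f"{kib}K ({offset:#x}): overlap={overlaps(offset)}, "
          f"headroom={offset - BITSTREAM_SIZE} bytes")
# 128K (0x20000): overlap=True, headroom=-185056 bytes
# 512K (0x80000): overlap=False, headroom=208160 bytes
```

At 128K the bitstream runs about 185K past the program start. At 512K roughly 200K of headroom remains.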

At roughly 300K, the bitstream data would overwrite the RISC-V program if the program were uploaded at 128K. To solve this, I changed the "ldscripts-shared/spiflash0.ld" linker script from the 0x820000 (128K) offset to the 0x880000 (512K) offset. I also changed the start address inside SOC.v:

   initial begin
      LI(a0,32'h00880000); // jump to SPI FLASH at 512 kB offset
      JALR(zero,a0,0);
   end

Finally, in the Makefile, I upload the RISC-V binary file at the 512K offset:

openFPGALoader -b gatemate_evb_spi -o 524288 src-hello/hello.spiflash0.bin -f
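
The linker script uses CPU addresses, but the openFPGALoader -o value is a raw flash offset. The sketch below shows how one maps to the other. It assumes, as stated later in this thread, that flash offset 0 appears at CPU address 0x800000 (FLASH_BASE). The function flash_offset is a hypothetical helper.

```python
FLASH_BASE = 0x800000  # assumption: CPU address at which flash offset 0 is mapped

def flash_offset(cpu_addr: int) -> int:
    """Translate a CPU (linker) address inside the flash window to a raw flash offset."""
    if cpu_addr < FLASH_BASE:
        raise ValueError(f"{cpu_addr:#x} is below FLASH_BASE")
    return cpu_addr - FLASH_BASE

for origin in (0x820000, 0x880000):
    off = flash_offset(origin)
    print(f"ORIGIN {origin:#x} -> flash offset {off:#x} = {off} bytes = {off // 1024}K")
# ORIGIN 0x820000 -> flash offset 0x20000 = 131072 bytes = 128K
# ORIGIN 0x880000 -> flash offset 0x80000 = 524288 bytes = 512K
```

Under that assumption, the 524288 passed to -o matches the 0x880000 jump target in SOC.v.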

Issue: The RISC-V program code is not completely read from flash memory, and therefore does not run.

In simulation (which reads the program from a file instead of flash), it works fine. This is a good sign that the compiled program works as intended:

fm@nuc7fpga:~/fpga/projects/git/gatemate-riscv/step23$ make test
Running testbench simulation
test ! -e SOC.tb || rm SOC.tb
test ! -e SOC.vcd || rm SOC.vcd
/usr/bin/iverilog -DBENCH -o SOC.tb -s SOC_tb SOC_tb.v SOC.v ../rtl-shared/clockworks.v ../rtl-shared/pll_gatemate.v ../rtl-shared/emmitter_uart.v ../rtl-shared/spi_flash.v
/usr/bin/vvp SOC.tb
Gatemate E1 RISC-V: Hello World, running from Flash!

When I check the real FPGA execution with the protocol analyzer, I can see that the SPI flash read of the RISC-V program starts at 0x80000, as intended. It reads the correct values (the first four bytes of the RISC-V program are "b7 01 40 00") and continues with the next four bytes. However, the SPI reads always stop after 15 four-byte read cycles, at address 0x80090. The last byte sequence received is "63 0a 05 00". Only 60 bytes of the 306-byte hello.spiflash0.bin program are read, and not even in sequential address order.

[Screenshot: 20231015-step23-incomplete-program-load, showing the last nine 4-byte SPI reads]
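
As a cross-check of the two captured byte sequences, this Python sketch reassembles the bus bytes into 32-bit words and decodes them. It assumes the first byte on the bus comes from the lowest address, since RISC-V is little-endian. It only understands LUI and conditional branches. le_word and decode are hypothetical helpers written for illustration.

```python
def le_word(hex_bytes: str) -> int:
    """Assemble bytes as captured on the bus (lowest address first) into a little-endian word."""
    return int.from_bytes(bytes.fromhex(hex_bytes), "little")

def decode(word: int) -> str:
    """Decode a small subset of RV32I: LUI and B-type branches only."""
    opcode = word & 0x7F
    rd = (word >> 7) & 0x1F
    funct3 = (word >> 12) & 0x7
    rs1 = (word >> 15) & 0x1F
    rs2 = (word >> 20) & 0x1F
    if opcode == 0x37:  # LUI: upper 20 bits are the immediate
        return f"lui x{rd}, 0x{word >> 12:x}"
    if opcode == 0x63:  # BRANCH: reassemble the scattered B-type immediate
        imm = (((word >> 31) & 0x1) << 12) | (((word >> 7) & 0x1) << 11) \
            | (((word >> 25) & 0x3F) << 5) | (((word >> 8) & 0xF) << 1)
        if imm & 0x1000:  # sign-extend the 13-bit offset
            imm -= 0x2000
        names = {0: "beq", 1: "bne", 4: "blt", 5: "bge", 6: "bltu", 7: "bgeu"}
        return f"{names.get(funct3, '?')} x{rs1}, x{rs2}, {imm:+d}"
    return f"unknown opcode 0x{opcode:02x}"

for captured in ("b7 01 40 00", "63 0a 05 00"):
    w = le_word(captured)
    print(f"{captured} -> {w:#010x} -> {decode(w)}")
# b7 01 40 00 -> 0x004001b7 -> lui x3, 0x400
# 63 0a 05 00 -> 0x00050a63 -> beq x10, x0, +20
```

**First word.** It decodes to a plausible lui gp, 0x400 (x3 is gp in the standard ABI).

**Last word.** It decodes to beqz a0, +20 (x10 is a0), which is a conditional branch. That might explain why the read addresses do not advance sequentially, although this check alone does not prove it.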

After the SPI data load stops mid-program, reading restarts at the beginning (0x80000), and the same 60-byte read sequence repeats in an endless loop.

I think something is wrong with the SPI flash read logic.

When I reduce the size of hello.spiflash0.bin (removing the "wait" function and shortening the output string), I can get it almost working: a single character starts flooding the UART, seemingly by chance.

g3grau commented 8 months ago

Hi again, Verilator revealed some more issues (running the simulation doesn't help yet, but the compilation step does):

In SOC.v there are still some duplicate assignments to the LEDs near line 444: // assign LEDS[4:0] = leds; // assign LEDS[7:5] = 3'b000;
Should the memory address range cover the full 8MB (64Mbit, right?), or .word_address(mem_wordaddr[21:0])? This relates to SOC.h and spi_flash.v (but changing it doesn't help here).
The address 0x00880000 in the linker script and in the initial jump looks OK. 0x800000 is the FLASH_BASE for memory-mapped IO (it would help to define that somewhere), and the 0x80000 offset should be well within the memory.
As you pointed out, the SPI transactions themselves look OK. I just remember the endianness swap in spi_flash.v, assign rdata = {rcv_data[7:0],rcv_data[15:8],rcv_data[23:16],rcv_data[31:24]}; I haven't checked it yet, but I assume it was only needed for the PC demo data in step22?
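
The Python model below shows what the byte swap in that assign statement does. It assumes, without having checked spi_flash.v, that rcv_data is filled by shifting each received byte in at the low end. Under that assumption, the first byte read from flash ends up in rcv_data[31:24]. shift_in, field and swap are hypothetical helper names.

```python
def shift_in(received: bytes) -> int:
    """Hypothetical model of rcv_data: each byte is shifted in at the LSB end,
    so after four bytes the first one sits in bits [31:24]."""
    rcv = 0
    for b in received:
        rcv = ((rcv << 8) | b) & 0xFFFFFFFF
    return rcv

def field(value: int, hi: int, lo: int) -> int:
    """Equivalent of the Verilog part-select value[hi:lo]."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

def swap(rcv_data: int) -> int:
    """{rcv_data[7:0], rcv_data[15:8], rcv_data[23:16], rcv_data[31:24]};
    in a Verilog concatenation the first item lands in the most significant bits."""
    return (field(rcv_data, 7, 0) << 24) | (field(rcv_data, 15, 8) << 16) \
        | (field(rcv_data, 23, 16) << 8) | field(rcv_data, 31, 24)

rcv = shift_in(bytes.fromhex("b7 01 40 00"))
print(hex(rcv), hex(swap(rcv)))  # 0xb7014000 0x4001b7
```

Under that assumption, the swap turns the captured bytes b7 01 40 00 into 0x004001b7, the little-endian instruction word the CPU expects. That would mean the swap is still needed for instruction fetches, not only for the step22 demo data. If rcv_data is filled in the opposite order, the conclusion flips.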

fm4dd commented 8 months ago

Thank you again for the checks and the suggestions! I worry that I was prematurely optimistic about the SPI flash issue. Yes, I also don't understand the "LENGTH" parameter in the file spiflash0.ld. The original file from Bruno has LENGTH = 0x100000 /* 4M in flash */. This did not make sense to me, as 0x100000 = 1MB and not 4MB, unless "LENGTH" counts in 4-byte units. In that case, I had better set FLASH (RX) : ORIGIN = 0x00880000, LENGTH = 0x200000 /* 8M in flash */ for the Macronix flash. It should not make a difference, though, since so far our code only uses the lowest 1-2MB.
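
For reference, GNU ld's MEMORY command gives ORIGIN and LENGTH in bytes. So LENGTH = 0x100000 is 1 MiB and 0x200000 is 2 MiB. The "4M" comment would only match if it counted 32-bit words. The sketch below checks how much room the 512K program offset leaves. It assumes an 8 MiB (64 Mbit) Macronix part, and region_fits is a hypothetical helper.

```python
MiB = 1 << 20
FLASH_SIZE = 8 * MiB     # assumption: 64 Mbit (8 MiB) Macronix flash
PROG_OFFSET = 0x80000    # raw flash offset of the program (512K)

def region_fits(length: int, offset: int = PROG_OFFSET, flash_size: int = FLASH_SIZE) -> bool:
    """LENGTH is a byte count, so the region must satisfy offset + length <= flash size."""
    return offset + length <= flash_size

for length in (0x100000, 0x200000):
    print(f"LENGTH = {length:#x} = {length // MiB} MiB, fits: {region_fits(length)}")
print(f"largest LENGTH that fits: {FLASH_SIZE - PROG_OFFSET:#x}")
# LENGTH = 0x100000 = 1 MiB, fits: True
# LENGTH = 0x200000 = 2 MiB, fits: True
# largest LENGTH that fits: 0x780000
```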

I also worry that a memory range could get too big for a defined bit range. That is why I avoided the 1M offset we used in step22. The address configurations inside the Verilog code are hard for me to read.
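
This sketch checks whether an offset fits a given bit range. It assumes word_address indexes 32-bit words, i.e. it is the byte address shifted right by 2, as the name mem_wordaddr suggests. The helpers window_bytes and fits are hypothetical.

```python
MiB = 1 << 20

def window_bytes(word_addr_bits: int) -> int:
    """Bytes reachable through an N-bit word address, assuming 4-byte words."""
    return (1 << word_addr_bits) * 4

def fits(byte_offset: int, size: int, word_addr_bits: int) -> bool:
    """True if the last byte of [byte_offset, byte_offset + size) is reachable."""
    last_word = (byte_offset + size - 1) >> 2
    return last_word < (1 << word_addr_bits)

for bits in (20, 21, 22):
    print(f"mem_wordaddr[{bits - 1}:0] -> {window_bytes(bits) // MiB} MiB")
# mem_wordaddr[19:0] -> 4 MiB
# mem_wordaddr[20:0] -> 8 MiB
# mem_wordaddr[21:0] -> 16 MiB
```

Under this assumption, a 20-bit word address already reaches 4 MiB, so neither the 512K nor the step22 1M offset would overflow it (fits(0x100000, 306, 20) is True). Covering the whole 8 MiB chip needs 21 bits ([20:0]), while [21:0] would span 16 MiB.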

fm4dd / gatemate-riscv

step23 - Incomplete program read from flash memory #4