chipsalliance / VeeRwolf

FuseSoC-based SoC for VeeR EH1 and EL2

axi/wb bus cycle times #55

Open jamesbbecker opened 2 years ago

jamesbbecker commented 2 years ago

I have your design running on a Nexys A7 with an EL2 core.

When I do a sequence of writes to an I/O port, such as GPIO, and my code is running in ICCM, the writes are lost, except for the last one. I can solve the problem by inserting some delay in between the I/O writes.

It appears that, with optimized code, writes through the AXI / AXI mux / WB path complete more slowly than the processor can issue them.

Is this the way it is supposed to work? Is there some sort of mechanism for the AXI writes to delay the processor core while each one is completing?

I guess I can write code to wait on I/O writes to complete before I send another one. Am I required to do that?

olofk commented 1 year ago

Ouch! That sounds really bad and should definitely not happen. Do you have any test program to share?

jamesbbecker commented 1 year ago

Hi Olof,

The code to make this happen is pretty simple. The hard part (from my experience) is getting the code inside of ICCM to run. I found the following was helpful:

I created a set of code which is loaded into boot rom which does nothing but jump to code in ICCM.

    #define ICCM_BASE_ADDRESS 0xxxxxxxxx   /* Put your ICCM address here. */

    typedef void (*function_ptr)(void);

    int main(void)
    {
        function_ptr jump_to_ptr;

        /* Treat the start of ICCM as a function and call it. */
        jump_to_ptr = ((function_ptr)ICCM_BASE_ADDRESS);

        (*jump_to_ptr)();
    }

This code then had to be compiled and embedded in the boot_rom of the Verilog, so that once it's loaded into the FPGA, it runs when reset is released.

For the code that is in ICCM, you just need to write to GPIO over and over with different values. In the example below, the 0xaa will never be written to GPIO, but 0x55 will be written over and over again.

    void iccm_main(void)
    {
        int counter = 0;
        do {
            *((volatile UINT32 *)SYSCON_GPIO) = 0xaa;
            *((volatile UINT32 *)SYSCON_GPIO) = 0x55;

            counter++;  // Introduce a delay.
            counter++;
        } while (1);
    }

I compiled this code and loaded it into ICCM using a debugger.

jamesbbecker commented 1 year ago

Olof,

I'm not that familiar with the internals of the RISCV, but I did some simulation of the EL2 with my code to try to figure out why the instructions were being dropped. Attached are 2 screenshots from my simulator.

In the first, the instruction tries to do the write too quickly and the instruction is never executed. In the second, the instruction happens later, and succeeds.

[Screenshot: Failed-Write] [Screenshot: Success-Write]

The signal obuf_wr_timer in the file el2_lsu_bus_buffer seems to be important. If it hasn't reached its maximum value of 7 prior to the write being executed, the write never occurs.

The actual instruction is the 3rd line from the top. The instruction word 00e7a023 (hex, which decodes to sw a4, 0(a5)) is the attempt to write to I/O.

Hope this helps.

olofk commented 1 year ago

Thank you! This is definitely helpful. I have been busy with other things, but hope to get to this soon.

olofk commented 1 year ago

Ok, this took way longer than I had been hoping for. I do have a theory now at least. Or at least a question. Are you setting MRAC correctly before running your code? I have done some experimenting and can get similar (but not identical) results if I don't write to MRAC before running the code. If I put in fence operations or enough nops (at least eight nops seem to be required between the writes) I can get the writes to work again.
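The fence workaround mentioned above could be wrapped in a small helper. This is a minimal sketch, not code from the VeeRwolf sources: `io_fence` and `mmio_write_fenced` are hypothetical names, and the non-RISC-V branch exists only so the helper can be compiled and exercised on a host machine. It treats the symptom only; if the MRAC theory holds, configuring mrac should make the fence unnecessary.

```c
#include <stdint.h>

/* Order earlier memory and I/O accesses before later ones.
 * On RISC-V this emits a full FENCE instruction. On any other host
 * (testing only) it falls back to the GCC/Clang full-barrier builtin. */
static inline void io_fence(void)
{
#if defined(__riscv)
    __asm__ volatile ("fence iorw, iorw" ::: "memory");
#else
    __sync_synchronize();
#endif
}

/* Write one value to a memory-mapped register, then fence.
 * `reg` is a hypothetical parameter: on the target it would be
 * (volatile uint32_t *)SYSCON_GPIO from the example in this thread. */
static inline void mmio_write_fenced(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;
    io_fence();
}
```

On the target, the two stores from the ICCM example would become `mmio_write_fenced((volatile uint32_t *)SYSCON_GPIO, 0xaa);` followed by the same call with `0x55`.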

jamesbbecker commented 1 year ago

Hi Olof,

Thanks for looking into this for me.

Looking through these documents:

The RISC-V Instruction Set Manual Volume I: Unprivileged ISA Document Version 20191213

The RISC-V Instruction Set Manual Volume II: Privileged Architecture Document Version 20211203

I see no reference to an mrac register.

I do find a reference to an mrac register in the EL2 source file "el2_dec_tlu_ctl.sv".

Can you give me a clue on what all the bits mean for the various regions, or point me to a human readable spec on what the register does?

Jim



olofk commented 1 year ago

mrac is a register specific to SweRV (or VeeR, as it is now called) and is described in the EL2 manual: https://github.com/chipsalliance/Cores-VeeR-EL2/blob/main/docs/RISC-V_VeeR_EL2_PRM.pdf

Basically, the 32-bit memory map is divided into 16 regions (MRAC == Memory Region Access Control). For each region, two adjacent bits control whether the region is cachable memory and whether it is volatile. By default this register is set to 0x00000000, which means all regions are regarded as uncachable and non-volatile. The non-volatility is the key here, because it allows VeeR to optimize away subsequent writes to the same address. The SweRVolf (soon to be renamed VeeRwolf) bootloader begins by setting the upper half (0x80000000-0xFFFFFFFF) to uncachable, volatile memory by writing 0xAAAA0000 to mrac: https://github.com/chipsalliance/Cores-SweRVolf/blob/master/sw/boot_main.S#L37 (Technically, this should probably be 0xAAAA5555 to allow the RAM to be cachable, but at least this way we don't need to worry about stale caches.)
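To make the encoding concrete, here is a small sketch that builds an mrac value from per-region flags. The bit order is inferred from the two constants above (0xAAAA0000 and 0xAAAA5555): the lower bit of each pair is taken as "cachable" and the upper bit as "volatile". `mrac_region` and `mrac_encode` are hypothetical helper names, and the layout should be checked against the PRM before relying on it.

```c
#include <stdint.h>

/* Region index (0..15) of a 32-bit address: the top four address bits. */
static inline unsigned mrac_region(uint32_t addr)
{
    return addr >> 28;
}

/* Build an mrac value from two 16-bit masks, one bit per region.
 * Bit n of `cacheable` sets the lower bit of region n's pair,
 * bit n of `volatile_io` sets the upper bit (assumed layout). */
static inline uint32_t mrac_encode(uint16_t cacheable, uint16_t volatile_io)
{
    uint32_t value = 0;
    for (unsigned n = 0; n < 16; n++) {
        if (cacheable & (1u << n))
            value |= 1u << (2 * n);
        if (volatile_io & (1u << n))
            value |= 1u << (2 * n + 1);
    }
    return value;
}
```

With this layout, `mrac_encode(0x0000, 0xFF00)` marks regions 8 to 15 as volatile and reproduces the bootloader's 0xAAAA0000, while `mrac_encode(0x00FF, 0xFF00)` gives the 0xAAAA5555 variant.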

So, to conclude, please see if the problem persists after writing 0xAAAA0000 to mrac before running the code.
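For code that is loaded straight into ICCM and so never runs the bootloader, the write could be done from C before the first I/O access. This is a sketch under assumptions: the CSR number 0x7C0 for mrac is not stated in this thread and should be verified against the PRM and boot_main.S, `veer_write_mrac` is a hypothetical name, and `mrac_shadow` is only a host-side stand-in so the function can be exercised off-target.

```c
#include <stdint.h>

/* Last value passed to veer_write_mrac(). Stand-in for host testing only. */
static uint32_t mrac_shadow;

/* Write `value` to the mrac CSR.
 * ASSUMPTION: mrac is CSR 0x7C0; check the PRM before using this. */
static inline void veer_write_mrac(uint32_t value)
{
#if defined(__riscv)
    __asm__ volatile ("csrw 0x7c0, %0" :: "r"(value) : "memory");
#endif
    mrac_shadow = value;
}
```

Calling `veer_write_mrac(0xAAAA0000u);` at the top of `iccm_main`, before the GPIO loop, would match what the bootloader does.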