enjoy-digital / litex

Build your hardware, easily!

litex_sim does not reach the end of simulation with cv32e41p #1364

Open HugoDesneux opened 2 years ago

HugoDesneux commented 2 years ago

Hello @enjoy-digital,

I want to simulate the bare-metal demo application on the RISC-V cv32e41p core, but the simulation never shows a prompt. The same setup works fine with the cv32e40p core.

(The litex bios prompt works well for me.)

I executed the following command:

litex_sim --cpu-type=cv32e41p --with-sdram --sdram-init=/home/hugo/Documents/litex/litex/soc/software/demo/demo.bin

The output then gets as far as this:

[xgmii_ethernet] loaded (0x559f082daef0)
[ethernet] loaded (0x559f082daef0)
[serial2console] loaded (0x559f082daef0)
[serial2tcp] loaded (0x559f082daef0)
[gmii_ethernet] loaded (0x559f082daef0)
[clocker] loaded
[spdeeprom] loaded (addr = 0x0)
[clocker] sys_clk: freq_hz=1000000, phase_deg=0

%

Then nothing happens.

Any idea how to fix this problem? Thank you.

pcotret commented 1 year ago

Spent a few minutes, got the same behavior:

litex_sim --cpu-type=cv32e41p

After a few seconds, got stuck with:

make[1]: Leaving directory '/home/pascal/test/cv32e41p/build/sim/gateware/obj_dir'
make: Leaving directory '/home/pascal/test/cv32e41p/build/sim/gateware'

[ethernet] loaded (0x5579bd757ef0)
[serial2tcp] loaded (0x5579bd757ef0)
[clocker] loaded
[spdeeprom] loaded (addr = 0x0)
[gmii_ethernet] loaded (0x5579bd757ef0)
[serial2console] loaded (0x5579bd757ef0)
[xgmii_ethernet] loaded (0x5579bd757ef0)
[clocker] sys_clk: freq_hz=1000000, phase_deg=0

I'll have a look at it a bit later.

weik56100 commented 1 year ago

I have the same problem on my side when I try the cv32e41p CPU type (LiteX 2022.08).

panantoni01 commented 1 year ago

Hello, it seems that I've managed to find a reason for this failure - I generated VCD traces for the litex simulation, analyzed them in gtkwave and here is what I found.

This is a part of disassembled bios.bin:

0000254c <litex_putc>:
   ...
   ...
    2562:   37ed                    jal 254c <litex_putc>
    2564:   40b2                    lw  ra,12(sp)
    2566:   8522                    mv  a0,s0
    2568:   4422                    lw  s0,8(sp)
    256a:   0141                    addi    sp,sp,16
    256c:   8082                    ret

During execution of this code a strange bug occurs: at some point the CPU executes the jal instruction, but for some reason it saves 2566 as the return address instead of 2564. As a result, the lw ra,12(sp) instruction is skipped when we return from the function call. This lw instruction restores the caller's return address, but since it is never executed, the value 2566 remains in the ra register. Executing ret later on therefore jumps to 2566, and we end up stuck in a loop (instructions 2566, 2568, 256a, 256c are executed repeatedly).
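To make the loop easier to see, here is a minimal toy model of the five instructions from the listing above. It is only a sketch: it tracks pc and ra and nothing else, and the caller address 0x1000 and the function name trace_epilogue are made up for illustration, not taken from the traces.

```python
def trace_epilogue(link_address, caller_ra=0x1000, max_steps=12):
    """Toy model of addresses 0x2564..0x256c; only pc and ra are tracked.

    link_address is what jal wrote into ra (0x2564 expected, 0x2566 observed).
    caller_ra is a hypothetical value that `lw ra,12(sp)` would restore.
    """
    ra = link_address   # written by jal at 0x2562
    pc = link_address   # the callee's ret jumps here
    visited = []
    for _ in range(max_steps):
        if not 0x2564 <= pc <= 0x256C:
            break                   # left the epilogue: returned to the caller
        visited.append(pc)
        if pc == 0x2564:            # lw ra,12(sp): restore the caller's ra
            ra = caller_ra
            pc += 2
        elif pc == 0x256C:          # ret: jump to whatever is in ra
            pc = ra
        else:                       # mv / lw s0 / addi: fall through
            pc += 2
    return visited, pc


good, good_pc = trace_epilogue(0x2564)
bad, bad_pc = trace_epilogue(0x2566)
print([hex(a) for a in good], hex(good_pc))
print([hex(a) for a in bad], hex(bad_pc))
```

With the expected link address the model leaves the epilogue after one pass. With 0x2566 it keeps cycling through the same four addresses until the step limit is hit.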

The instructions here are 2 bytes long (compressed), but when executing the jal instruction the CPU saves pc+4 instead of pc+2 as the return address, as if the instruction were 4 bytes long (not compressed). For this reason I decided to rerun the simulation, this time with a binary that doesn't contain compressed instructions (remove c from march in https://github.com/enjoy-digital/litex/blob/master/litex/soc/cores/cpu/cv32e41p/core.py#L32):

    "standard": "-march=rv32i2p0_m    -mabi=ilp32 ",

After this change everything worked fine - I got the bios prompt.
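For reference, the expected return address can be checked by hand. In the RISC-V base encoding, an instruction whose two lowest bits are not 11 is 16 bits long, so the link address of a jump-and-link should be pc plus the actual instruction length. The sketch below applies that rule to the 37ed encoding at 2562 from the disassembly. The helper names are made up for illustration, and it only distinguishes 16-bit from 32-bit encodings.

```python
def instruction_length(first_halfword: int) -> int:
    """Length in bytes, assuming only 16-bit and 32-bit encodings are in use."""
    # Low two bits == 0b11 marks a 32-bit instruction; anything else is compressed.
    return 4 if (first_halfword & 0b11) == 0b11 else 2


def link_address(pc: int, first_halfword: int) -> int:
    """Return address a jump-and-link at `pc` should write to ra."""
    return pc + instruction_length(first_halfword)


pc = 0x2562
encoding = 0x37ED                  # the compressed jal from the listing

expected = link_address(pc, encoding)
observed = pc + 4                  # what the traces showed the core saving

print(hex(expected), hex(observed))
```

The expected value matches the address of lw ra,12(sp) in the listing, while pc+4 lands on the mv that follows it.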

pcotret commented 12 months ago

(reading some issues related to OpenHW group cores)

@panantoni01 Hmm, strange... Does it mean there would be a bug with compressed instructions on the 41P? I haven't seen issues related to it in the openhwgroup repository... :neutral_face: