cvut / qtrvsim

RISC-V CPU simulator for education purposes
GNU General Public License v3.0
504 stars 69 forks source link

Memory access issue when xlen=64 #146

Open KotorinMinami opened 4 months ago

KotorinMinami commented 4 months ago

In the new simulator phase, we can set xlen to 64.

However, even after configuring it to 64-bit, the simulator's memory access range remains 32-bit, while the registers are 64-bit. This discrepancy will cause issues with instructions such as load during execution.

For more details, you may refer to example.S, where you will find that an infinite loop occurs due to the inability to read pre-set values from memory.(After lw , the x5 remain 0)

Screenshot from 2024-07-31 03-28-32

ppisa commented 4 months ago

You are right that there is problem with template.S when xlen = 64. But problem is not in the instructions execution but in assembly phase. If you use external assembler with correct directives and address definition, then it works. The cause of problem with internal assembler is that QtRvSim li pseudo-instruction same as la one generates only two instructions sequence suitable cover 32-bit register value rage. In 64-bit mode, the 32-bit value is sign-extended, but the actual UART location is mapped to 0x00000000ffffc000 address. That address is relict of QtMips where it fits into simple addi constant range or even lw offset range against zero register. The next trick woks for internal assembler

loop:
    li   a0, SERIAL_PORT_BASE           // load base address of serial port
    addi a1, zero, 32 
    sll  a0, a0, a1
    srl  a0, a0, a1
    addi a1, zero, text_1               // load address of text

slli and srli range is 0 to 31 for assembler even if 64-bit mode is set which should be corrected. When corrected, there would be problem if the code is compiled in 64-bit mode and then processor mode is switched to xlen = 32 without recompilation. Another option to obtain correct/zero extended address is to define

.data

uart_base: .word SERIAL_PORT_BASE

at end of the code and then load value by

loop:
    la   a0, uart_base           // load base address of serial port
    lwu  a0, 0(a0)

but that solution is not compatible with compilation in 32-bit mode, because lwu is invalid instruction there.

The sll and 'srl' trick works even without recompile, because bit 5 of the shift is ignored n 32-bit mode which means that shift is by 0 bits...

I have thought what is the best solution even in past but I have not come to clean/preferred solution yet.

Another option is to map UART to 0xffffffffffffc000 location in 64-bit mode. The memory model even allows multiple mapping of peripheral but it slows down memory access execution and there is already alias at 0xffff0000 for UART which keep code compatibility with QtSpim in QtMips time.

So some feedback what seems acceptable is welcomed for decision about future development steps.

Easy to improve is to add .dword assembler construct. I hope that expression evaluation should work reasonably well, because it has been implemented on 64-base type from the start by me. Then slli, srli etc. range switch should be considered.

But in general, I suggest to use 32-bits for basic teaching for now. 64-bits are there to allow to run simple binaries for standard RISC-V Linux kernel system where 64-bit mode prevails and even QtRvSim syscalls should work in 64-bit mode, has been designed and updated for compatibility with it. There is another problem that full 64-bit values do not fit and propagate into signal labels in the CPU diagram.

ppisa commented 2 months ago

The commit

09435a40a28789526137768d2742826a42a43b31 Machine: add peripherals high/top address aliases for XLEN=64

provides aliases for peripherals in 64-bit mode to allow template.S and other code to work properly in 64-bit mode where LUI leads to sign extended values and internal assembler does not allow to specify full 64-bit constant.

I do not like this mapping duplication but it should help users and keep code compiled bu full featured assembler where original addresses in top of 4 GiB address space are preserved.

I am not sure with the rest and it is not material for actual release for sure. In general, if the support for larger memory ranges emulation is clean (@jdupak review would be helpful) then I agree with its merging in the next development cycle. The variable depth would have some overhead, mainly if deeper tree and shorter one are used in parallel (more branch patters miss-predictions) but simulator is already slow by other components. On the other hand, I am not sure if I want to see extension of the address space to be enabled by default and I see disabling peripherals at 0xffxxxxxx addresses as unwelcome. Some config option could be possible.

There would be need to invest precise time into memory viewers sliding windows and ergonomic. 16 digits in goto line and in the address column are unreadable for people like me.