NJU-ProjectN / nemu

NJU EMUlator, a full system x86/mips32/riscv32/riscv64 emulator for teaching

[FIX] spike-diff/difftest.cc: flush the tlb before copy memory towards ref #80

Closed rijuyuezhu closed 10 months ago

rijuyuezhu commented 10 months ago

Without flushing the TLB before copying memory to the ref, the copy may produce wrong results on the ref.

How I found this problem: the reason is interesting and can serve as an example of how things go wrong. While implementing difftest detach & attach in PA3, I copied a series of instructions such as csrw ... to RESET_VECTOR to set the CSRs on the ref properly. I copied memory twice: once for the csrw instructions, and once for the whole dut memory. However, my difftest results showed that the second copy did not work as expected. I then found that Spike has an icache which the provided interface functions do not flush, so the ref kept fetching stale instructions after the second copy.
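To illustrate the failure mode described above, here is a minimal toy model. It is not Spike's code: `ToyRef`, `copy_in`, `fetch`, and `flush_icache` are hypothetical names, and the cache is just a map from address to instruction word. The point is only that a copy that touches memory alone leaves previously fetched instructions visible through the cache.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>
#include <vector>

// Toy reference model (hypothetical, not Spike): flat memory plus an
// instruction cache keyed by address.
struct ToyRef {
  std::vector<uint8_t> mem;
  std::unordered_map<uint32_t, uint32_t> icache;  // addr -> cached instruction word

  explicit ToyRef(size_t size) : mem(size, 0) {}

  // Copies bytes into memory only; the icache is left untouched.
  void copy_in(uint32_t addr, const void *src, size_t n) {
    assert(addr + n <= mem.size());
    std::memcpy(mem.data() + addr, src, n);
  }

  // Fetch goes through the icache and fills it on a miss.
  uint32_t fetch(uint32_t addr) {
    auto it = icache.find(addr);
    if (it != icache.end()) return it->second;
    uint32_t inst = 0;
    std::memcpy(&inst, mem.data() + addr, sizeof(inst));
    icache[addr] = inst;
    return inst;
  }

  void flush_icache() { icache.clear(); }
};

// Copy an instruction, fetch it once, overwrite it with a second copy,
// and return what the next fetch at the same address sees.
uint32_t fetch_after_second_copy(bool flush_between) {
  ToyRef ref(64);
  const uint32_t first = 0x30029073u;   // csrw mstatus, t0
  const uint32_t second = 0x00000013u;  // nop (addi x0, x0, 0)
  ref.copy_in(0, &first, sizeof(first));
  ref.fetch(0);                         // the ref executes the first sequence
  ref.copy_in(0, &second, sizeof(second));
  if (flush_between) ref.flush_icache();
  return ref.fetch(0);
}
```

In this model, `fetch_after_second_copy(false)` still returns the csrw word, while `fetch_after_second_copy(true)` returns the nop from the second copy.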

sashimi-yzh commented 10 months ago

Although your patch is a solution to the issue, I suggest letting Spike execute a fence.i instruction right after the second memory copy. The semantics of diff_memcpy() only concern behavior at the memory level. Synchronization between memory and the instruction stream should be handled by the CPU, which is exactly the purpose of fence.i.

Alternatively, you may put the CSR instruction sequence somewhere else to execute.

It is the programmer's responsibility to guarantee that two instruction sequences do not overlap. If they do, use a fence.i before executing the second.
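As a rough sketch of what the instruction words involved look like, the helpers below encode csrw and fence.i following the RV32I/Zicsr/Zifencei base formats. The function names `encode_csrw` and `append_fence_i` are hypothetical, and where and when the ref actually executes the fence.i is left to the caller.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// fence.i: opcode MISC-MEM (0x0f), funct3 = 001, all other fields zero.
const uint32_t kFenceI = 0x0000100fu;

// csrw csr, rs1 is the pseudo-instruction for csrrw x0, csr, rs1:
// csr in bits 31:20, rs1 in bits 19:15, funct3 = 001, rd = x0, opcode SYSTEM (0x73).
inline uint32_t encode_csrw(uint32_t csr, uint32_t rs1) {
  assert(csr < 0x1000u && rs1 < 32u);
  return (csr << 20) | (rs1 << 15) | (0x1u << 12) | 0x73u;
}

// Returns a copy of the sequence with fence.i appended (hypothetical helper).
inline std::vector<uint32_t> append_fence_i(std::vector<uint32_t> seq) {
  seq.push_back(kFenceI);
  return seq;
}
```

For example, `encode_csrw(0x300, 5)` yields 0x30029073, which is `csrw mstatus, t0`.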

rijuyuezhu commented 10 months ago

Thanks for your reply! In my opinion, using a fence.i instruction to synchronize from the software side is the more elegant way for RISC-V programs that modify their own instructions. However, I think (and I assume) that diff_memcpy serves as a way to modify the inside program (the ref) from the outside (the caller), and the inside cannot predict such a modification. Thus, in my implementation, the duty of running fence.i falls on the outside, which takes responsibility for modifying the inside.

To make the diff_memcpy API useful, it should perform the equivalent of fence.i automatically after each call, whether by executing a real fence.i instruction or by simply running mmu->flush_tlb();. If it does not, nothing is guaranteed and its functionality is incomplete. Thus I do not agree that the semantics of diff_memcpy exclude flushing the icache and pipeline.

Anyway, various methods can be applied depending on the concrete implementation and one's understanding of the project code, so I am marking the PR as closed.

sashimi-yzh commented 10 months ago

Below are some further discussions.

rijuyuezhu commented 10 months ago

Thank you @sashimi-yzh, that makes sense. Still, I have some questions about that.

sashimi-yzh commented 10 months ago

For your information, the RISC-V processor project RocketChip implements a debug module. Inside the debug module, there is a program buffer which is accessed by MMIO. Instructions fetched from the program buffer will not enter icache.

From outside the real chip, we can use an on-chip debugger such as OpenOCD to communicate with the debug module. This is achieved by sending signals to the JTAG pins of the chip, which in turn activate the debug module inside the chip. For more information, you may refer to the RISC-V Debug Specification.

Note that this is far from the original issue you encountered. I suggest putting the CSR instruction sequence somewhere else to execute. For example, the end of memory may be a good choice, since programs rarely execute instructions located there.
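A minimal sketch of picking such a location, assuming a physical memory region described by a base address and a size (the values used in the example, 0x80000000 and 128 MiB, are assumptions for illustration, and `staging_addr` is a hypothetical helper):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns a 4-byte-aligned address at the very end of memory that leaves
// room for n_insts 32-bit instructions (hypothetical helper).
inline uint64_t staging_addr(uint64_t mem_base, uint64_t mem_size, size_t n_insts) {
  const uint64_t bytes = static_cast<uint64_t>(n_insts) * 4;
  assert(bytes <= mem_size);
  return (mem_base + mem_size - bytes) & ~uint64_t{3};
}
```

With the assumed layout, `staging_addr(0x80000000, 0x8000000, 4)` gives 0x87fffff0, so a four-instruction CSR sequence would occupy the last 16 bytes of memory. Any stale cached entries for those addresses only matter if the guest program later executes from them.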