chipsalliance / dromajo

RISC-V RV64GC emulator designed for RTL co-simulation
Apache License 2.0
210 stars 63 forks source link

[Bug Report] Any mepc write should mask low bits #59

Closed Phantom1003 closed 2 years ago

Phantom1003 commented 2 years ago

The specification (Volume II: RISC-V Privileged Architectures V20211203 Page 37) said: mepc is an MXLEN-bit read/write register formatted as shown in Figure 3.21. The low bit of mepc (mepc[0]) is always zero. On implementations that support only IALIGN=32, the two low bits (mepc[1:0]) are always zero.

csr write function has this mask operation: https://github.com/chipsalliance/dromajo/blob/fae7b3a13ae5e28baceacf64f064af254501aa4a/src/riscv_cpu.cpp#L1284-L1287 while the raise exception function forgets: https://github.com/chipsalliance/dromajo/blob/fae7b3a13ae5e28baceacf64f064af254501aa4a/src/riscv_cpu.cpp#L1575-L1577

Phantom1003 commented 2 years ago

Testcase is from here: https://github.com/riscv-software-src/riscv-tests/blob/master/isa/rv64uc/rvc.S rvc.tar.gz

et-tommythorn commented 2 years ago

I'm afraid I don't follow. Do you mind showing me the trace where it doesn't behave as expected? As best I know, s->pc can never be set odd (it's an invariant, but isn't asserted for), thus line 1576 can't be setting lsb of s->mepc. As I missing something?

et-tommythorn commented 2 years ago

You are right that we aren't enforcing the MCPUID_C constraint, but honestly, I don't think Dromajo really supports running with compression disabled and we should probably hardcode that and prune the code supports changing it.

Phantom1003 commented 2 years ago

Yes, I agree with you. In rvc test, the test case will try to disable C extension and return to an address aligned to 2. In the latest specification, misa can be changed during runtime, I trigger this point by a simply patch to enable dromajo to modify misa. Currently dromajo does not support misa change, it's safe to ignore the mepc alignment most times. The only problem happens when exception is triggered at an address aligned to 1 (like error jalr value, or tampering the return address on the stack in case of a buffer overflow), the correct mepc value should ignore the last bit.

et-tommythorn commented 2 years ago

The only problem happens when exception is triggered at an address aligned to 1 (like error jalr value,

                                /* c.jalr */
                                val   = GET_PC() + 2;                                                                                                                                                                   
                                s->pc = read_reg(rd) & ~1;                                                                                                                                                              
                                write_reg(1, val);                                                                                                                                                                      

and

                val = GET_PC() + 4;                                                                                                                                                                                     
                {
                    intx_t new_pc = (intx_t)(read_reg(rs1) + imm) & ~1;                                                                                                                                                 
                    if (!(s->misa & MCPUID_C) && (new_pc & 3) != 0) {
                        s->pending_exception = CAUSE_MISALIGNED_FETCH;                                                                                                                                                  
                        s->pending_tval      = 0;                                                                                                                                                                       
                        goto exception;                                                                                                                                                                                 
                    }                                                                                                                                                                                                   
                }
                s->pc = (intx_t)(read_reg(rs1) + imm) & ~1;

I don't see a way to write an odd value to s->pc for jalr.

or tampering the return address on the stack

On a RISC processor the stack is a software construct. The only way to use it, is to load and use jalr to return. Which as shown cannot set an odd pc value.

in case of a buffer overflow), the correct mepc value should ignore the last bit.

That last bit is always zero. There's no way to make it one.

Phantom1003 commented 2 years ago

You are right. I thought incorrectly that it was software to ensure that the last digit was 0... Does the reset vector have this check too? If the init value is safety and disables misa change, then there is no problem.

et-tommythorn commented 2 years ago

First, it's RISC-V spec that LSB is hardwire to zero, so if Dromajo didn't enforce it, it would be a bug. However at a quick glance, it looks like we have covered all cases:

  rg -- '->.*pc ='
dromajo/src/riscv_cpu.cpp
258:                    s->pc = 0x0800;
1236:            s->sepc = val & (s->misa & MCPUID_C ? ~1 : ~3);
1237:            s->sepc = SEPC_TRUNCATE(s->sepc);
1285:            s->mepc = val & (s->misa & MCPUID_C ? ~1 : ~3);
1286:            s->mepc = MEPC_TRUNCATE(s->mepc);
1335:            s->dpc = val & (s->misa & MCPUID_C ? ~1 : ~3);
1613:    s->pc = s->sepc;
1626:    s->pc = s->mepc;
1633:    s->pc = s->dpc;
1834:void riscv_set_pc(RISCVCPUState *s, uint64_t val) { s->pc = val & (s->misa & MCPUID_C ? ~1 : ~3); }
2364:    } else if (s->pc == BOOT_BASE_ADDR && boot_ram) {

dromajo/include/dromajo_template.h
284:        s->pc = GET_PC();
482:                    s->pc = (intx_t)(GET_PC() + imm);
563:                    s->pc = (intx_t)(GET_PC() + imm);
571:                        s->pc = (intx_t)(GET_PC() + imm);
581:                        s->pc = (intx_t)(GET_PC() + imm);
669:                            s->pc = read_reg(rd) & ~1;
686:                                s->pc = read_reg(rd) & ~1;
762:                s->pc = (intx_t)(GET_PC() + imm);
778:                s->pc = (intx_t)(read_reg(rs1) + imm) & ~1;
803:                    s->pc = (intx_t)(GET_PC() + imm);
1124:                            s->pc = GET_PC() + 4;
1159:                            s->pc = GET_PC() + 4;
1191:                                s->pc = GET_PC();
1201:                                s->pc = GET_PC();
1214:                                    s->pc = GET_PC();
1255:                                    s->pc = GET_PC() + 4;
1641:    s->pc = GET_PC();

If you do find a way to violate this, then please let me know so we can fix it (note that in the "+ imm" without a "& ~1", the imm is even).

Phantom1003 commented 2 years ago

I mean the reset_vector from command line arguement:

RISCVCPUState *riscv_cpu_init(RISCVMachine *machine, int hartid) {
    RISCVCPUState *s   = (RISCVCPUState *)mallocz(sizeof *s);
    s->machine         = machine;
    s->mem_map         = machine->mem_map;
    s->pc              = machine->reset_vector;
et-tommythorn commented 2 years ago

Ha, good find, you win. My regexp missed that one. Yeah, it's technically a bug, but that was added by a 3rd party and I guess I never used it so I missed that. I don't consider this is big issue as SW running in Dromajo can't make that happen, but you can always configure a useless configuration, hard to catch all the issues.

I'll fix it eventually.

Phantom1003 commented 2 years ago

It was my pleasure, and I learned a lot of details that I usually overlook, thanks again!

et-tommythorn commented 2 years ago

The pleasure was mine. I think I failed to mention why I didn't just strip the bit as you first suggested. The issue is that evenness of pc is an invariant that should always hold. If we strip it in one place it would suggest there were a bug somewhere else. There are other such invariants and unfortunately they haven't been formalized properly. This is something that has come up before and in fact, the current "& ~1" etc were added exactly because random testing found them.

I think Dromajo is pretty decent wrt. to correctness of simple instructions. The areas I'm most worried about is the memory system as there is a lot of semantics hiding in a seemingly simple load and store instruction.