NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0
50.32k stars 5.76k forks source link

libsla Feature Request: Include Program Counter Register in .sla XML #5888

Open pjsoberoi opened 10 months ago

pjsoberoi commented 10 months ago

I would like to request a change to the .sla XML schema. Currently the .sla XML does not specify which register is the program counter. When using the libsla emulator (see sleighexample.cc) this results in the program counter register never being set. Emulation works, but EIP is never set correctly. IMHO this is rather unexpected. memstate.getValue(register) works for all registers except for the program counter.

To get around this in my code I've implemented something like the following:

memstate.setValue("PC", value);
emulator.setExecuteAddress(Address(trans.getDefaultCodeSpace(), memstate.getValue("PC")));
...
...
emulator.executeInstruction();
...
...
pc = emulator.getExecuteAddress().getOffset();
memstate.setValue("PC", pc);

This way PC and the emulator are always in sync. The drawback is I need to know which register is the program counter. There is nothing in the .sla to tell me that information. The .pspec has the information:

  <programcounter register="PC"/>

but the .pspec is not an input to the libsla emulator.

I see two ways going forward: 1) update the .sla schema to include a line to specify which is register is the program counter 2) update the libsla emulator to taken in an optional .pspec file

I prefer the first option. Once implemented in the .sla, the libsla emulator can update the program counter register transparently to the user. This means memstate.getValue(register) works for all registers as expected.

Pinging @mumbel @GhidorahRex

pjsoberoi commented 10 months ago

Assuming my understanding of pc not being set, this issue gets worse. Any SLEIGH implementation for an instruction that references pc instead of instr_next is broken. See (superh.sinc)[https://github.com/NationalSecurityAgency/ghidra/blob/master/Ghidra/Processors/SuperH/data/languages/superh.sinc]:

:trapa  imm_00_07  is ocode_08_15=0b11000011 & imm_00_07
{
    r15 = r15 - 4;
    *r15 = sr;

    r15 = r15 - 4;
    *r15 = pc + 2;

    pc = *(vbr + (imm_00_07 * 4));
    call [pc];
}

Specifically *r15 = pc + 2. For any register except for pc this works as expected. Since pc is not set, who knows what it's writing there. The fix is probably something like

*r15 = inst_next

but I don't have time to test it. The real correct fix is to make sure the program counter register is correct before and after every instruction call.

emteere commented 9 months ago

We'll fix the PC issue in the SuperH and do a quick scan for other places the PC might be used incorrectly.

The PC should never be used in a sleigh instruction, unless you have set it in that instruction. In a few cases use of the pc in an instruction had been used to communicate with an analyzer where the PC was going for a dynamic jump, but that isn't needed anymore.

As you note, inst_start, inst_next, inst_next, instnext2 should be used to refer to addresses around the instruction. Some instructions are variable in length or have semantics about when the PC is set and what value it is set to. Many modern day processors have no way to get or set the value of the PC. And when they do, it needs to be done carefully so using inst* symbols is better because you can see exactly what is going on with the PC. There are processors that can set the PC, and then don't act on it until some future point.

If you were to get the current value of the PC and use it, there would need to be special code in the emulator that can keep track of all the variants of what the PC is at any given point during instruction execution for a particular processor. Knowing which register is the PC wouldn't really solve the issue.

When an instruction does need to know the value of the PC, for example loading it into a register, it should still be done with inst_*.

Possibly knowing the PC register and making sure no pcode is using it before being set within a single instruction would be useful to find errors such as the one you note above.