NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0

Emulator: use injected pcode for CALLOTHER #6669

Open shuffle2 opened 4 days ago

shuffle2 commented 4 days ago

Is your feature request related to a problem? Please describe.
It's related to trying to make ghidra work with the Andes "EX9.IT" instruction: https://github.com/NationalSecurityAgency/ghidra/discussions/6612

Describe the solution you'd like
I noticed that the emulator essentially just interprets pcode. However, it does so in such a way that a single top-level instruction's pcode is parsed and executed at a time (e.g. relies on relative jumps within an instruction working). This causes problems for CALLOTHER semantics, which can be expanded into injected pcode. However, conceptually it seems like the emulator should "just work" with existing PcodeInjectLibrary code. Can some adapter be made for that?

Or, perhaps the inverse should be done: use pcode defined for the userops in the emulator to fill pcode on the paths currently using pcode injection for sleigh userops. In either case, it seems like this pcode modelling could be uniform.

nsadeveloper789 commented 2 days ago

I thought about this a bit when designing the thing, but there's a disconnect between p-code injection for static analysis and for dynamic analysis. The injects in the pspecs, cspecs, etc., are often meant to simplify the static analysis and allow cleaner decompilation, e.g., overriding alloca_probe or the stack cookie checker. There are some cases, e.g., segment, where the existing inject library may make some sense in the dynamic case as well. That being said, there's only one flag on Instruction.getPcodeOps(boolean) to determine whether injections are taken or not, so I can't really pick and choose. Instead, I opted to never take injects and use a different mechanism for handling CALLOTHERs in the emulator.

I instead specified PcodeUseropLibrary, which at its core is simply a callback into Java code when the emulator encounters a CALLOTHER. It has some sugar if you'd rather model a userop using Sleigh/p-code. Instead of injecting (inlining), the emulator effectively treats the userop's p-code as a subroutine. That said, you might be able to create a PcodeUseropLibrary that adapts an existing PcodeInjectLibrary. Or at the very least, if a desired inject library is relatively small, you could probably create the equivalent userop library relatively easily. (See module B4 in the Debugger Tutorial.)
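To make the callback model concrete, here is a minimal sketch in plain Java. It is a toy model only: `State`, `Op`, `UseropLibrary`, and `execute` are hypothetical names invented for illustration and are not Ghidra classes. It shows an executor that, on reaching a CALLOTHER, looks the userop up by name, calls into Java, and then resumes the caller's program, rather than splicing the userop's ops into the caller.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy model only: none of these types are Ghidra classes.
final class ToyUseropDemo {
    /** Machine state: a register file keyed by name. */
    static final class State {
        final Map<String, Long> regs = new HashMap<>();
    }

    /** One op: either ordinary work, or a CALLOTHER that names a userop. */
    static final class Op {
        final String userop;        // non-null for a CALLOTHER
        final Consumer<State> work; // non-null for an ordinary op

        private Op(String userop, Consumer<State> work) {
            this.userop = userop;
            this.work = work;
        }

        static Op plain(Consumer<State> work) { return new Op(null, work); }
        static Op callOther(String name) { return new Op(name, null); }
    }

    /** Maps userop names to Java callbacks. */
    static final class UseropLibrary {
        private final Map<String, Consumer<State>> userops = new HashMap<>();

        void define(String name, Consumer<State> body) { userops.put(name, body); }

        Consumer<State> lookup(String name) {
            Consumer<State> body = userops.get(name);
            if (body == null) {
                throw new IllegalStateException("Unknown userop: " + name);
            }
            return body;
        }
    }

    /** Runs ops in order. A CALLOTHER calls out, then the caller's program resumes. */
    static void execute(List<Op> program, State state, UseropLibrary lib) {
        for (Op op : program) {
            if (op.userop != null) {
                lib.lookup(op.userop).accept(state);
            } else {
                op.work.accept(state);
            }
        }
    }

    /** Builds and runs a three-op program; returns the final value of r0. */
    static long run() {
        UseropLibrary lib = new UseropLibrary();
        lib.define("double_r0", s -> s.regs.put("r0", s.regs.get("r0") * 2));

        List<Op> program = new ArrayList<>();
        program.add(Op.plain(s -> s.regs.put("r0", 20L)));
        program.add(Op.callOther("double_r0"));
        program.add(Op.plain(s -> s.regs.put("r0", s.regs.get("r0") + 2)));

        State state = new State();
        execute(program, state, lib);
        return state.regs.get("r0");
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because the userop body is ordinary Java, it could itself call `execute` on a second op list and return afterwards, which is the subroutine-like behavior described above, as opposed to inlining.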

Going back to the static vs dynamic use case, we're already coming across situations where we'd like to have different instruction semantics depending on the use case. Consider a vector op. A human might just like to see something like vectoradd in the decompiler, which might just be an opaque userop. However, the emulator would need the full precise p-code. If the slaspec favors the emulator, the decompiler is going to render a rather ugly loop. We're thinking about ways to resolve this, but until we have that figured out, we're not likely to also conflate the userop definition mechanism for the two use cases. So for the moment, for better or worse, they remain distinct.

nsadeveloper789 commented 2 days ago

So, I also just took a look at the referenced ticket, and yeah, that's a fun one. My suggestion would be to make an equivalent PcodeUseropLibrary for the emulator. You'd probably want to take a look at DefaultPcodeThread.PcodeEmulationLibrary as an example. Your case will obviously be a little more complex, and I'm not sure the fields you need are accessible where you need them, but this should give you a gist of how you might accomplish it. (You might just start by adding this code to PcodeEmulationLibrary, and then work out how to factor it independently.)

@PcodeUserop
public void ex9it(int imm9u) {
    PcodeFrame saved = thread.frame;
    // Seems like everything you need is in imm9u, but in case you need the original instruction:
    Instruction curInstr = thread.instruction;
    // I'll leave you to fill in this computation
    Address fetchAddr = ...;
    Instruction fetchInstr = thread.decoder.decodeInstruction(fetchAddr, thread.getContext());
    // Do your validation. Throw a Java exception if the hardware would except. I'd recommend creating your own exception class extending PcodeExecutionException.
    thread.executor.execute(PcodeProgram.fromInstruction(fetchInstr), thread.getUseropLibrary());
    thread.frame = saved;
}

shuffle2 commented 1 day ago

Thanks, that's a good start for sure. First, I tried this:

@PcodeUserop
public void ex9it(T imm9u) {
    PcodeFrame saved = thread.frame;

    // Get current ITB value
    long itb = thread.arithmetic.toLong(
      thread.getState().getVar(thread.language.getRegister("ITB"), Reason.EXECUTE_DECODE),
      Purpose.DECODE);

    // Compute address to fetch from
    long memOffset = (itb & ~0b11) + thread.arithmetic.toLong(imm9u, Purpose.DECODE) * 4;
    Address fetchAddr = thread.language.getAddressFactory().getAddress(
      thread.language.getAddressFactory().getDefaultAddressSpace().getSpaceID(), memOffset);

    // TODO throw if fetch/decode fails
    Instruction fetchInstr = thread.decoder.decodeInstruction(fetchAddr, thread.getContext());
    // pc-relative branch instructions in Instruction_Table always branch to same target, no matter where EX9.IT or ITB is.
    // other instructions which have pc-relative references are not affected.
    FlowType flowType = fetchInstr.getFlowType();
    if (flowType.isJump() || flowType.isCall()) {
      // XXX this doesn't work as intended:
      // * the instruction is still decoded as if it exists at fetchAddr
      // * registers set based on pc-relative value (e.g. Link Pointer when executing JAL out of the table)
      //   get set to fetchAddr+4 instead of thread.instruction.getAddress()+4
      // The first point above means that non-branch insns with pc-relative reference are also decoded incorrectly
      // (they'll be relative to fetchAddr instead of thread.instruction)
      thread.executor.executeSleigh("PC = PC & 0xfe000000;");
      fetchInstr = thread.decoder.decodeInstruction(fetchAddr, thread.getContext());
    }

    // Do your validation. Throw a Java exception if the hardware would except.
    // I'd recommend creating your own exception class extending PcodeExecutionException.
    if (fetchInstr.getMnemonicString().equals("EX9.IT")) {
      // TODO throw Reserved Instruction Exception
    }
    // TODO currently, the language does not implement any exceptions (e.g. alignment)
    // besides explicit ones like syscall/trap.
    thread.executor.execute(PcodeProgram.fromInstruction(fetchInstr), thread.getUseropLibrary());
    thread.frame = saved;
}

(see the XXX for what's broken)
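As a worked example of the fetch-address arithmetic in the code above, here is a small standalone sketch in plain Java. The class name, method name, and the ITB and index values are hypothetical and chosen only for illustration. The formula itself is the one used above: ITB with its low two bits cleared, plus the index scaled by 4.

```java
// Standalone arithmetic sketch; does not use any Ghidra types.
final class Ex9FetchAddress {
    /** Table entry offset: ITB with its low two bits cleared, plus a 4-byte-scaled index. */
    static long fetchOffset(long itb, long imm9u) {
        return (itb & ~0b11L) + imm9u * 4;
    }

    public static void main(String[] args) {
        // Hypothetical values: ITB = 0x00500003, index = 0x1ff (the largest 9-bit value).
        System.out.printf("0x%x%n", fetchOffset(0x00500003L, 0x1ffL)); // prints 0x5007fc
    }
}
```

With these values the low bits of ITB are masked off to give 0x500000, and the index contributes 0x1ff * 4 = 0x7fc.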

I also gave this a try, but it fails as thread.instruction has no program assigned:

try {
  thread.executor.execute(
    PcodeProgram.fromInject(thread.instruction.getProgram(), "ex9it", InjectPayload.CALLOTHERFIXUP_TYPE),
    thread.getUseropLibrary());
} catch (Exception ex) {
  throw new PcodeExecutionException(ex.getMessage());
}

I assume I should make some hacked up version of SleighInstructionDecoder.decodeInstruction to fix the above issues?

btw, is there a faster way to iterate testing changes to core ghidra? Currently I do gradle assembleAll -x ip -x createJavadocs && %GHIDRA_INSTALL_DIR%\ghidraRun.bat but it's pretty slow/processing a lot of stuff that isn't necessary.

shuffle2 commented 1 day ago

Now I'm wondering if it may be nicer to use a contextreg to select how pc-relative addresses are computed for branches in the slaspec. In the PcodeUserop implementation, I could set the contextreg and then decode the instruction from the table. However, I tried to do something similar to that already for the pcode injection side, and couldn't get it to work as expected.

shuffle2 commented 1 day ago

yea...I'd think adding this would work, but it doesn't:

thread.overrideContext(new RegisterValue(thread.contextreg, BigInteger.valueOf(1)));
fetchInstr = thread.decoder.decodeInstruction(fetchAddr, thread.getContext());

and modifying sleigh like:

define context contextreg
  itMode=(0,0)
;
imm24s_rel: rel is s0_23 & itMode=0 [ rel = inst_start + (s0_23 << 1); ] { export *:4 rel; }
imm24s_rel: rel is s0_23 & itMode=1 [ rel = (PC & 0xfe000000) + (s0_23 << 1); ] { export *:4 rel; }
pc_next: is itMode=0 { pcrel = PC + 4; export pcrel; }
pc_next: is itMode=1 { pcrel = PC + 2; export pcrel; }
:JAL imm24s_rel is u24_24=1 & imm24s_rel & pc_next {
    set_link_gpr(lp, pc_next);
    psw_ifcon_clear();
    call imm24s_rel;
}

The result is that lp is set to PC +4 (at least it's not inst_next anymore), and the jump target is still the wrong address. So the decode isn't respecting the contextreg override.

edit: oh, I take that back, it does work - it's just that the contextreg endian is inverted from what I expected, or something. Setting it to 0xffffffff instead of 1 did trigger itMode=1 patterns to be matched.
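One possible explanation for the "inverted" behavior, offered as an assumption rather than something confirmed in this thread: if context fields are numbered from the most significant bit, then itMode=(0,0) is the top bit of contextreg, and a raw value of 1 sets only the bottom bit. The sketch below is plain Java with hypothetical names, and it also assumes a 32-bit context register. It computes the mask such a field would occupy and checks which raw values select it.

```java
import java.math.BigInteger;

// Arithmetic sketch only; MSB-first numbering and the 32-bit width are assumptions.
final class ContextFieldMask {
    /** Mask for field (startBit,endBit) when bit 0 is the MOST significant bit. */
    static BigInteger msbFirstMask(int registerBits, int startBit, int endBit) {
        int width = endBit - startBit + 1;
        int shift = registerBits - 1 - endBit;
        return BigInteger.ONE.shiftLeft(width).subtract(BigInteger.ONE).shiftLeft(shift);
    }

    /** True if any bit of the field is set in the raw register value. */
    static boolean isSet(BigInteger raw, BigInteger mask) {
        return raw.and(mask).signum() != 0;
    }

    public static void main(String[] args) {
        BigInteger mask = msbFirstMask(32, 0, 0);
        System.out.println(mask.toString(16));                                 // 80000000
        System.out.println(isSet(BigInteger.ONE, mask));                       // false
        System.out.println(isSet(BigInteger.valueOf(0xffffffffL), mask));      // true
    }
}
```

Under that assumption, assigning through the named field register, as in the `assign(itMode, BigInteger.ONE)` call that appears later in the thread, avoids having to compute the raw bit position by hand.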

There's still some weirdness: the emulator winds up with PC 4 past the jump destination when executing a JAL via EX9.IT (the decompiler shows the correct target location, though). Probably something to do with how the emulator increments PC after executing an instruction? In any case, very close to it working now :)

edit2: from looking at the pcode stepper, it looks like the extra 4 byte PC advance is from the "fall-through" which is executed after the CALLOTHER completes. I wonder if there's a way to override that, or will I need to kludge a PC -= 4 into the emulator to compensate? I also wonder if other parts of ghidra are having the same issue with this CALLOTHER when tracing flow through injected pcode.

shuffle2 commented 23 hours ago

Unfortunately, the above doesn't work in the disassembler/decompiler, because PC register is always value 0.

I've tried quite a few things to work around that deficiency while maintaining emulator functionality, and it seems infeasible (in sleigh at least).

Back to the drawing board.

nsadeveloper789 commented 22 hours ago

So, you don't need to override the emulator's context. In fact, you probably want to leave it in 16-bit mode, so it can continue in that mode once it has executed the 32-bit instruction. Instead just pass the custom context directly to the decoder:

Register itMode = thread.getLanguage().getRegister("itMode");
RegisterValue defaultCtx =
    thread.defaultContext.getDefaultValue(thread.contextreg, fetchAddr);
// I'm assuming itMode=1 implies 32-bit instructions?
RegisterValue ctxMode32 = defaultCtx.assign(itMode, BigInteger.ONE);
Instruction fetchInstr = thread.decoder.decodeInstruction(fetchAddr, ctxMode32);

nsadeveloper789 commented 22 hours ago

As for the option of wrapping the inject, the missing program could be a hard stop. The only way I can think of getting one of those in there is to factor your op out into its own library, and then pass the program into its constructor. Probably not worth going down that avenue, yet.

nsadeveloper789 commented 22 hours ago

As for PC being off, I haven't examined it carefully, but since this involves execution of a second decoded instruction by reference, some of our usual conventions get broken (this is not something we've had to deal with before). inst_next is effectively hardcoded into an instruction's p-code, and it's based on the address of that instruction. So, if you use inst_next in the JAL, it's going to refer to the instruction following the JAL, not the instruction after the EX9.IT that refers to it. That's unfortunate, because that's the convention we use everywhere we want PC-relative anything. There may be a way to work around this by reconstructing the fetched instruction as if it were at the EX9.IT's address:

Instruction reloced = new PseudoInstruction(thread.counter, fetchInstr.getPrototype(), fetchInstr, fetchInstr);

Then use reloced instead of fetchInstr for the executor. No guarantees that's sane, but with some tweaking, it should work, and allow you to use inst_next in the conventional way.
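The effect of relocating can be sketched with a toy model in plain Java. `ToyInstruction` is a hypothetical stand-in, not Ghidra's Instruction or PseudoInstruction, and the addresses and lengths are made up. The only point is that a value derived from the instruction's own address (inst_start, inst_next) changes when the same instruction is rebuilt at a different address.

```java
// Toy model only: ToyInstruction is not a Ghidra class.
final class RelocationSketch {
    static final class ToyInstruction {
        final long address; // where the instruction is considered to live
        final int length;   // encoded length in bytes

        ToyInstruction(long address, int length) {
            this.address = address;
            this.length = length;
        }

        long instStart() { return address; }

        long instNext() { return address + length; }

        /** Same encoding and length, but treated as if located at newAddress. */
        ToyInstruction relocatedTo(long newAddress) {
            return new ToyInstruction(newAddress, length);
        }
    }

    public static void main(String[] args) {
        // Hypothetical: a 4-byte JAL stored in the table, referenced from address 0x1000.
        ToyInstruction fetched = new ToyInstruction(0x5007fcL, 4);
        ToyInstruction reloced = fetched.relocatedTo(0x1000L);
        System.out.printf("0x%x 0x%x%n", fetched.instNext(), reloced.instNext()); // 0x500800 0x1004
    }
}
```

Note that in this model the relocated inst_next still uses the fetched instruction's length, so whether it matches the address the hardware would actually continue from is a separate question.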

Also, I forgot you asked earlier:

btw, is there a faster way to iterate testing changes to core ghidra?

We recommend using JUnit from Eclipse. You'd probably want to add yours to BytesTracePcodeEmulatorTest, at least to start.

shuffle2 commented 17 hours ago

So, you don't need to override the emulator's context. In fact, you probably want to leave it in 16-bit mode, so it can continue in that mode once it has executed the 32-bit instruction. Instead just pass the custom context directly to the decoder:

Register itMode = thread.getLanguage().getRegister("itMode");
RegisterValue defaultCtx =
    thread.defaultContext.getDefaultValue(thread.contextreg, fetchAddr);
// I'm assuming itMode=1 implies 32-bit instructions?
RegisterValue ctxMode32 = defaultCtx.assign(itMode, BigInteger.ONE);
Instruction fetchInstr = thread.decoder.decodeInstruction(fetchAddr, ctxMode32);

itMode means "currently decoding the instruction referenced by an EX9.IT instruction", a condition which currently can only happen when the emulator is driving execution. In the above code, getDefaultValue returned null, so I replaced with

Register itMode = thread.getLanguage().getRegister("itMode");
RegisterValue itCtxMode = thread.context.assign(itMode, BigInteger.ONE);
Instruction fetchInstr = thread.decoder.decodeInstruction(fetchAddr, itCtxMode);

I think this still has the intended effect of not permanently overriding the context. I suppose the default value would be populated if I set it in <tracked_set>? Does seem odd that contextreg isn't defaulted to zero (which seems to be the expectation for sleigh code, which cannot explicitly initialize it).

Instruction reloced = new PseudoInstruction(thread.counter, fetchInstr.getPrototype(), fetchInstr, fetchInstr);

This works great, thanks a lot!

I'll have to see if these ideas can be applied to the pcode injection side, too.

Here's how I'm working around the PC advancement for now. Kludgy but at least it's working :)

PcodeFrame frame = thread.executor.execute(PcodeProgram.fromInstruction(reloced), thread.getUseropLibrary());
// compensate for the emulator advancing pc
// for whatever reason, the fallthrough adds 2, and external branch adds 4
thread.writeCounter(thread.counter.subtract(frame.isFallThrough() ? 2 : 4));

shuffle2 commented 15 hours ago

hum, I was thinking this could just be thrown into something extending EmulateInstructionStateModifier (which is found via emulateInstructionStateModifierClass in pspec), but when I went back to look - Emulate seems to be an entirely duplicated emulator unrelated to PcodeExecutor?