Pcode patching - Githubissues

This is required for more flexible IR arrangements.

The background is that, currently, the only way to modify the semantics of a program is through instruction patching. However, instruction patching has some drawbacks:

instruction patching cannot insert new instructions
instruction patching cannot replace an instruction with a longer one and keep the next instruction untouched
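
To make the second drawback concrete, here is a minimal toy sketch. It assumes a made-up byte encoding (not a real instruction set) and a hypothetical helper `patch_in_place` that only overwrites bytes, which is roughly what an instruction patcher can do.

```python
# Toy model: code is a flat byte buffer of variable-length instructions.
# The byte values are made up and do not belong to any real instruction set.

def patch_in_place(code: bytes, offset: int, new_insn: bytes) -> bytes:
    """Overwrite bytes starting at offset; nothing is inserted or shifted."""
    end = offset + len(new_insn)
    if end > len(code):
        raise ValueError("patch runs past the end of the buffer")
    return code[:offset] + new_insn + code[end:]


# A 2-byte instruction at offset 0 followed by a 3-byte instruction at offset 2.
original = bytes([0xA0, 0x01, 0xB0, 0x02, 0x03])

# Same-size replacement: the following instruction survives.
same_size = patch_in_place(original, 0, bytes([0xA1, 0x09]))

# A 4-byte replacement spills into the following instruction and corrupts it.
longer = patch_in_place(original, 0, bytes([0xC0, 0x01, 0x02, 0x03]))
```

The buffer length never changes, so any replacement longer than the original instruction has to eat into whatever comes next.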

And, to be honest, those drawbacks prevent stronger analyses such as deobfuscating control flow flattening.

Obfuscations like control flow flattening rearrange the basic blocks. Because of the drawbacks mentioned above, the blocks cannot be rearranged back in Ghidra (or IDA). At least, not easily.

The solution to this problem is to allow pcode patching. That is, we let the user display the raw pcode and patch it.

What we need:

[ ] an action that appears only when the user clicks on raw pcode (this is possible by checking which "row" of the instruction the user clicked on)
[ ] a parser for the user-entered pcode, working as the reverse of the PcodeFormatter
[ ] record the pcode
[ ] use the recorded pcode and bypass the decompiler's call to the sleigh engine
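
The "reverse of the formatter" item can be sketched as a round trip. This is only a sketch under assumptions: the text form `(space, 0xoffset, size) = OPCODE input, input` is an illustrative format chosen here, not necessarily what PcodeFormatter prints, and `Varnode`, `Op`, `format_op` and `parse_op` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Varnode:
    space: str
    offset: int
    size: int


@dataclass(frozen=True)
class Op:
    opcode: str
    output: Optional[Varnode]
    inputs: Tuple[Varnode, ...]


# Matches one varnode written as "(space, 0xoffset, size)".
_VN = re.compile(r"\(\s*(\w+)\s*,\s*(0x[0-9a-fA-F]+)\s*,\s*(\d+)\s*\)")


def format_op(op: Op) -> str:
    """Render an op in the assumed text form."""
    def vn(v: Varnode) -> str:
        return f"({v.space}, {v.offset:#x}, {v.size})"
    body = f"{op.opcode} {', '.join(vn(v) for v in op.inputs)}".rstrip()
    return f"{vn(op.output)} = {body}" if op.output is not None else body


def _varnodes(text: str) -> Tuple[Varnode, ...]:
    return tuple(Varnode(s, int(o, 16), int(n)) for s, o, n in _VN.findall(text))


def parse_op(text: str) -> Op:
    """Parse the assumed text form back into an Op (inverse of format_op)."""
    lhs, sep, rhs = text.partition("=")
    if not sep:                       # no "=" means the op has no output
        lhs, rhs = "", text
    words = rhs.split(None, 1)
    if not words or not re.fullmatch(r"[A-Z][A-Z0-9_]*", words[0]):
        raise ValueError(f"missing or malformed opcode in {text!r}")
    outs = _varnodes(lhs)
    if len(outs) > 1 or (sep and len(outs) != 1):
        raise ValueError(f"expected exactly one output varnode in {text!r}")
    inputs = _varnodes(words[1]) if len(words) > 1 else ()
    return Op(words[0], outs[0] if outs else None, inputs)
```

Keeping the parser a strict inverse of the formatter means the user edits exactly what is displayed, and a round-trip check like `parse_op(format_op(op)) == op` is a cheap way to test both sides.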

The reason for the last two is that the pcode is not stored in the database; it is lifted each time by the sleigh engine, as mentioned in this issue.

So maybe we could find some way to bypass the translation, remember what was lifted last time, and use that for the pcode patching feature. Note that not all functions need their pcode stored, only the patched ones. Otherwise the database might explode in disk space.
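
A minimal sketch of "store only what was patched, lift everything else". The class name `PatchedPcodeStore`, the per-address key, and the string representation of ops are assumptions made for illustration; the real feature might key by function instead.

```python
from typing import Callable, Dict, List


class PatchedPcodeStore:
    """Keeps pcode only for patched addresses; everything else is lifted on demand."""

    def __init__(self, lift: Callable[[int], List[str]]) -> None:
        self._lift = lift                      # stand-in for the sleigh engine
        self._patched: Dict[int, List[str]] = {}

    def record(self, addr: int, ops: List[str]) -> None:
        """Remember user-patched pcode for one address."""
        self._patched[addr] = list(ops)

    def pcode_at(self, addr: int) -> List[str]:
        """Return patched pcode if present, bypassing the lifter; otherwise lift."""
        patched = self._patched.get(addr)
        if patched is not None:
            return list(patched)
        return self._lift(addr)

    def stored_count(self) -> int:
        """Number of addresses that actually occupy storage."""
        return len(self._patched)
```

Storage then grows with the number of patches rather than with the size of the program.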

Here's what I got.

We already have some of the implementation for:

transferring pcode across CPP and Java (through getPcodePacked())
pcode injections

Pcode injections can happen in two cases (that we should care about):

callfixup injection
callotherfixup injection

By inspecting the implementation of these two injections, we might get an idea of how the pcode patching should be implemented.

Callfixup injection is for "modeling" the called function with pcode. So, callfixup injection can only happen at a "call" or "callind" instruction.

Callotherfixup is for implementing custom userops. It can be identified by the pcode op CPUI_CALLOTHER.

cpp

The CPP part of the implementation is:

FlowInfo is the class that translates instructions (and pcodes). Within FlowInfo, the injectlist contains all the pcode ops that need injection (see flow.cc/hh).
in FlowInfo::checkForFlowModification, isInline() is checked to see whether a callfixup should be done (to inline the subfunction). If so, the op is recorded in the injectlist
for CallOtherFixup, FlowInfo::xrefControlFlow analyzes the control flow by inspecting the opcode, and when CPUI_CALLOTHER is encountered (flagging a callotherfixup situation), the op is recorded in the injectlist
after those inject points are identified, the injection happens in FlowInfo::injectPcode. It goes through all injectlist items and does the real injection.
the real injection is done by FlowInfo::doInjection() when a payload is found.
finding the payload requires that it was recorded beforehand (in the userop case) or that it is in glb->pcodeinjectlib, which is the PcodeInjectLibrary
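
The two-phase flow described above (collect inject points first, inject afterwards) can be modeled in a few lines. This is a toy sketch, not the decompiler's code: `Op`, `collect_inject_points` and `inject` are hypothetical names, ops are reduced to an address, an opcode name and a target, and the toy replaces the op outright, which glosses over how the decompiler actually splices ops.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Op:
    addr: int
    opcode: str                    # "CALL", "CALLIND", "CALLOTHER", or anything else
    target: Optional[str] = None   # callee or userop name (toy stand-in)


def collect_inject_points(ops: List[Op], inline_callees: Set[str],
                          injected_userops: Set[str]) -> List[Op]:
    """Phase 1: build the injectlist while walking the ops."""
    injectlist = []
    for op in ops:
        if op.opcode in ("CALL", "CALLIND") and op.target in inline_callees:
            injectlist.append(op)          # callfixup candidate
        elif op.opcode == "CALLOTHER" and op.target in injected_userops:
            injectlist.append(op)          # callotherfixup candidate
    return injectlist


def inject(ops: List[Op], injectlist: List[Op],
           payloads: Dict[str, List[Op]]) -> List[Op]:
    """Phase 2: replace each listed op with its payload, if one is found."""
    marked = {id(op) for op in injectlist}
    out: List[Op] = []
    for op in ops:
        payload = payloads.get(op.target) if id(op) in marked else None
        out.extend(payload if payload is not None else [op])
    return out
```

Separating the phases is what makes a third kind of injection plausible: a new kind only has to add its ops to the list and provide a payload lookup.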

So, if we are going to implement pcode patching, what we should do on the cpp side is:

add a new type of injection: "PcodePatch" (follow what is done for callotherfixup or callfixup)
get all patched pcodes from the Java side and remember them in PcodeInjectLibrary
add the injection ops to the injectlist. Those ops should be found by using an address given by the Java side, or something similar.
add one more case to injectPcode, i.e., the else case for ops other than CPUI_CALLOTHER, CPUI_CALL and CPUI_CALLIND. Find the payload and inject it. Closely imitating FlowInfo::injectSubFunction would be a good start.
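
The extra "else" case can be sketched as a three-way dispatch. Everything here is illustrative: `inject_pcode`, `find_patch_points` and the `(address, opcode)` tuple are hypothetical stand-ins, and the two existing branches are reduced to callbacks.

```python
from typing import Callable, Dict, List, Tuple

Op = Tuple[int, str]   # (address, opcode name): a toy stand-in for a PcodeOp


def find_patch_points(ops: List[Op], patches: Dict[int, List[Op]]) -> List[Op]:
    """Pick the ops whose address has a patch recorded (address supplied by the Java side)."""
    return [op for op in ops if op[0] in patches]


def inject_pcode(injectlist: List[Op], patches: Dict[int, List[Op]],
                 on_userop: Callable[[Op], List[Op]],
                 on_call: Callable[[Op], List[Op]]) -> List[Op]:
    """Dispatch each inject point; the final branch is the proposed PcodePatch case."""
    result: List[Op] = []
    for op in injectlist:
        addr, opcode = op
        if opcode == "CALLOTHER":
            result.extend(on_userop(op))            # existing callotherfixup path
        elif opcode in ("CALL", "CALLIND"):
            result.extend(on_call(op))              # existing callfixup path
        else:
            payload = patches.get(addr)             # new: look up the patch by address
            result.extend(payload if payload is not None else [op])
    return result
```

An op with no recorded patch falls through unchanged, so unpatched code is unaffected by the new branch.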

To get the injections from the Java side, some modifications to PcodeInjectLibraryGhidra are required (in inject_ghidra.hh/cc). This is where we get the inject library from the Java side. Previously we only had callfixups and callotherfixups. Now we need one more.

Java

The Java side interacts with the cpp side through DecompileProcess. In readResponse, we can see the back-calls (calls from cpp to Java), and in particular the getCallFixup, getCallotherFixup and getCallMech responses.

This means we also need a new protocol command for our patching, something like getPcodePatchFixup, that both cpp and Java understand. This requires modifying both the Java part (DecompileProcess) and the cpp part.

A possible implementation of the negotiation procedure could be:

in ArchitectureGhidra::getPcodeInject (cpp) add one more inject payload type.
in DecompileProcess (java) add one more protocol parsing
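
The protocol change amounts to one more entry in the query dispatch. A rough model of the idea, where the `InjectType` values and the `read_response` / `get_pcode_inject` signatures are simplified assumptions, and `getPcodePatchFixup` is the proposed (not existing) command:

```python
from enum import Enum, auto
from typing import Callable


class InjectType(Enum):
    CALLFIXUP = auto()
    CALLOTHERFIXUP = auto()
    CALLMECHANISM = auto()
    PCODEPATCH = auto()        # proposed new payload type (hypothetical)


# Query names sent by the cpp side, mapped to the payload type to look up.
QUERY_TO_TYPE = {
    "getCallFixup": InjectType.CALLFIXUP,
    "getCallotherFixup": InjectType.CALLOTHERFIXUP,
    "getCallMech": InjectType.CALLMECHANISM,
    "getPcodePatchFixup": InjectType.PCODEPATCH,   # the one new entry
}


def read_response(query: str, name: str,
                  get_pcode_inject: Callable[[str, InjectType], str]) -> str:
    """Route a back-call from the decompiler to the payload lookup."""
    inject_type = QUERY_TO_TYPE.get(query)
    if inject_type is None:
        raise ValueError(f"unsupported decompiler query: {query}")
    return get_pcode_inject(name, inject_type)
```

Both sides have to agree on the new query name and type, which is why the change touches DecompileProcess and the cpp side together.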

Now we should deal with how getPcodeInject(PCODEPATCHFIXUP) should be implemented (in Java).

Summing up a little, what we need is to:

add the payload to the database
be able to retrieve it again when it is requested in getPcodeInject

In the newest version of the database (24, updated in Mar. 2021), it turns out that the payload can already reside in the database (see ProgramDB.java).

And the compiler spec has the PcodeInjectLibrary in it, which we could take advantage of. So we just need to modify PcodeInjectLibrary to contain our type of fixup.

We should also allow injection payloads to be added dynamically to our type of fixup recorded in PcodeInjectLibrary (I mean the Java one, same as above).
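
A small model of a library that accepts payloads at runtime and survives a save/load cycle. It is a sketch under assumptions: `InjectLibrary`, the `"pcodepatch"` tag and the JSON storage are hypothetical stand-ins for the Java PcodeInjectLibrary and the program database.

```python
import json
from typing import Dict, List, Optional

PCODEPATCH = "pcodepatch"   # hypothetical tag for the new fixup type


class InjectLibrary:
    """Payloads grouped by fixup type, then by name."""

    def __init__(self) -> None:
        self._payloads: Dict[str, Dict[str, List[str]]] = {}

    def add_payload(self, inject_type: str, name: str, ops: List[str]) -> None:
        """Register (or overwrite) a payload at runtime."""
        self._payloads.setdefault(inject_type, {})[name] = list(ops)

    def get_payload(self, inject_type: str, name: str) -> Optional[List[str]]:
        """Look a payload up by type and name; None if it was never added."""
        ops = self._payloads.get(inject_type, {}).get(name)
        return None if ops is None else list(ops)

    def dumps(self) -> str:
        """Serialize all payloads (stand-in for writing to the database)."""
        return json.dumps(self._payloads, sort_keys=True)

    @classmethod
    def loads(cls, text: str) -> "InjectLibrary":
        """Rebuild a library from its serialized form."""
        lib = cls()
        for inject_type, entries in json.loads(text).items():
            for name, ops in entries.items():
                lib.add_payload(inject_type, name, ops)
        return lib
```

Keying by type first keeps patch payloads from colliding with callfixup or callotherfixup payloads that happen to share a name.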

At this point, most of it should work. The rest of the job is to:

parse the user-submitted pcode op and turn it into an InjectPayload
add the action

These should be simpler and should not cause much of a problem.

StarCrossPortal / sleighcraft

Pcode patching #18
