NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0
51.23k stars 5.84k forks source link

JMP instruction used with pre PUSH of return address only half recognised as CALL #5747

Open Wall-AF opened 1 year ago

Wall-AF commented 1 year ago

Describe the bug When a function call is made using a JMP instruction together with a previous PUSH of the return address, Ghidra realises this to be equivalent to a function call and is interpreted as such. However, the return address is ignored and the following statement is to exit the calling function with a return instead of moving to the next statement address specified by the PUSH of the return address as in the below x86 16-bit assembly extrac:t:

   1158:1b9b 26 81 7c     010           CMP        word ptr ES:[SI + 0x2],0x7000
             02 00 70
   1158:1ba1 74 14        010           JZ         LAB_1158_1bb7
   1158:1ba3 26 81 7c     010           CMP        word ptr ES:[SI + -0x14],0x7000
             ec 00 70
   1158:1ba9 74 06        010           JZ         LAB_1158_1bb1
   1158:1bab 68 c2 1b     010           PUSH       0x1bc2
   1158:1bae e9 d1 f5     012           JMP        FUN_1158_1182                                    int FUN_1158_1182(Dgn1150_0238_0
                             -- Flow Override: CALL_RETURN (CALL_TERMINATOR)
                             LAB_1158_1bb1                                   XREF[1]:     1158:1ba9(j)  
   1158:1bb1 68 c2 1b     010           PUSH       0x1bc2
   1158:1bb4 e9 19 fb     012           JMP        FUN_1158_16d0                                    int FUN_1158_16d0(Dgn1150_0238_0
                             -- Flow Override: CALL_RETURN (CALL_TERMINATOR)
                             LAB_1158_1bb7                                   XREF[1]:     1158:1ba1(j)  
   1158:1bb7 26 81 7c     010           CMP        word ptr ES:[SI + -0x14],0x7000
             ec 00 70
   1158:1bbd 74 06        010           JZ         LAB_1158_1bc5
   1158:1bbf e8 b4 f9     010           CALL       FUN_1158_1576                                    int FUN_1158_1576(Dgn1150_0238_0
   1158:1bc2 01 46 fa     010           ADD        word ptr SS:[BP + local_8],AX

resulting in:

         if (pElem->field1_0x2 != 0x7000) {
            if ((pElem + -1)->n?Thr_0x0 != 0x7000) {
               iVar1 = FUN_1158_1182(local_10);
               return iVar1;
            }
            iVar1 = FUN_1158_16d0(local_10);
            return iVar1;
         }
         if ((pElem + -1)->n?Thr_0x0 != 0x7000) {
            iVar1 = FUN_1158_1576(local_10);
            local_8 += iVar1;
         }

instead of something like:

         if (pElem->field1_0x2 != 0x7000) {
            if ((pElem + -1)->n?Thr_0x0 != 0x7000) {
               iVar1 = FUN_1158_1182(local_10);
            }
            else {
               iVar1 = FUN_1158_16d0(local_10);
            }
         }
         else if ((pElem + -1)->n?Thr_0x0 != 0x7000) {
            iVar1 = FUN_1158_1576(local_10);
         }
         local_8 += iVar1;

To Reproduce Steps to reproduce the behavior:

  1. Load in the enclosed function (from the Decompile:Panels Debug Function Decompilation menu)
  2. Search down the decompiled code to find the output above
  3. See issue

Expected behavior Recognise the return address and decompile correctly.

Screenshots If applicable, add screenshots to help explain your problem.

Attachments dragon_FUN_1158_1b78.zip

Environment (please complete the following information):

Additional context N/A

marcushall42 commented 1 year ago

Just curious why any compiler would produce such code? It seems that a CALL followed by a branch should be 5 bytes while PUSH/JMP is 6 bytes. I can't imagine that this is any faster.. If it's hand written to confuse disassemblers, then maybe. It would be hard to sort out that the PUSH is a return address and not an argument to a tail-optimized CALL-RETURN.

Wall-AF commented 1 year ago

@marcushall42 I've slept on this and realise it is a speed optimisation. With a CALL followed by a JMP, the return address still has to be placed on the stack preceeding the CALL and a two IP changes are required, one from the CALL return and the second with the JMP to get to the next instruction. However, using a PUSH followed by a JMP the return address is the next instruction address, thus avoiding a second IP change.

Wall-AF commented 1 year ago

It would be hard to sort out that the PUSH is a return address and not an argument to a tail-optimized CALL-RETURN.

I don't believe so because Ghidra has already determined the JMP is a substitute for a CALL (even as opposed to a CALLF) and that as part of a CALL/CALLF a return (16-/32-bit) address is pushed onto the stack, therefore when making the decision that the JMP is actually a CALL/CALLF it should be able to backtrack to the most recent, appropriately-sized, PUSH (or muliple PUSH) statement(s) in order to obtain the (CS:)IP address of the following statement, post the CALL! (Trouble is, I don't know where to look to fix it!!!)

marcushall42 commented 1 year ago

I believe that one of the Analysis tools looks for a jump to another function and changes the flow override to CALL-RETURN. It isn't a particularly deep analysis, but it is trying to detect this sort of case:

fcnA (arg1, arg2) { some stuff fncB(arg1, arg2); }

The optimizer can replace the call to fcnB and return afterwards with just a jump to fcnB. That is what the analysis routine is trying to detect.

The trouble with replacing PUSH xxx; JMP yyy with CALL yyy; GOTO xxx is that the PCODE for the two existing instructions are doing the "wrong thing" and there isn't any flow alteration that could correct it.

I had been concerned that it would be difficult to decide that the PUSH was not pushing an argument to the function, but I guess that the arguments all have to be pushed before the return address, so PUSH;JMP must be pushing an effective return address. I think that there is hope to sort through it so long as there are no other instructions between the PUSH and JMP instructions...

I recall seeing somewhere that ghidra can spot particular instruction sequences and replace them with custom Pcode. This is designed for dealing with switch helper funcitons and such. I would think that trying to apply that mechanism here would be the most useful way to get ghidra to grok what's happening.

emteere commented 1 year ago

Can you manually override the CALL/RETURN to a CALL? That should at least fix the issue, but not automatically recognize the instructions.

It might be possible for the SharedCallReturnCallsAnalyzer to handle the special case, and instead of an over-ride of CALL/RETURN, override to just a CALL.

That is really tricky to change locally because you don't know if the PUSH was for a call that will occur later in the function, and that the JUMP isn't part of the current function. So maybe this is the best compromise.

marcushall42 commented 1 year ago

They could certainly change the override to CALL, but then the remaining problem is to somehow transform the preceding PUSH into a following GOTO, since the PUSH is what supplies the return address.

I think that the Pcode substitution I had mentioned before is limited to replacing the functionality of specified subroutines only, not any arbitrary instruction sequence. So, I don't really see any easy way forward for ghidra to grok this sort of code sequence. But I'm certainly not as cleaver as some people out there, so maybe there's hope...

emteere commented 1 year ago

I see the PUSH is going to another address, not just below the function. Yes that is a trickier issue.

In addition to changing the CALL/RETURN to a CALL, the fall-thru address can be set to the location of the PUSHED address in the popup action FallThrough->Set... The decompiler will honor the address set as the fallthru.

It is getting a bit more complicated than I'd do in a generic analyzer because every jump would need to be checked, and that PUSH could be a saved value or a value for a future call. It could be put in a 16-bit x86Analyzer, which is need to to solve some other issues. (CS, segments, etc). I suppose it still could be done in the SharedCallReturnAnalyzer. A bit more analysis would need to be done to see what is on the top of the stack at the jump. A hack might be to just check the instruction before to see if it is a PUSH'ed value.

A script could easily check all these places and do the surgery.

Wall-AF commented 1 year ago

Can you manually override the CALL/RETURN to a CALL? That should at least fix the issue, but not automatically recognize the instructions.

Yes, that is possible, but automated would be preferable!

It might be possible for the SharedCallReturnCallsAnalyzer to handle the special case, and instead of an over-ride of CALL/RETURN, override to just a CALL.

Are you referring to the 2 classes SharedReturnAnalyzer and SharedReturnJumpAnalyzer?

That is really tricky to change locally because you don't know if the PUSH was for a call that will occur later in the function, and that the JUMP isn't part of the current function. So maybe this is the best compromise.

If you can determine that the JMP is actually a CALL (not sure where/how this is currently done, but clearly it is) then the calling convention of that called function determines how the parameter & return address of that call is passed and therefore should be recognisable as such. However, if you're saying the current analysis is (excuse the pun) jumping to the wrong conclusion, as in the case where the JMP is actually a follow-through, it should really not be a CALL, then we have a different issue!

emteere commented 1 year ago

Determining that a JMP should be turned into a call is done by detecting a JMP to a place that is also CALL'ed which implies it is a function, the JMP jumps over an existing function, or the JMP target is to an existing function.

There was a caveat recently added that the JMP target is not fallen into by another function.

There isn't any code analysis to the two analyzers you mention. There are other analyzers that track registers and return address to determine if a location such as a return/jump/call through a register is actually something different than the default pcode flow or return/jump/call.

emteere commented 1 year ago

Is this something you are seeing in many binaries, or just one in particular?

Wall-AF commented 1 year ago

Determining that a JMP should be turned into a call is done by detecting a JMP to a place that is also CALL'ed which implies it is a function, the JMP jumps over an existing function, or the JMP target is to an existing function.

Which class(es) (Java and/or C++) is(/are) determining this please? I'd like to see how it does that in code!

There are other analyzers that track registers and return address to determine if a location such as a return/jump/call through a register is actually something different than the default pcode flow or return/jump/call.

Which ones?

Is this something you are seeing in many binaries, or just one in particular?

So far, just the one. But it seems a reasonable methodology to be decypherable by an excellent RE tool.

Wall-AF commented 1 year ago

If there's a way to analyse this, it may be worth having a configurable number of instructions to look back for the return address PUSH instruction on a function by function basis.

Wall-AF commented 1 year ago

The minimum that's needed is to be able to manually override this -- Flow Override: CALL_RETURN (CALL_TERMINATOR) with something like -- Fallthrough Override: CALL_RETURN (1158:1bc2) and for the decompiler to act accordingly. At present, the disassembly shows:

   1158:1b9b 26 81 7c     010           CMP        word ptr ES:[SI + 0x2],0x7000
             02 00 70
   1158:1ba1 74 14        010           JZ         LAB_1158_1bb7
   1158:1ba3 26 81 7c     010           CMP        word ptr ES:[SI + -0x14],0x7000
             ec 00 70
   1158:1ba9 74 06        010           JZ         LAB_1158_1bb1
   1158:1bab 68 c2 1b     010           PUSH       0x1bc2
   1158:1bae e9 d1 f5     012           JMP        FUN_1158_1182#5758                               int FUN_1158_1182#5758(Dgn1150_0
                             -- Flow Override: CALL_RETURN (CALL_TERMINATOR)
                             -- Fallthrough Override: 1158:1bc2
                             LAB_1158_1bb1                                   XREF[1]:     1158:1ba9(j)  
   1158:1bb1 68 c2 1b     010           PUSH       0x1bc2
   1158:1bb4 e9 19 fb     012           JMP        FUN_1158_16d0                                    int FUN_1158_16d0(Dgn1150_0238_0
                             -- Flow Override: CALL_RETURN (CALL_TERMINATOR)
                             -- Fallthrough Override: 1158:1bc2
                             LAB_1158_1bb7                                   XREF[1]:     1158:1ba1(j)  
   1158:1bb7 26 81 7c     010           CMP        word ptr ES:[SI + -0x14],0x7000
             ec 00 70
   1158:1bbd 74 06        010           JZ         LAB_1158_1bc5
   1158:1bbf e8 b4 f9     010           CALL       FUN_1158_1576                                    int FUN_1158_1576(Dgn1150_0238_0
                             LAB_1158_1bc2                                   XREF[2]:     1158:1bae, 1158:1bb4  
   1158:1bc2 01 46 fa     010           ADD        word ptr SS:[BP + local_8],AX

and the decompiled code doesn't change as it is just following the first -- Flow Override: CALL_RETURN (CALL_TERMINATOR) and still shows:

         if (pElem->field1_0x2 != 0x7000) {
            if ((pElem + -1)->n?Thr_0x0 != 0x7000) {
               iVar1 = FUN_1158_1182#5758(local_10);
               return iVar1;
            }
            iVar1 = FUN_1158_16d0(local_10);
            return iVar1;
         }
         if ((pElem + -1)->n?Thr_0x0 != 0x7000) {
            iVar1 = FUN_1158_1576(local_10);
            local_8 += iVar1;
         }
      }
emteere commented 1 year ago

You need to change the CALL_RETURN override to a CALL. I believe the flow should work correctly.

If you leave it as CALL_RETURN, then the decompiler doesn't use the fall-thru address, because it isn't supposed to have one.

Wall-AF commented 1 year ago

You need to change the CALL_RETURN override to a CALL. I believe the flow should work correctly.

If you leave it as CALL_RETURN, then the decompiler doesn't use the fall-thru address, because it isn't supposed to have one.

As it stands, this override cannot be dropped or changed!

emteere commented 1 year ago

The instruction flow should be able to be changed from a CALL_RETURN to a CALL. Is it changed back, possibly by an analyzer?

If when you modify it to a CALL it is changed back to a CALL_RETURN, then it may be an analyzer doing it. You can test it by turning off the SharedReturnAnalyzer in Edit->ProgramOptions->Analyzers temporarily (don't re-import/re-analyze) and then modifying the instruction flow to a CALL.

If this is happening, we should fix the issue.

Wall-AF commented 1 year ago

No go! I've turned off Shared Return Calls analyser (as below) but the -- Flow Override: CALL_RETURN (CALL_TERMINATOR) will not clear and cannot be converted! image

emteere commented 1 year ago

if you clear the instruction after having turned off the analyzer and then re-disassembling the instruction it comes back?

Wall-AF commented 1 year ago

Thanks @emteere that worked. Should've worked that out!!!

Wall-AF commented 1 year ago

By replacing https://github.com/NationalSecurityAgency/ghidra/blob/0713d91a3b7c174805da29625526e0662a080708/Ghidra/Features/Base/src/main/java/ghidra/app/cmd/analysis/SharedReturnAnalysisCmd.java#L423-L425 with

    Address retAddr = null;
    Address searchAddr = ref.getFromAddress();
    int searchBack = 1; // Should be configurable number of instructions to search back to find a PUSH
    while (searchBack-- > 0) {
        Instruction prevInstr = program.getListing().getInstructionBefore(searchAddr);
        searchAddr = prevInstr.getFromAddress(); // for next loop
        if ("PUSH".equals(prevInstr.getMnemonicString())) {
            Scalar scalar = prevInstr.getScalar(0);
            long offset = prevInstr.getAddress().getOffset();
            if (prevInstr.getAddress() instanceof SegmentedAddress segAddr) {
                offset = segAddr.getSegmentOffset();;
            }
            retAddr = prevInstr.getAddress().subtract(offset).add(scalar.getValue());
            break;
        }
    }
    FlowOverride flowType = FlowOverride.CALL_RETURN;
    if (retAddr != null) {
        flowType = FlowOverride.CALL;
        SetFallThroughCmd cmdFT = new SetFallThroughCmd(refInstrAddr, retAddr);
        cmdFT.applyTo(program);
    }
    SetFlowOverrideCmd cmd = new SetFlowOverrideCmd(refInstrAddr, flowType);
    cmd.applyTo(program);

I fixed the analyser!

NB: Ideally the class SegmentedAdress requires to add an override for the function getOffset and then the address calculation could be streamlined. I've also assumed a near JMP which I'm sure could be generalised with a little more thought! Finally, the while loop just needs to have its guard setting to a configurable value (possibly at the instruction level) to limit the number of previous instructions in which to search for the return address PUSHed!

Wall-AF commented 1 year ago

@emteere Working on a better(/more correct!) solution, but need to know where/how the original bytes get converted into the JMP instruction so I can check if the toAddress is 2 or 4 bytes!

marcushall42 commented 1 year ago

I suppose that you could check Instruction.getLength().

For a "proper" fix, there is still the problem that the PUSH instruction is affecting the stack in a way that I believe will confuse things like CallDepthChangeInfo won't understand. But there may not be much that can be done about that.. The SharedReturnAnalysisCmd is supposed to work for all "languages" (processors), so x86 specific things should be applied only if that is the language in use in the Program. It is best to look at the PcodeOp level since all languages get converted into PcodeOps. In a perfect world, enhancements to SharedReturnAnalysisCmd should work on x86, arm, and even 6502 code. It makes things a bit more challenging, but generality here is a thing of beauty.

Wall-AF commented 1 year ago

I'm almost there, peer review notwithstanding! However, I'm stuck! When the code register (CS) is pushed onto the stack, I would have expected to be able to get at its value, but using the Java excerpt below seems to give an incorrect value!

        InstructionPrototype proto = instr.getPrototype();
        switch (instr.getOperandType(0)) {
        case OperandType.SCALAR:
            opInfo.setOffset(instr.getScalar(0).getUnsignedValue());
            opInfo.setSize(removeLeadingZeros(proto.getOperandValueMask(0).toString()).length() / 2);
            break;
        case OperandType.REGISTER:
            Register reg = proto.getRegister(0, instr.getInstructionContext());
            opInfo.setOffset(reg.getOffset());
            opInfo.setSize(reg.getNumBytes());
...

It's returning an invalid segment number!

Is this because I'm missing some additional step required for segmented addresses?

emteere commented 1 year ago

You can add a hack there, but that is definitely not general code in a general analyzer / code that works across all processors. As @marcushall42 says it is general code and meant to work on every processor.

If you only have the issue in one binary, it might be better dealt with in a script. I usually spec things out as a script before I make the decision it is worth automating. Although trying it in code to see what is required isn't bad, it just could never stay in that code.

It is possible an override mechanism of the base shared return analyzer for a more specific 16-bit x86 or other processor could be added as is done for constant analysis, but that hasn't been necessary as the shared return analyzer is general across all processors. So the SharedReturnCmd would make the decision, and then the extra analyzer or overridden analyzer would fix the issue.

There will most likely be a 16-bit x86 analyzer at some point in the future to better deal with the CS segment in real-mode, and it could be added in there. It would have access to the current stack values as it is done with pcode. There would be a chicken and egg issue of having created a function at the target when currently the decision to turn the jump into a call is in the shared return analyzer. The constant propagation could be used to figure out the push value and segment if necessary.

marcushall42 commented 1 year ago

reg.Offset() is the address within register space that identifies the register. It comes from the sleigh definition of the language. The contents of a register are not known in the instruction itself. Some analyzers may determine the value based on register load instructions, and sometimes it is possible to trace back through the instructions and determine a set of values that may be loaded into a register, but in general it is impossible to predict the value of a register.

emteere commented 1 year ago

The PUSH does not have a special case for the CS segment that would try to have a real value. And it would likely still have the operand as a REGISTER, but just push the correct value. The CS segment value would also be incorrect for RealMode and is only kept correct for protected mode.

If you are assuming the JMP is coming back to the same function just at another location it should have the same segment, do you need the value pushed for a far jMP?

Wall-AF commented 1 year ago

For now, I've got the simple case (a single PUSH of the address offset) to be recognised with no obvious side-effects. I'll leave it there for now until I come across a real situation involving the more complex relationships. Meanwhile, thanks @emteere and @marcushall42, and look forward to the day when an official 16-bit x86 analyser is released. (I live in hope!!!)

Wall-AF commented 1 year ago

Looks like an additional FlowOverride & RefType are required to support CALLF as my attempt is having the side effect that a push of a far return address isn't processed correctly because the offset portion of the address is not being accepted as part of the address, but as another parameter! (This isn't the case with a natural CALLF.)

@emteere @marcushall42 any help to do this would be much appreciated, ta.

emteere commented 1 year ago

Do you mean by the decompiler?

My suspicion is if you don't tell the decompiler that the Function at the JMP target is a FAR call calling convention, the return address is 4 bytes, it will assume that the return address is only 2 bytes.

Wall-AF commented 1 year ago

Looks like an additional FlowOverride&RefType are required

I meant the above to inform the decompiler.


When I changed the raw bytes to represent a far JMP the assembler recognises it and shows JMPF. The subsequent override then doesn't appear to inform the decompiler that the overridden JMPF is actually a CALLF. Thereforre, are you saying that when I fix the override I need to do something else at that point, if so, what?