Open Wall-AF opened 1 year ago
Just curious why any compiler would produce such code? It seems that a CALL followed by a branch should be 5 bytes while PUSH/JMP is 6 bytes. I can't imagine that this is any faster.. If it's hand written to confuse disassemblers, then maybe. It would be hard to sort out that the PUSH is a return address and not an argument to a tail-optimized CALL-RETURN.
@marcushall42 I've slept on this and realise it is a speed optimisation. With a CALL
followed by a JMP
, the return address still has to be placed on the stack preceeding the CALL
and a two IP
changes are required, one from the CALL
return and the second with the JMP
to get to the next instruction. However, using a PUSH
followed by a JMP
the return address is the next instruction address, thus avoiding a second IP
change.
It would be hard to sort out that the PUSH is a return address and not an argument to a tail-optimized CALL-RETURN.
I don't believe so because Ghidra has already determined the JMP
is a substitute for a CALL
(even as opposed to a CALLF
) and that as part of a CALL
/CALLF
a return (16-/32-bit) address is pushed onto the stack, therefore when making the decision that the JMP
is actually a CALL
/CALLF
it should be able to backtrack to the most recent, appropriately-sized, PUSH
(or muliple PUSH
) statement(s) in order to obtain the (CS:
)IP
address of the following statement, post the CALL
! (Trouble is, I don't know where to look to fix it!!!)
I believe that one of the Analysis tools looks for a jump to another function and changes the flow override to CALL-RETURN. It isn't a particularly deep analysis, but it is trying to detect this sort of case:
fcnA (arg1, arg2) { some stuff fncB(arg1, arg2); }
The optimizer can replace the call to fcnB and return afterwards with just a jump to fcnB. That is what the analysis routine is trying to detect.
The trouble with replacing PUSH xxx; JMP yyy with CALL yyy; GOTO xxx is that the PCODE for the two existing instructions are doing the "wrong thing" and there isn't any flow alteration that could correct it.
I had been concerned that it would be difficult to decide that the PUSH was not pushing an argument to the function, but I guess that the arguments all have to be pushed before the return address, so PUSH;JMP must be pushing an effective return address. I think that there is hope to sort through it so long as there are no other instructions between the PUSH and JMP instructions...
I recall seeing somewhere that ghidra can spot particular instruction sequences and replace them with custom Pcode. This is designed for dealing with switch helper funcitons and such. I would think that trying to apply that mechanism here would be the most useful way to get ghidra to grok what's happening.
Can you manually override the CALL/RETURN to a CALL? That should at least fix the issue, but not automatically recognize the instructions.
It might be possible for the SharedCallReturnCallsAnalyzer to handle the special case, and instead of an over-ride of CALL/RETURN, override to just a CALL.
That is really tricky to change locally because you don't know if the PUSH was for a call that will occur later in the function, and that the JUMP isn't part of the current function. So maybe this is the best compromise.
They could certainly change the override to CALL, but then the remaining problem is to somehow transform the preceding PUSH into a following GOTO, since the PUSH is what supplies the return address.
I think that the Pcode substitution I had mentioned before is limited to replacing the functionality of specified subroutines only, not any arbitrary instruction sequence. So, I don't really see any easy way forward for ghidra to grok this sort of code sequence. But I'm certainly not as cleaver as some people out there, so maybe there's hope...
I see the PUSH is going to another address, not just below the function. Yes that is a trickier issue.
In addition to changing the CALL/RETURN to a CALL, the fall-thru address can be set to the location of the PUSHED address in the popup action FallThrough->Set... The decompiler will honor the address set as the fallthru.
It is getting a bit more complicated than I'd do in a generic analyzer because every jump would need to be checked, and that PUSH could be a saved value or a value for a future call. It could be put in a 16-bit x86Analyzer, which is need to to solve some other issues. (CS, segments, etc). I suppose it still could be done in the SharedCallReturnAnalyzer. A bit more analysis would need to be done to see what is on the top of the stack at the jump. A hack might be to just check the instruction before to see if it is a PUSH'ed value.
A script could easily check all these places and do the surgery.
Can you manually override the CALL/RETURN to a CALL? That should at least fix the issue, but not automatically recognize the instructions.
Yes, that is possible, but automated would be preferable!
It might be possible for the SharedCallReturnCallsAnalyzer to handle the special case, and instead of an over-ride of CALL/RETURN, override to just a CALL.
Are you referring to the 2 classes SharedReturnAnalyzer
and SharedReturnJumpAnalyzer
?
That is really tricky to change locally because you don't know if the PUSH was for a call that will occur later in the function, and that the JUMP isn't part of the current function. So maybe this is the best compromise.
If you can determine that the JMP
is actually a CALL
(not sure where/how this is currently done, but clearly it is) then the calling convention of that called function determines how the parameter & return address of that call is passed and therefore should be recognisable as such. However, if you're saying the current analysis is (excuse the pun) jumping to the wrong conclusion, as in the case where the JMP
is actually a follow-through, it should really not be a CALL
, then we have a different issue!
Determining that a JMP should be turned into a call is done by detecting a JMP to a place that is also CALL'ed which implies it is a function, the JMP jumps over an existing function, or the JMP target is to an existing function.
There was a caveat recently added that the JMP target is not fallen into by another function.
There isn't any code analysis to the two analyzers you mention. There are other analyzers that track registers and return address to determine if a location such as a return/jump/call through a register is actually something different than the default pcode flow or return/jump/call.
Is this something you are seeing in many binaries, or just one in particular?
Determining that a JMP should be turned into a call is done by detecting a JMP to a place that is also CALL'ed which implies it is a function, the JMP jumps over an existing function, or the JMP target is to an existing function.
Which class(es) (Java and/or C++) is(/are) determining this please? I'd like to see how it does that in code!
There are other analyzers that track registers and return address to determine if a location such as a return/jump/call through a register is actually something different than the default pcode flow or return/jump/call.
Which ones?
Is this something you are seeing in many binaries, or just one in particular?
So far, just the one. But it seems a reasonable methodology to be decypherable by an excellent RE tool.
If there's a way to analyse this, it may be worth having a configurable number of instructions to look back for the return address PUSH
instruction on a function by function basis.
The minimum that's needed is to be able to manually override this
-- Flow Override: CALL_RETURN (CALL_TERMINATOR)
with something like
-- Fallthrough Override: CALL_RETURN (1158:1bc2)
and for the decompiler to act accordingly. At present, the disassembly shows:
1158:1b9b 26 81 7c 010 CMP word ptr ES:[SI + 0x2],0x7000
02 00 70
1158:1ba1 74 14 010 JZ LAB_1158_1bb7
1158:1ba3 26 81 7c 010 CMP word ptr ES:[SI + -0x14],0x7000
ec 00 70
1158:1ba9 74 06 010 JZ LAB_1158_1bb1
1158:1bab 68 c2 1b 010 PUSH 0x1bc2
1158:1bae e9 d1 f5 012 JMP FUN_1158_1182#5758 int FUN_1158_1182#5758(Dgn1150_0
-- Flow Override: CALL_RETURN (CALL_TERMINATOR)
-- Fallthrough Override: 1158:1bc2
LAB_1158_1bb1 XREF[1]: 1158:1ba9(j)
1158:1bb1 68 c2 1b 010 PUSH 0x1bc2
1158:1bb4 e9 19 fb 012 JMP FUN_1158_16d0 int FUN_1158_16d0(Dgn1150_0238_0
-- Flow Override: CALL_RETURN (CALL_TERMINATOR)
-- Fallthrough Override: 1158:1bc2
LAB_1158_1bb7 XREF[1]: 1158:1ba1(j)
1158:1bb7 26 81 7c 010 CMP word ptr ES:[SI + -0x14],0x7000
ec 00 70
1158:1bbd 74 06 010 JZ LAB_1158_1bc5
1158:1bbf e8 b4 f9 010 CALL FUN_1158_1576 int FUN_1158_1576(Dgn1150_0238_0
LAB_1158_1bc2 XREF[2]: 1158:1bae, 1158:1bb4
1158:1bc2 01 46 fa 010 ADD word ptr SS:[BP + local_8],AX
and the decompiled code doesn't change as it is just following the first -- Flow Override: CALL_RETURN (CALL_TERMINATOR)
and still shows:
if (pElem->field1_0x2 != 0x7000) {
if ((pElem + -1)->n?Thr_0x0 != 0x7000) {
iVar1 = FUN_1158_1182#5758(local_10);
return iVar1;
}
iVar1 = FUN_1158_16d0(local_10);
return iVar1;
}
if ((pElem + -1)->n?Thr_0x0 != 0x7000) {
iVar1 = FUN_1158_1576(local_10);
local_8 += iVar1;
}
}
You need to change the CALL_RETURN override to a CALL. I believe the flow should work correctly.
If you leave it as CALL_RETURN, then the decompiler doesn't use the fall-thru address, because it isn't supposed to have one.
You need to change the CALL_RETURN override to a CALL. I believe the flow should work correctly.
If you leave it as CALL_RETURN, then the decompiler doesn't use the fall-thru address, because it isn't supposed to have one.
As it stands, this override cannot be dropped or changed!
The instruction flow should be able to be changed from a CALL_RETURN to a CALL. Is it changed back, possibly by an analyzer?
If when you modify it to a CALL it is changed back to a CALL_RETURN, then it may be an analyzer doing it. You can test it by turning off the SharedReturnAnalyzer in Edit->ProgramOptions->Analyzers temporarily (don't re-import/re-analyze) and then modifying the instruction flow to a CALL.
If this is happening, we should fix the issue.
No go! I've turned off Shared Return Calls
analyser (as below) but the -- Flow Override: CALL_RETURN (CALL_TERMINATOR)
will not clear and cannot be converted!
if you clear the instruction after having turned off the analyzer and then re-disassembling the instruction it comes back?
Thanks @emteere that worked. Should've worked that out!!!
Address retAddr = null;
Address searchAddr = ref.getFromAddress();
int searchBack = 1; // Should be configurable number of instructions to search back to find a PUSH
while (searchBack-- > 0) {
Instruction prevInstr = program.getListing().getInstructionBefore(searchAddr);
searchAddr = prevInstr.getFromAddress(); // for next loop
if ("PUSH".equals(prevInstr.getMnemonicString())) {
Scalar scalar = prevInstr.getScalar(0);
long offset = prevInstr.getAddress().getOffset();
if (prevInstr.getAddress() instanceof SegmentedAddress segAddr) {
offset = segAddr.getSegmentOffset();;
}
retAddr = prevInstr.getAddress().subtract(offset).add(scalar.getValue());
break;
}
}
FlowOverride flowType = FlowOverride.CALL_RETURN;
if (retAddr != null) {
flowType = FlowOverride.CALL;
SetFallThroughCmd cmdFT = new SetFallThroughCmd(refInstrAddr, retAddr);
cmdFT.applyTo(program);
}
SetFlowOverrideCmd cmd = new SetFlowOverrideCmd(refInstrAddr, flowType);
cmd.applyTo(program);
I fixed the analyser!
NB: Ideally the class SegmentedAdress
requires to add an override for the function getOffset
and then the address calculation could be streamlined. I've also assumed a near
JMP
which I'm sure could be generalised with a little more thought! Finally, the while
loop just needs to have its guard setting to a configurable value (possibly at the instruction level) to limit the number of previous instructions in which to search for the return address PUSHed!
@emteere Working on a better(/more correct!) solution, but need to know where/how the original bytes get converted into the JMP
instruction so I can check if the toAddress
is 2 or 4 bytes!
I suppose that you could check Instruction.getLength().
For a "proper" fix, there is still the problem that the PUSH instruction is affecting the stack in a way that I believe will confuse things like CallDepthChangeInfo won't understand. But there may not be much that can be done about that.. The SharedReturnAnalysisCmd is supposed to work for all "languages" (processors), so x86 specific things should be applied only if that is the language in use in the Program. It is best to look at the PcodeOp level since all languages get converted into PcodeOps. In a perfect world, enhancements to SharedReturnAnalysisCmd should work on x86, arm, and even 6502 code. It makes things a bit more challenging, but generality here is a thing of beauty.
I'm almost there, peer review notwithstanding! However, I'm stuck! When the code register (CS
) is pushed onto the stack, I would have expected to be able to get at its value, but using the Java excerpt below seems to give an incorrect value!
InstructionPrototype proto = instr.getPrototype();
switch (instr.getOperandType(0)) {
case OperandType.SCALAR:
opInfo.setOffset(instr.getScalar(0).getUnsignedValue());
opInfo.setSize(removeLeadingZeros(proto.getOperandValueMask(0).toString()).length() / 2);
break;
case OperandType.REGISTER:
Register reg = proto.getRegister(0, instr.getInstructionContext());
opInfo.setOffset(reg.getOffset());
opInfo.setSize(reg.getNumBytes());
...
It's returning an invalid segment number!
Is this because I'm missing some additional step required for segmented addresses?
You can add a hack there, but that is definitely not general code in a general analyzer / code that works across all processors. As @marcushall42 says it is general code and meant to work on every processor.
If you only have the issue in one binary, it might be better dealt with in a script. I usually spec things out as a script before I make the decision it is worth automating. Although trying it in code to see what is required isn't bad, it just could never stay in that code.
It is possible an override mechanism of the base shared return analyzer for a more specific 16-bit x86 or other processor could be added as is done for constant analysis, but that hasn't been necessary as the shared return analyzer is general across all processors. So the SharedReturnCmd would make the decision, and then the extra analyzer or overridden analyzer would fix the issue.
There will most likely be a 16-bit x86 analyzer at some point in the future to better deal with the CS segment in real-mode, and it could be added in there. It would have access to the current stack values as it is done with pcode. There would be a chicken and egg issue of having created a function at the target when currently the decision to turn the jump into a call is in the shared return analyzer. The constant propagation could be used to figure out the push value and segment if necessary.
reg.Offset() is the address within register space that identifies the register. It comes from the sleigh definition of the language. The contents of a register are not known in the instruction itself. Some analyzers may determine the value based on register load instructions, and sometimes it is possible to trace back through the instructions and determine a set of values that may be loaded into a register, but in general it is impossible to predict the value of a register.
The PUSH does not have a special case for the CS segment that would try to have a real value. And it would likely still have the operand as a REGISTER, but just push the correct value. The CS segment value would also be incorrect for RealMode and is only kept correct for protected mode.
If you are assuming the JMP is coming back to the same function just at another location it should have the same segment, do you need the value pushed for a far jMP?
For now, I've got the simple case (a single PUSH of the address offset) to be recognised with no obvious side-effects. I'll leave it there for now until I come across a real situation involving the more complex relationships. Meanwhile, thanks @emteere and @marcushall42, and look forward to the day when an official 16-bit x86 analyser is released. (I live in hope!!!)
Looks like an additional FlowOverride
& RefType
are required to support CALLF
as my attempt is having the side effect that a push of a far
return address isn't processed correctly because the offset
portion of the address is not being accepted as part of the address, but as another parameter! (This isn't the case with a natural CALLF
.)
@emteere @marcushall42 any help to do this would be much appreciated, ta.
Do you mean by the decompiler?
My suspicion is if you don't tell the decompiler that the Function at the JMP target is a FAR call calling convention, the return address is 4 bytes, it will assume that the return address is only 2 bytes.
Looks like an additional
FlowOverride
&RefType
are required
I meant the above to inform the decompiler.
When I changed the raw bytes to represent a far JMP
the assembler recognises it and shows JMPF
. The subsequent override then doesn't appear to inform the decompiler that the overridden JMPF
is actually a CALLF
. Thereforre, are you saying that when I fix the override I need to do something else at that point, if so, what?
Describe the bug When a function call is made using a
JMP
instruction together with a previousPUSH
of the return address, Ghidra realises this to be equivalent to a function call and is interpreted as such. However, the return address is ignored and the following statement is to exit the calling function with areturn
instead of moving to the next statement address specified by thePUSH
of the return address as in the below x86 16-bit assembly extrac:t:resulting in:
instead of something like:
To Reproduce Steps to reproduce the behavior:
Decompile:
PanelsDebug Function Decompilation
menu)Expected behavior Recognise the return address and decompile correctly.
Screenshots If applicable, add screenshots to help explain your problem.
Attachments dragon_FUN_1158_1b78.zip
Environment (please complete the following information):
Additional context N/A