NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0
51.58k stars 5.87k forks source link

Incorrect relative CALL decoding on x86-16 bit targets #981

Open Arcnor opened 5 years ago

Arcnor commented 5 years ago

Describe the bug As you can see on the screenshot, Ghidra is decoding near CALLs on 16 bit mode incorrectly, because it's taking the relative address displacement as an unsigned value instead of a signed one (or because it's always treating the value as 32 bit, which will also explain the problem).

Explaining the screenshot a bit:

Screenshots GhidraCALLBug

Environment (please complete the following information):

carlreinke commented 5 years ago

Maybe the same underlying problem as #958.

Arcnor commented 5 years ago

I actually "fixed" this by removing some code on the SLEIGH for relative calls that was forcing the result to be the absolute value, but I'm not sure this fix is correct. After all, somebody added that on purpose. Maybe I'll open a PR to get some advice.

Arcnor commented 5 years ago

Looks like my fix didn't completely work, which is not surprising. JMP with relative addresses are also affected, I'm getting a lot of nonsense calls on another target.

Arcnor commented 4 years ago

This is still happening on 9.1.2 and got bitten by it again today. Is there any workaround for this? And sorry for pinging you @GregoryMorse but I was hoping you know if one of your many unmerged PRs might fix this?

GregoryMorse commented 4 years ago

This seems to be a problem between REAL mode and PROTECTED mode in 16-bit x86 code. The way paging is done is different between the two so both modes must be supported. The current behavior seems to only support real mode and not protected mode. Protected mode 16-bit x86 apps have a lot of other subtle issues to especially around the way segmentation is handled in the decompiler core. I don't think any of my PR would have fixed this issue.

kevinferrare commented 2 years ago

I encountered the problem recently and opened #4074 before finding this issue. Is there any workaround? If not, do you have any entry point in ghidra's code for jump address calculation?

Arcnor commented 2 years ago

@kevinferrare as one of my comments above mentions, you can edit the SLEIGH code for the x86 processor and remove the forceful unsigned conversion for calls, but as I wrote after that, the same problem happens with jumps (and might be fixed in the same way).

Unfortunately, I still don't know why the value is being forcefully unsigned, but maybe is due to what Gregory describes (differences between real and protected modes) so there might be issues when doing this, YMMV.

kevinferrare commented 2 years ago

@Arcnor Thank you for your reply. I am not sure where there is an unsigned conversion. For rel16 in current version: rel16: reloc is simm16 [ reloc=((inst_next >> 16) << 16) | ((inst_next + simm16) & 0xFFFF); ] { export *[ram]:$(SIZE) reloc; }

It is taking simm16 which is signed, and offset seems to be as expected.

However, segment computation seem wrong, instead of inst_next >> 16 it would be better to reference the segment in the ram map, not sure how to do that though ...

Arcnor commented 2 years ago

@kevinferrare yeah, the problem is that doing bit twiddling like that will never work (when simm16 is negative it will never go back to the previous segment). I can't remember what I did (and I probably lost my changes, I've settled for hacks since then) but it should be possible to change that operation to do the "right thing" (which as a hack might be simply reloc = inst_next + simm16 but will probably break one of the two modes)

kevinferrare commented 2 years ago

@Arcnor I tried that, but it makes things globally worse for my small use case. Example: 0000:1ee6 e8 9e ae CALL SUB_ffff_cd87 1ee9+ae9e = cd87 which is sign extended to FFFFCD87 I need to see if it is possible to workaround that.

kevinferrare commented 2 years ago

So I tried various things with various degrees of success: