Open Arcnor opened 5 years ago
Maybe the same underlying problem as #958.
I actually "fixed" this by removing some code in the SLEIGH for relative calls that was forcing the result to be the absolute value, but I'm not sure this fix is correct. After all, somebody added that on purpose. Maybe I'll open a PR to get some advice.
Looks like my fix didn't completely work, which is not surprising. JMP with relative addresses are also affected; I'm getting a lot of nonsense calls on another target.
This is still happening on 9.1.2, and I got bitten by it again today. Is there any workaround for this? And sorry for pinging you, @GregoryMorse, but I was hoping you'd know whether one of your many unmerged PRs might fix this?
This seems to be a problem between REAL mode and PROTECTED mode in 16-bit x86 code. The way paging is done is different between the two, so both modes must be supported. The current behavior seems to only support real mode and not protected mode. Protected mode 16-bit x86 apps have a lot of other subtle issues too, especially around the way segmentation is handled in the decompiler core. I don't think any of my PRs would have fixed this issue.
I encountered the problem recently and opened #4074 before finding this issue. Is there any workaround? If not, do you have any entry point in ghidra's code for jump address calculation?
@kevinferrare as one of my comments above mentions, you can edit the SLEIGH code for the x86 processor and remove the forceful unsigned conversion for calls, but as I wrote after that, the same problem happens with jumps (and might be fixed in the same way).
Unfortunately, I still don't know why the value is being forcefully unsigned, but maybe it is due to what Gregory describes (differences between real and protected modes), so there might be issues when doing this. YMMV.
@Arcnor Thank you for your reply. I am not sure where there is an unsigned conversion. For rel16 in the current version:
rel16: reloc is simm16 [ reloc=((inst_next >> 16) << 16) | ((inst_next + simm16) & 0xFFFF); ] { export *[ram]:$(SIZE) reloc; }
It is taking simm16, which is signed, and the offset seems to be as expected.
However, the segment computation seems wrong: instead of inst_next >> 16 it would be better to reference the segment in the ram map, though I am not sure how to do that ...
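To make the concern about the segment computation concrete, here is a rough Python transcription of the quoted disassembly action next to a segment-aware alternative. It is only a sketch: it assumes inst_next holds the linear address, the function names are made up for illustration, and the sample values are borrowed from the original bug report rather than taken from Ghidra itself.

```python
def to_signed16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's-complement integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def rel16_as_in_sleigh(inst_next: int, simm16: int) -> int:
    """Transcription of the quoted action: keep the upper bits of inst_next
    and wrap only the low 16 bits of the sum."""
    return ((inst_next >> 16) << 16) | ((inst_next + simm16) & 0xFFFF)


def rel16_segment_aware(segment: int, next_offset: int, simm16: int) -> int:
    """Hypothetical alternative: wrap the offset inside the segment first,
    then add the real-mode segment base (segment * 16)."""
    return (segment << 4) + ((next_offset + simm16) & 0xFFFF)


# Illustrative values (assumption: 0x5fcc is used as the offset the
# displacement is applied to, as in the original report).
segment, next_offset = 0x2A21, 0x5FCC
simm16 = to_signed16(0xFC86)                # -0x37a
inst_next = (segment << 4) + next_offset    # 0x301dc

print(hex(rel16_as_in_sleigh(inst_next, simm16)))               # 0x3fe62
print(hex(rel16_segment_aware(segment, next_offset, simm16)))   # 0x2fe62
```

Under these assumptions the displacement itself is handled as signed, but the `(inst_next >> 16) << 16` part pins the result to the 64 KiB window of the linear address rather than to the segment base, which would explain the wrong segment.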
@kevinferrare yeah, the problem is that doing bit twiddling like that will never work (when simm16 is negative it will never go back to the previous segment). I can't remember what I did (and I probably lost my changes; I've settled for hacks since then), but it should be possible to change that operation to do the "right thing" (which as a hack might be simply reloc = inst_next + simm16, but that will probably break one of the two modes).
@Arcnor I tried that, but it makes things globally worse for my small use case. Example:
0000:1ee6 e8 9e ae CALL SUB_ffff_cd87
1ee9 + ae9e = cd87, which is sign-extended to FFFFCD87.
I need to see if it is possible to work around that.
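The arithmetic in this example can be reproduced with a few lines of Python. This is a sketch of the numbers only, not of what Ghidra does internally; the helper name is hypothetical.

```python
def near_call_sums(inst_addr: int, raw: bytes) -> dict:
    """Decode an E8 rel16 near CALL and show the sum under two widths."""
    # The two bytes after the opcode are a little-endian signed displacement.
    disp = int.from_bytes(raw[1:3], "little", signed=True)
    inst_next = inst_addr + len(raw)
    plain = inst_next + disp  # unbounded Python integer, may be negative
    return {
        "disp": disp,
        "inst_next": inst_next,
        "as_32bit": plain & 0xFFFFFFFF,   # what a 32-bit wide result looks like
        "wrapped_16bit": plain & 0xFFFF,  # offset wrapped inside a 64 KiB segment
    }


result = near_call_sums(0x1EE6, bytes([0xE8, 0x9E, 0xAE]))
for name, value in result.items():
    print(f"{name}: {value:#x}")
# disp: -0x5162
# inst_next: 0x1ee9
# as_32bit: 0xffffcd87
# wrapped_16bit: 0xcd87
```

The unwrapped sum is negative, so a 32-bit view of it becomes FFFFCD87, while wrapping to 16 bits gives the offset cd87 within the same segment.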
So I tried various things, with varying degrees of success:
- Modifying the logic of ia.sinc to hardcode the segment values of the program I am analyzing and using that to calculate addresses => I couldn't make it work because "ifs" are only possible in the p-code of the semantic section (the section with {}) and not in the Disassembly Actions Section (the section with []). Maybe there are other ways, but I couldn't figure out how.
- Modifying the references in the UI post-disassembly => It is possible to do so via the reference edit menu: right-click on a jump / call instruction, choose References -> add / edit, and there you can modify the flow Ghidra detected. I wrote a plugin to do so automatically from the data exported from my emulator. This makes the disassembly look correct, but the target code of the calls / jumps is somehow not taken into account by Ghidra in the function flow, meaning that it will not show in the decompile view and will not be part of the function when accessing the Ghidra scripting API.
- Making sure the segment I want to analyze is aligned to a multiple of 0x1000 => This works the best, but if there are multiple segments it means multiple Ghidra files, and everything that jumps across segments is still going to be broken.
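Why the alignment workaround helps can be checked with a small Python experiment. This is a sketch under two assumptions: that "aligned to a multiple of 0x1000" refers to the segment value (so the linear base is a multiple of 0x10000), and that the rel16 formula quoted earlier in the thread is what computes the target. The function names are hypothetical.

```python
def sleigh_style_target(inst_next: int, simm16: int) -> int:
    """The rel16 formula quoted earlier in the thread."""
    return ((inst_next >> 16) << 16) | ((inst_next + simm16) & 0xFFFF)


def segment_aware_target(segment: int, next_offset: int, simm16: int) -> int:
    """Real-mode target: offset wraps inside the segment, base is segment * 16."""
    return (segment << 4) + ((next_offset + simm16) & 0xFFFF)


def formulas_agree(segment: int, next_offset: int, simm16: int) -> bool:
    inst_next = (segment << 4) + next_offset
    return sleigh_style_target(inst_next, simm16) == segment_aware_target(
        segment, next_offset, simm16
    )


def always_agree(segment: int, step: int = 0x101) -> bool:
    """Sample offsets and signed displacements and compare both formulas."""
    offsets = range(0, 0x10000, step)
    displacements = range(-0x8000, 0x8000, step)
    return all(formulas_agree(segment, o, d) for o in offsets for d in displacements)


print(always_agree(0x1000))  # True: linear base 0x10000 is 64 KiB aligned
print(always_agree(0x2A21))  # False: linear base 0x2a210 is not
```

When the base is 64 KiB aligned, the upper bits of inst_next are exactly the segment base, so the two computations coincide for every sampled offset and displacement.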
Describe the bug
As you can see in the screenshot, Ghidra is decoding near CALLs in 16-bit mode incorrectly, because it's taking the relative address displacement as an unsigned value instead of a signed one (or because it's always treating the value as 32-bit, which would also explain the problem).
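The difference between the two readings of the displacement can be illustrated with a short Python sketch. It uses the addresses from the screenshot explanation in this report and follows the report's own arithmetic (the displacement is applied directly to the quoted address); the helper names are hypothetical.

```python
def as_signed16(raw: int) -> int:
    """Two's-complement reading of a 16-bit value."""
    raw &= 0xFFFF
    return raw - 0x10000 if raw >= 0x8000 else raw


def linear(segment: int, offset: int) -> int:
    """Real-mode linear address: segment * 16 + offset."""
    return (segment << 4) + offset


call_site = linear(0x2A21, 0x5FCC)   # 0x301dc
raw_disp = 0xFC86

unsigned_target = call_site + raw_disp             # displacement read as unsigned
signed_target = call_site + as_signed16(raw_disp)  # displacement read as signed

print(hex(as_signed16(raw_disp)))  # -0x37a
print(hex(unsigned_target))        # 0x3fe62
print(hex(signed_target))          # 0x2fe62
```

The unsigned reading lands 0x10000 bytes above the signed one, which matches the kind of off-by-one-segment target described in this report.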
Explaining the screenshot a bit:
- The CALL is at 2a21:5fcc, which is 0x301dc as linear address.
- The displacement is 0xfc86 bytes from there, which is -0x37A signed. That means it should end up at 2a21:5c52 (0x2fe62 linearly).
- Ghidra shows the target as 3000:fe62 (which btw is a non-existing segment, as I merged it because it was wrong, but that's probably a separate bug), which is 0x3fe62 as linear address, so it looks like Ghidra just did 0x301dc + 0xfc86 instead of 0x301dc - 0x37a.
Screenshots
Environment (please complete the following information):