NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0
49.33k stars 5.67k forks source link

68000 module parses sign extended addresses incorrectly for move instructions #766

Open zznop opened 5 years ago

zznop commented 5 years ago

Describe the bug

Ghidra incorrectly disassembles certain move* 68000 instructions, specifically instructions where the addresses are sign extended.

For example, Ghidra disassembles the instruction below (from IDA output):

ROM:00001A56 11D8 FE80                                               move.b  (a0)+,(word_FFFE80).w

as

00001a56 11 d8 fe 80     move.b     (A0)+,(DAT_fffffe80).w

"The 68000 has a 24‑bit address space, allowing access to up to 16 MB of memory." This differs from later 6800* variants that have 32-bit address buses

To Reproduce Steps to reproduce the behavior: 1) In a hex editor, create a binary with 11 d8 fe 80 and save it 2) Throw it into a ghidra project and select M6800 default for the processor 3) Go to byte zero and hit 'D'

Expected behavior

Ghidra to display the instruction as:

00001a56 11 d8 fe 80     move.b     (A0)+,(DAT_fffe80).w

(24-bit addressing)

philpem commented 5 years ago

Looks like the SLEIGH code for 68k should be masking off the most significant byte. I'd suggest checking the Motorola documentation to see if that byte has any special meaning.

It would also be useful to add a warning comment or annotation when this is encountered, but I don't know if SLEIGH can handle this.

zznop commented 5 years ago

In the case described above the opcode is 11 d8 fe 80 the address fe 80 should be extended to 24-bits (0xfffe80) because the 68000 contains a 24-bit address bus. But because the 68000 interface is written for 32-bit wide addresses (later 68000 series processors) it is extended to 0xfffffe80. I've looked at the 68000.sinc file and attempted to make modifications to the moves.w and moves.b lines to remove the top byte, but there's more to it than that as other IDL files contain information on address sizes. Also, by applying those masks explicitly it will break the sla interfaces for the later processors. Idk, if the 68000 needs its own *.sinc file or what. I'd like to hear from a Ghidra core dev.

ggnkua commented 5 years ago

Okay, I think that @zznop is not correct.

Now, I don't have a plain 68000 programmer's manual, only 020+ and the usual all-in-one PDF that floats around the internet, but at least there the following is stated for short addressing mode:

2.2.16 Absolute Short Addressing Mode

In this addressing mode, the operand is in memory, and the address of the operand is in the extension word. The 16-bit address is sign-extended to 32 bits before it is used. .

So internally the 68000 core will sign extend the address to 32 bits. The fact that the 68000 only has 24 address pins exposed is just a shortcoming of the chip.

To give a more concrete example: The Atari ST and STE both have a 68000 CPU installed. Hardware registers are mapped from $f00000 to $ffffff (i.e. around the 15 and 16mb address space). So if we want to access the background palette register (for example) we can write:

move.w #$fff,$ff8240 and move.w #$fff,$ffff8240.w

They're both equivalent. But also we can also write:

lea $ffff8240.w,a0
move.w #$fff,(a0)

Especially the last one proves that one can pretty much stick whatever they want on the higher byte of the address and the CPU will still access the correct 24-bit address. How could it otherwise? Only the 24 lower bits are exposed to the bus.

So, like I said, I believe this is not a bug, and should be closed.

zznop commented 5 years ago

@ggnkua What you are saying is that when analyzing 68000 code the user should know to just ignore the top 8 bits of 0xfffffe80 for a move.b (A0)+,(DAT_fffffe80).w (for MC68000)? Ok, that's all great until you're in Ghidra and your xref's don't work. This effects auto analysis both from an aesthetic point of view and functionally. If I create a RAM segment from 0xff0000 to 0xffffff at 0xfffe80 there should be a label and cross reference, but there's not because Ghidra thinks move.b (A0)+,(DAT_fffffe80).w is referencing 0xffffe80. This is very much an issue. IDA handles it correctly and now Binary Ninja's processor module handles it correctly after I worked with their devs to fix the same issue.

Also, to reaffirm the issue, if you look at the 68000.ldefs file. 68000:BE:32:default uses the 68040 sla file. They don't have true support for native 68000 at all. Just the instruction set as it pertains to the later 68k 32-bit processors.

ggnkua commented 4 years ago

Right, I see your point. As a workaround how about creating another ram segment from $ffff0000 to $ffffffff? If the code accesses the hardware registers consistantly (i.e. either $ffxxxx or $xxxx.w) then it won't matter which bucket of RAM it ends up. Worst case you have to check in both memory areas for xrefs.

zznop commented 4 years ago

Thanks, but I'll just keep using Binary Ninja. I just wanted to bring this to the attention of the devs. You are 100% correct in that move.b (%a0)+, (0xfffffe80), and move.b (%a0)+, (0xfffe80) assemble to the same opcode. But because Ghidra is more than just a disassembler (unlike capstone or objdump), it has to take in account processor behavior for things like xref's, decompilation, type propagation, and more. I recommend one of two things:

  1. Sleigh interface for MC68000 (this would be ideal, and probably wouldn't be too much work considering the instruction set is already supported)
  2. Remove 68000:BE:32:default from the processor list and add a MC68040 (in other words, don't advertise native 68000 support)

I found this related issue: https://github.com/NationalSecurityAgency/ghidra/issues/230

dyneon commented 1 month ago

I am also encountering this “issue” using a CPU32 processor, which also has a 24-but physical bus. While I agree the current implementation is technically correct, it does make the disassembly more messy.

Is there an example of how to mask out the byte in SLEIGH? I’ve attempted but I’m can’t find a solution.

And how does the byte become FF? Does SLEIGH use an implicit SEXT instruction to make the address 32 bits wide?