Auto analysis/detection of jumps into the middle of another instruction

M-a-r-k commented 1 year ago

I noticed this item in the Ghidra 10.4 change history: "Listing. Added ability to reduce an instructions length to facilitate overlapping instructions. This can now be accomplished by specifying an instruction length override on the first instruction and disassembling the bytes which follow it. The need for this has been observed with x86 where there may be a flow around a LOCK prefix byte. (GP-3256)"

Many years ago I disassembled some Motorola 68000 code which did something kind of related. Some versions of the SAS/C compiler for the Commodore Amiga can generate code sequences like this:

    [test some condition]
    bne.b   label+2
    moveq   #123,D0
label   cmpi.w  #$7000,D0       ;Note: $7000 is opcode for moveq #0,D0
    [do stuff depending on value in D0]
    rts

The effect of that to put either 123 or 0 in D0. In that example, if 123 is put in D0, the cmpi.w #$7000,D0 instruction is effectively a no-op (following code doesn't depend on condition codes), so code execution falls through.

However if the branch to label+2 is taken, that does moveq #0,D0 before continuing. Effectively, it's a minor optimization of this:

    [test some condition]
    bne.b   lab2
    moveq   #123,D0
    bra.b   cont
lab2    moveq   #0,D0
cont    [do stuff depending on value in D0]
    rts

(cmpi.w executes faster than bra.b I guess.)

In Ghidra, one example disassembly initially shows:

0021f34c 4e ae ff ac     jsr        (-0x54,A6)
0021f350 4f ef 00 1c     lea        (0x1c,SP),SP
0021f354 4a 80           tst.l      D0
0021f356 67 0a           beq.b      LAB_0021f360+2
0021f358 4a ac 19 dc     tst.l      (0x19dc,A4)=>DAT_00224920
0021f35c 66 04           bne.b      LAB_0021f360+2
0021f35e 70 01           moveq      #0x1,D0
                     LAB_0021f360+2                                  XREF[0,2]:   0021f356(j), 0021f35c(j)  
0021f360 0c 40 70 00     cmpi.w     #0x7000,D0w
0021f364 2e 00           move.l     D0,D7
0021f366 67 18           beq.b      LAB_0021f380
0021f368 48 6c 1a 1a     pea        (0x1a1a,A4)=>DAT_0022495e
...

After right-clicking cmpi.w #0x7000,D0w, choosing "Modify Instruction Length..." and setting length to 2 bytes, the listing looks like this:

...
0021f356 67 0a           beq.b      LAB_0021f362
0021f358 4a ac 19 dc     tst.l      (0x19dc,A4)=>DAT_00224920
0021f35c 66 04           bne.b      LAB_0021f362
0021f35e 70 01           moveq      #0x1,D0
0021f360 0c 40 70 00     cmpi.w     #0x7000,D0w
                     -- Length Override: 2 (actual length is 4)
                     -- Fallthrough: 0021f364
                     LAB_0021f362                                    XREF[2]:     0021f356(j), 0021f35c(j)  
0021f362 70              ??         70h    p
0021f363 00              ??         00h
                     LAB_0021f364                                    XREF[1]:     0021f360  
0021f364 2e 00           move.l     D0,D7
...

The bytes at LAB_0021f362 are not disassembled by default, even though LAB_00212362 is a jump destination.

This issue is to suggest a couple of improvements:

have auto analysis detect and automatically mark instructions as "split"
if the user manually does it, automatically disassemble the bytes (70 00 here) rather than leaving as data.

If anyone is interested in working on that I can upload some Amiga executables which contain those code sequences.

M-a-r-k commented 1 year ago

The first program I found this instruction sequence in was PackDev. Its source code is available for reference.

I think there's only one instance in that particular program, which is where the disassembly fragments above are from. I added ghidra_amiga_ldr to Ghidra to enable it to understand the Amiga hunk executable format. But you could probably just load an executable as a binary image too.

By the way, after Ghidra analyzed PackDev, I did notice an issue with it thinking FUN_00220ee2 does not return. Meaning that all calls to it are "broken" like this after auto-analysis:

00220534 41 ec 01 9a     lea        (0x19a,A4)=>s_Not_enough_memory_002230de,A0      = "Not enough memory\n"
00220538 61 00 09 a8     bsr.w      FUN_00220ee2                                     undefined FUN_00220ee2()
                     -- Flow Override: CALL_RETURN (CALL_TERMINATOR)
0022053c 70              ??         70h    p
0022053d 14              ??         14h
0022053e 60              ??         60h    `
0022053f 00              ??         00h
00220540 09              ??         09h
00220541 98              ??         98h
                     LAB_00220542                                    XREF[1]:     00220532(j)  
00220542 20 2c 1c 94     move.l     (0x1c94,A4)=>DAT_00224bd8,D0
00220546 29 40 1b f8     move.l     D0,(0x1bf8,A4)=>DAT_00224b3c

I can open a separate issue for that if needed.

ghidra1 commented 12 months ago

have auto analysis detect and automatically mark instructions as "split"

Do to the frequent occurance of bad offcut flows we are currently not considering any generalized attempt at performing such fixups automatically.

if the user manually does it, automatically disassemble the bytes (70 00 here) rather than leaving as data.

At this point in time the action is intended for more surgical manipulation and we do not want to tie auto-disassembly to the action. We are considering the intoduction of a script (which could be attached to a key-binding) which will attempt all modifications for an instruction which contains an offcut flow to produce overlapping instructions.

NationalSecurityAgency / ghidra

Auto analysis/detection of jumps into the middle of another instruction #5928