esaulenka / ghidra_v850

Ghidra support for Renesas V850 MCUs
MIT License

Undefined instruction a3 07 4f 40 00 00 #11

Closed wrongbaud closed 4 years ago

wrongbaud commented 4 years ago

When disassembling, the module fails on the following byte sequence:

a3 07 4F 40 00 00

This instruction does not seem to be present in any architecture specification that I've seen online - could this be a proprietary instruction or are we simply missing an architecture document somewhere?

It is close to the LD.BU and LD.HU instructions but it does not line up perfectly with either.

It could be an LD.BU format 1, but it would seem to be storing the value in r0 which doesn't seem like it should be possible according to the architecture document.

esaulenka commented 4 years ago

Can you share a sample binary? Also, do you know exactly which MCU is used in your device?

Re "It is close to the LD.BU": it also looks like a broken PREPARE instruction.

wrongbaud commented 4 years ago

Attached is a sample function that has the unidentified sequence - it is from an R7F701A352 MCU

We originally thought that it might be a conditional LD.BU, but we can't find any reference to that in the documentation available online!

function.zip

esaulenka commented 4 years ago

Oh, it is a modern RH850 core. According to the wiki, there are a number of differences from the documented 850E2 core. I don't have any documentation for it either. If you can find the new instructions (for example, gcc knows this core), I will try to add support for them.

UPD. I checked your file. It looks like there is junk in the first 0x0C bytes. The correct function starts at the prepare instruction. Also, the first instruction shifts the stack pointer by 12 bytes, and there is no corresponding instruction that restores it.
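The 12-byte stack adjustment can be checked by hand. Below is a minimal Python sketch, assuming the first two bytes of the sample are 54 1a (the values that appear in the objdump listing later in this thread) and assuming the usual 16-bit "add imm5, reg2" layout of rrrrr 010010 iiiii. The function name is made up for illustration.

```python
def decode_add_imm5(b0: int, b1: int):
    """Decode a 16-bit 'add imm5, reg2' (assumed layout: rrrrr 010010 iiiii).

    Returns (imm5, reg2), or None if the opcode field does not match.
    """
    hw = b0 | (b1 << 8)          # halfwords are stored little-endian
    opcode = (hw >> 5) & 0x3F    # bits 10..5
    if opcode != 0x12:           # 0b010010, assumed opcode of 'add imm5, reg2'
        return None
    reg2 = (hw >> 11) & 0x1F     # bits 15..11
    imm5 = hw & 0x1F             # bits 4..0
    if imm5 & 0x10:              # sign-extend the 5-bit immediate
        imm5 -= 0x20
    return imm5, reg2


print(decode_add_imm5(0x54, 0x1A))  # (-12, 3); r3 is sp
```

This agrees with the "add -12, sp" that objdump reports for the same two bytes.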

wrongbaud commented 4 years ago

Interesting - that location is jumped to by a JARL instruction, so I do think it's supposed to be valid code, unless we have issues with our binary file. Do you know what might have caused this? It doesn't seem like something a compiler optimization would produce, but I'm not very familiar with this architecture.

jbmokuz commented 4 years ago

esaulenka, thank you for the response. It does seem that things line up for that one particular page if we push everything back 0xC bytes!

We do, however, still get a few strange instructions, even in seemingly good-looking functions (see function2.bin.zip). At offset 0x2C it looks like it should be an 'ld.hu 0x25A[r15], r0'. It decodes that way if we change the ld.hu definition to accept r0 as reg2, with something like the following:

# LD.HU disp16[reg1], reg2
:ld.hu disp16[R0004], xR1115 is op0510=0x3F & R0004 & xR1115; op1616=1 & s1731
[ disp16 = s1731 << 1; ] {
        local addr:4 = disp16 + R0004;
        xR1115 = zext(*:2 addr);
}

Furthermore, IDA says that this instruction is an ld.hu! However, the spec does not allow r0 to be used as reg2! We have a few other examples of strange instructions like this (also see offset 0x3A). It doesn't seem like we should be able to load anything into r0. Perhaps this is something else? Could it be an instruction for a modern RH850 core?
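To make the bit fields concrete, here is a small Python sketch of the same pattern match. It only mirrors the SLEIGH constraints quoted above (op0510 = 0x3F, bit 16 = 1, disp16 = s1731 << 1). The function names and the allow_r0 flag are made up for illustration, and the two halfwords in the example are constructed from the mnemonic, not taken from function2.bin.

```python
def sext(value: int, bits: int) -> int:
    """Sign-extend a 'bits'-wide field."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def match_ld_hu(hw0: int, hw1: int, allow_r0: bool = False):
    """Return (disp16, reg1, reg2) if the two halfwords fit the ld.hu pattern."""
    op0510 = (hw0 >> 5) & 0x3F   # bits 10..5 of the first halfword
    reg1 = hw0 & 0x1F            # R0004
    reg2 = (hw0 >> 11) & 0x1F    # R1115
    op1616 = hw1 & 1             # lowest bit of the second halfword
    if op0510 != 0x3F or op1616 != 1:
        return None
    if reg2 == 0 and not allow_r0:
        return None              # models the "reg2 must not be r0" rule
    disp16 = sext(hw1 >> 1, 15) << 1   # s1731 << 1
    return disp16, reg1, reg2


# Constructed encoding of 'ld.hu 0x25A[r15], r0' (an assumption, see above)
print(match_ld_hu(0x07EF, 0x025B))                 # None
print(match_ld_hu(0x07EF, 0x025B, allow_r0=True))  # (602, 15, 0)
```

With the r0 restriction in place the pattern is rejected, which is the "undefined instruction" case. Relaxing it gives the ld.hu reading, but that alone does not show the encoding really is an ld.hu on this core.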

Thank you so much!

Just to compare, here is the IDA output:

[screenshot: IDA preview]

esaulenka commented 4 years ago

Re "for example, gcc knows this core": oh... you can find a very, very interesting file here: v850-opc.c.

As far as I can figure out, your a3 07 ... is a { "st.dw", two (0x07a0, 0x000f), two (0xffe0, 0x001f), {R3_EVEN, D23_ALIGN1, R1}, 3, PROCESSOR_V850E3V5_UP },

The next step is to find out how they encode the instruction operands...
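The match against that table entry can be reproduced in a few lines of Python. This is only a sketch: it assumes that two(x, y) packs x into the low 16 bits and y into the high 16 bits, and that the first value in the entry is the opcode and the second is the mask. The helper name matches_st_dw is made up.

```python
def two(lo: int, hi: int) -> int:
    """Assumed equivalent of the two() macro: first halfword low, second high."""
    return lo | (hi << 16)


OPCODE = two(0x07A0, 0x000F)   # opcode value from the st.dw table entry
MASK = two(0xFFE0, 0x001F)     # mask from the same entry


def matches_st_dw(data: bytes) -> bool:
    """Check the first 32 bits (little-endian) against the st.dw opcode/mask."""
    word = int.from_bytes(data[:4], "little")
    return (word & MASK) == OPCODE


print(matches_st_dw(bytes.fromhex("a3074f400000")))  # True
print(matches_st_dw(bytes.fromhex("633f0100")))      # False
```

The second call uses the bytes 63 3f 01 00 that objdump later in this thread decodes as st.w, to show that the mask does not match everything.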

jbmokuz commented 4 years ago

Oh perfect! I'll look through v850-opc.c after the weekend and check whether we can figure some of these instructions out. Thank you so much!

wrongbaud commented 4 years ago

Just to update this - I was able to get an appropriate version of binutils built in Docker and managed to get the following result:

root@a691b6c1b684:/usr/local/src/binutils-2.24/binutils# ./objdump -m v850e3v5 -b binary -D /sources/function.bin

/sources/function.bin:     file format binary

Disassembly of section .data:

00000000 <.data>:
   0:   54 1a           add     -12, sp
   2:   63 3f 01 00     st.w    r7, 0[sp]
   6:   a3 07 4f 40     st.dw   r8, 4[sp]
   a:   00 00
   c:   be 07 21 08     prepare {r20, lp}, 31
  10:   23 1e 78 ff     movea   -136, sp, sp
  14:   63 37 01 01     st.w    r6, 256[sp]
  18:   03 a0           mov     sp, r20
  1a:   20 46 00 01     movea   256, r0, r8
  1e:   00 3a           mov     0, r7
  20:   14 30           mov     r20, r6
  22:   88 ff e0 03     jarl    0x80402, lp
  26:   23 3f 01 01     ld.w    256[sp], r7
  2a:   23 46 0c 01     movea   268, sp, r8
  2e:   14 30           mov     r20, r6
  30:   88 ff bc 06     jarl    0x806ec, lp
  34:   0a 06 90 ff     addi    -112, r10, r0
  38:   b6 05           blt     0x3e
  3a:   43 07 70 00     st.b    r0, 112[sp]
  3e:   60 52           cmp     0, r10
  40:   d7 05           ble     0x4a
  42:   03 38           mov     sp, r7
  44:   00 32           mov     0, r6
  46:   bf ff 56 ff     jarl    0xffffff9c, lp
  4a:   23 1e 88 00     movea   136, sp, sp
  4e:   7e 06 20 08     dispose 31, {r20, lp}, r0

So it appears to be a st.dw r8, 4[sp] instruction
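For anyone adding this to a disassembler, the operands can also be pulled out by hand. The Python sketch below assumes the following field layout, chosen because it reproduces the objdump result above: reg1 in bits 4..0 of the first halfword, reg3 in bits 15..11 of the second halfword, the low 7 bits of disp23 in bits 10..4 of the second halfword, and the high 16 bits of disp23 in the third halfword. Treat that layout as an assumption to verify against v850-opc.c, not as a reference.

```python
def decode_st_dw(data: bytes):
    """Decode 'st.dw reg3, disp23[reg1]' from 6 bytes (assumed field layout).

    Returns (reg3, disp23, reg1). The opcode bits are not checked here.
    """
    hw0, hw1, hw2 = (int.from_bytes(data[i:i + 2], "little") for i in (0, 2, 4))
    reg1 = hw0 & 0x1F                 # base register
    reg3 = (hw1 >> 11) & 0x1F         # first register of the even/odd pair
    disp = (hw2 << 7) | ((hw1 >> 4) & 0x7F)
    if disp & (1 << 22):              # sign-extend the 23-bit displacement
        disp -= 1 << 23
    return reg3, disp, reg1


print(decode_st_dw(bytes.fromhex("a3074f400000")))  # (8, 4, 3)
```

That is r8, displacement 4, base r3 (sp), i.e. the same "st.dw r8, 4[sp]" that objdump prints.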

wrongbaud commented 4 years ago

Updating this again with more information that was found about the updated core, just for reference in case anyone takes this on in the future.

http://agentzh.org/misc/code/gdb/v850-tdep.c.html

esaulenka commented 4 years ago

Done! pull request https://github.com/esaulenka/ghidra_v850/pull/14

Thank you for making me do this. Feel free to report bugs!