capstone-engine / capstone

Capstone disassembly/disassembler framework for ARM, ARM64 (ARMv8), Alpha, BPF, Ethereum VM, HPPA, LoongArch, M68K, M680X, Mips, MOS65XX, PPC, RISC-V(rv32G/rv64G), SH, Sparc, SystemZ, TMS320C64X, TriCore, Webassembly, XCore and X86.
http://www.capstone-engine.org
7.56k stars 1.55k forks source link

MIPS Groups incorrect for JALR, BLTZAL #370

Closed zachriggle closed 2 years ago

zachriggle commented 9 years ago

Surely jalr deserves CS_GRP_JUMP (or even CS_GRP_CALL)?

>>> i.bytes, i.mnemonic, i.op_str, i.groups
(bytearray(b'\t\xf8 \x03'), 'jalr', '$t9', [137, 144])
zachriggle commented 9 years ago

It looks like this may be a general thing with MIPS, as most (all?) branches are missing these groups.

>>> i.bytes, i.mnemonic, i.op_str, i.groups
(bytearray(b'\x01\x00\x10\x05'), 'bltzal', '$t0, 0x767cb8c8', [137, 146, 147])
hlide commented 9 years ago

yeah they are CS_GRP_CALL. In most cases, jalr rd, rsuses $ra for rd and so rd is also omitted to give jalr rs. As for bltzaland consoeurs, they can be seen as conditional calls.

aquynh commented 9 years ago

Zach, can you also post the hardware mode of the engine, so i can test this?

zachriggle commented 9 years ago

Should have just been CS_ARCH_MIPS, CS_MODE_32, with cs.detail=True, on the next branch.

aquynh commented 9 years ago

can you list all the instructions with this problem here, so i can fix them all in one go?

thanks.

hlide commented 9 years ago

technically, all instructions ending with AL :-)

zachriggle commented 9 years ago

Yeah, everything *al for "and link" should be a CS_GRP_CALL type.

Has any infrastructure been put in place for MIPS condition codes?

zachriggle commented 9 years ago

Also, the specific sequence jr $ra should probably have CS_GRP_RET.

>>> from capstone import *
>>> cs=Cs(CS_ARCH_MIPS, CS_MODE_32)
>>> cs.detail=True
>>> for i in cs.disasm('\x08\x00\xe0\x03',0):
..:     print(i.bytes, i.mnemonic, i.op_str, i.groups)
..:     
(bytearray(b'\x08\x00\xe0\x03'), u'jr', u'$ra', [137, 1])
zachriggle commented 9 years ago

To clarify, I think that these should also all have CS_GRP_JMP. While it's nice when CS_GRP_XXX are mutually exclusive, I don't think they should be in this case.

hlide commented 9 years ago

CS_GRP_JUMP is not enough in my opinion we should add MIPS specifics like:

J/JR/JAL/JALR must have CS_GRP_BRANCH_DELAY_SLOT.

EDIT:

that way, CS_GROUP_JUMP, CS_GROUP_CALL and CS_GROUP_RET can be mutually exclusive as higher generic semantics. If we need more specific details about a jump or call instruction:

hlide commented 9 years ago

Note that those specifics should have equivalents for Sparc too since it has similar slots after a branch instruction.

hlide commented 9 years ago

Argh, those are enumerated values, not from a bitset :(

zachriggle commented 9 years ago

I disagree that there should be a top level variant for that. Arch-level maybe. Delay slot is not an instruction group in either case. On Mon, Jul 13, 2015 at 8:26 AM hlide notifications@github.com wrote:

Argh, those are enumerated values, not from a bitset :(

— Reply to this email directly or view it on GitHub https://github.com/aquynh/capstone/issues/370#issuecomment-120909105.

hlide commented 9 years ago

Whatever, those flags are missing. Trying to detect by their instruction id is a nightmare (I'm trying to add delayslot handling in decompiler snowman for MIPS). Not only those id are not unique per opcode (for instance, MIPS_INSN_ADD encodes add,add.s and add.d) but I must also to handle branch instruction aliases ineptie. I wish to be able to refactorize the code to handle the delay slot by simply having those flags. Sorry to say, capstone as a decomposer is a bit lacking.

aquynh commented 9 years ago

@hlide, anything prevents you from refactoring the code to handle the missing features?

i will gather all the issues and look at them when i have more time. some issues can be fixed by us, and some can be reported to upstream.

LLVM community (mostly Imgtec guys) are working hard on this Mips disassembler, so we can be optimistic (and i am, always).

thanks.

hlide commented 9 years ago

I can do the delay slot handling using instruction id but that's not very orthogonal (MIPS is, capstone is not) and not elegant (switch with a lot of cases).