Vector35 / binaryninja-api

Public API, examples, documentation and issues for Binary Ninja
https://binary.ninja/
MIT License
897 stars 200 forks source link

rep movsd instruction does not round-trip in disassembler/assembler #4037

Open ejm9 opened 4 years ago

ejm9 commented 4 years ago

x86 assembler does not seem to be working for 'rep mov' instructions...code below demonstrates the issue

from binaryninja import Architecture
arch = Architecture['x86']

instr_bytes = b'\xf3\xa5'    # rep movsd dword [edi], [esi]
txt, size = arch.get_instruction_text(instr_bytes, 0)
instr = ''.join(str(a) for a in txt).strip()

# now 'instr' holds:   "rep movsd dword [edi], [esi]"

assem = arch.assemble(instr)

arch.assemble fails with: ValueError: Could not assemble: b': error: warnings being treated as errors\ninput:1: error: invalid combination of opcode and operands'

Note: it also fails if I do:

disassem =  bv.get_disassembly_at(addr)
bv.arch.assemble(disassem)

It also seems to fail on movsb byte [edi], [esi] type instructions. Thanks for checking this out...

ejm9 commented 4 years ago

Fix typo: Note: it also fails if I do:

disassem = bv.get_disassembly(addr)
bv.arch.assemble(disassem)
jeffli678 commented 4 years ago

I will look into this.

ejm9 commented 4 years ago

OK, I see now that the proper syntax expected by bv.arch.assemble() is "rep movsd", even though bv.get_disassembly(addr) produces the output: "rep movsd dword [edi], [esi]".

I guess its not really a bug if the expected syntax of the disassembly instruction is different between bv.arch.assemble() and bv.get_disassembly()....but it would be nice if the api docs describing bv.arch.assemble() said that it expects Keystone disassembly syntax (I believe keystone is what Binja is using for assembling) which is different than the disassembly format displayed in the binja binary view....

psifertex commented 4 years ago

It's still a bug, just a small one. We prefer our disassembly to be round-tripable unless we are explicitly not happy with the assembler's syntax. (For x86/x64, we use Yasm if you have future questions about syntax)

jeffli678 commented 4 years ago

Not entirely sure about this. I like the current disassembly output rep movsd dword [edi], [esi]. So we might wish to tweak the assembly input before sending it to Yasm?

enedil commented 4 years ago

Another example: f30f1efa nop edx, edi

psifertex commented 4 years ago

That encoding (endbr64) is properly handled if you're on the latest dev already @enedil.

psifertex commented 4 years ago

Oh, I should clarify -- the disassembler has support for endbr64 but yasm doesn't yet.