API for accessing operands at disassembly level

ltlly commented 1 year ago

Is this a general concept that needs to be documented or a specific API?

A general concept that needs to be documented.

What concept or API needs more documentation?

How can I quickly get assembly instructions and their properties, such as operands, operation, and length? How can I quickly modify the assembly code at a given address, as with the 'Edit current line' function in the UI?

Detailed description

I'm reversing a program with a lot of junk assembly, and I'd like to use Binary Ninja's Python API to match all junk code of the same form and replace it.
Such as eq

I expected this problem to be easy to solve, but I ran into several problems.

The first problem is that I didn't find an easy-to-use API for reading assembly instructions. The only relevant one is function.instructions. It returns

But it is not easy to use: I have to index into the returned elements and check whether each one is of the type I want (e.g. InstructionTextToken). I also had to use Keystone to assemble the returned tokens and then call len() to get the length of the instruction. This is very inelegant!

At the IL level, by contrast, there are many useful properties, such as .operands and .operation. Why are they not available at the assembly level, or am I just not finding them? In IDAPython, I can use idc.print_operand(), GetDisasm, print_insn_mnem and similar functions to get the operands and other properties directly.

The second problem is how to quickly modify assembly code. In the UI I can use 'Edit current line' to quickly modify an instruction, which is a great feature. Is there a Python API for it?

Is this documentation related to a specific API?

Python

Are there any known examples of people using this API/concept?

No, I can't find an example of this.

xusheng6 commented 1 year ago

For patching code, you can use the Architecture.assemble method: https://api.binary.ninja/binaryninja.architecture-module.html#binaryninja.architecture.Architecture.assemble. You can get the architecture of the binary view with bv.arch. Here is an example of how I used the Python API to patch some obfuscated function calls: https://binary.ninja/2020/07/14/solving-an-obfuscated-crackme-with-binaryninja-and-triton.html#solving-obfuscated-calls. Please be aware that this was written 3 years ago and is scheduled to be updated, so it may not represent the most effective or powerful ways to do things.
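A minimal sketch of that patching approach, assuming `bv` is a BinaryView (as in the Binary Ninja Python console) and that the replacement must fit within the original instruction's bytes. `patch_instruction` is a hypothetical helper name; it relies only on `Architecture.assemble`, `BinaryView.get_instruction_length`, `BinaryView.convert_to_nop` and `BinaryView.write`. Padding leftover bytes via `convert_to_nop` is my own choice here, not necessarily how the blog post does it.

```python
def patch_instruction(bv, addr, asm_text):
    """Overwrite the instruction at `addr` with `asm_text` (hypothetical helper).

    Assumes `bv` is a Binary Ninja BinaryView. Returns True if all new bytes
    were written; raises ValueError if there is no instruction at `addr` or
    the assembled code is longer than the original instruction.
    """
    arch = bv.arch
    # Length in bytes of the existing instruction (0 if none can be decoded).
    old_len = bv.get_instruction_length(addr)
    if old_len == 0:
        raise ValueError(f"no instruction could be decoded at {addr:#x}")
    # Architecture.assemble returns the encoded bytes for `asm_text` at `addr`.
    new_bytes = arch.assemble(asm_text, addr)
    if len(new_bytes) > old_len:
        raise ValueError(
            f"{asm_text!r} assembles to {len(new_bytes)} bytes, "
            f"but only {old_len} are available at {addr:#x}"
        )
    if len(new_bytes) < old_len:
        # NOP out the whole original instruction first so that no stray
        # trailing bytes remain after the shorter replacement.
        bv.convert_to_nop(addr)
    # Write the new encoding over the start of the original instruction.
    return bv.write(addr, new_bytes) == len(new_bytes)
```

For example, `patch_instruction(bv, here, "xor eax, eax")` in the console would replace the instruction at the current address. The syntax `assemble` accepts depends on the architecture's assembler, so it is worth checking the result in the UI afterwards.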

xusheng6 commented 1 year ago

One of the reasons we do not provide ways to access operands at the assembly level is that we believe BNIL is a better way to offer structured access to the code. At the various IL levels, you should be able to extract the information you want with ease.
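As a sketch of that structured access, the function below walks every LLIL expression in a function, including nested subexpressions, and collects those with a given operation such as `LLIL_XOR`. `find_llil_by_operation` is a hypothetical name; it assumes `func` is a Binary Ninja `Function` and uses only `func.llil`, `.operation`, `.operands` and `.address`. Nested expressions are visited because the top-level LLIL instruction for something like `xor eax, eax` is typically an `LLIL_SET_REG` whose source operand is the `LLIL_XOR`.

```python
def _subexpressions(expr):
    """Yield `expr` and every nested LLIL expression among its operands."""
    yield expr
    for operand in expr.operands:
        # Some operands are lists of expressions (e.g. call parameters).
        items = operand if isinstance(operand, list) else [operand]
        for item in items:
            # Only IL expressions have `.operation`; registers, flags and
            # plain integers do not, so they are skipped.
            if hasattr(item, "operation"):
                yield from _subexpressions(item)


def find_llil_by_operation(func, op_name):
    """Return (native_address, expression) pairs whose operation name is `op_name`."""
    matches = []
    # Iterating func.llil yields basic blocks; iterating a block yields
    # top-level LLIL instructions.
    for block in func.llil:
        for insn in block:
            for expr in _subexpressions(insn):
                # .operation is a LowLevelILOperation enum member, e.g. LLIL_XOR.
                if expr.operation.name == op_name:
                    # .address is the address of the originating native instruction.
                    matches.append((expr.address, expr))
    return matches
```

In the console, `for addr, expr in find_llil_by_operation(current_function, "LLIL_XOR"): print(hex(addr), expr)` would list each match with the native address it came from.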

xusheng6 commented 1 year ago

If you still prefer to work with the disassembly, there are some ways you can proceed:

The function.instructions list provides the address of each instruction, from which you should be able to calculate the instruction's length.
There is a get_instruction_info method on the Architecture class, which returns the length of the instruction along with other information about it.
You will also need to do some text processing on the instruction text returned by that code. You can look at this chapter of my blog to see how I did it: https://binary.ninja/2020/07/14/solving-an-obfuscated-crackme-with-binaryninja-and-triton.html#solving-obfuscated-calls
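A rough sketch of these suggestions, assuming `bv` is a BinaryView and `func` one of its functions: the tokens from `function.instructions` are split into a mnemonic and operand strings by token type, and the length comes from `Architecture.get_instruction_info`. `split_tokens` and `iter_disassembly` are hypothetical helpers, and treating every `OperandSeparatorToken` as an operand boundary is a simplification that may need adjusting for some architectures or annotated operands.

```python
def split_tokens(tokens):
    """Split InstructionTextTokens into (mnemonic, [operand strings])."""
    mnemonic = None
    operands = []
    current = []
    for tok in tokens:
        kind = tok.type.name  # name of the InstructionTextTokenType member
        if mnemonic is None and kind == "InstructionToken":
            mnemonic = tok.text.strip()
        elif kind == "OperandSeparatorToken":
            # A separator closes the operand collected so far.
            operands.append("".join(current).strip())
            current = []
        else:
            current.append(tok.text)
    tail = "".join(current).strip()
    if tail:
        operands.append(tail)
    return mnemonic, operands


def iter_disassembly(bv, func):
    """Yield (address, mnemonic, operands, length) for each instruction of `func`."""
    arch = func.arch
    for tokens, addr in func.instructions:
        # Read enough bytes for the longest possible instruction, then decode.
        data = bv.read(addr, arch.max_instr_length)
        info = arch.get_instruction_info(data, addr)
        length = info.length if info is not None else 0
        mnemonic, operands = split_tokens(tokens)
        yield addr, mnemonic, operands, length
```

For example, `for addr, mnem, ops, size in iter_disassembly(bv, current_function): print(hex(addr), mnem, ops, size)` in the console prints one line per instruction, with no external assembler needed to get the length.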

xusheng6 commented 1 year ago

I have changed the title of the issue to reflect its updated status.

Also, I suggest you hop onto our Slack so you can chat with our devs and other users: https://slack.binary.ninja/

psifertex commented 1 year ago

Just to reiterate, LLIL is a very close approximation of the native assembly, and each LLIL instruction maps back to the address of the native instruction it was lifted from. That is the recommended way to do this. The reason we don't expose properties on assembly directly is that they could differ for every architecture, and being architecture-agnostic is part of the goal of LLIL.
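To illustrate that LLIL-to-native mapping, this sketch groups a function's top-level LLIL instructions by the native address stored in `.address` and prints them next to `bv.get_disassembly`. One native instruction can lift to several LLIL instructions, which is why they are grouped. The helper names are hypothetical; `func.llil`, `.address`, `.operation` and `BinaryView.get_disassembly` are the only Binary Ninja APIs assumed.

```python
from collections import defaultdict


def llil_by_native_address(func):
    """Group the top-level LLIL instructions of `func` by native address."""
    groups = defaultdict(list)
    for block in func.llil:
        for insn in block:
            # .address points back at the native instruction this LLIL was lifted from.
            groups[insn.address].append(insn)
    return dict(groups)


def print_native_vs_llil(bv, func):
    """Print each native instruction next to the LLIL operations lifted from it."""
    for addr, insns in sorted(llil_by_native_address(func).items()):
        ops = ", ".join(insn.operation.name for insn in insns)
        print(f"{addr:#x}  {bv.get_disassembly(addr)!s:<32} -> {ops}")
```

Matching junk patterns on the grouped LLIL and then patching at the corresponding native addresses keeps the matching logic independent of the disassembly text format.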

Additionally, requiring architecture modules to expose more structure would make them more work to write and support, so not requiring it was an intentional design choice.

For now I'm going to close this as won't fix, but if there is a reason your goal isn't achievable with LLIL, let us know and I'm happy to reconsider or help come up with a different solution.

ltlly commented 1 year ago

Thank you very much, I will try the method you suggested next!

Vector35 / binaryninja-api #4439