avast / retdec

RetDec is a retargetable machine-code decompiler based on LLVM.
https://retdec.com/
MIT License
8.02k stars, 945 forks

MIPS Extending supported instructions #393

Open MagnificentS opened 6 years ago

MagnificentS commented 6 years ago

How do I extend the set of MIPS instructions that RetDec currently supports?

PeterMatula commented 6 years ago

Take a look at the capstone2llvmir library, the MIPS module in this case. To support more instructions, their semantics need to be modeled in LLVM IR, i.e. you write a routine that takes a Capstone instruction and generates an LLVM IR sequence with the same semantics. Also see this discussion. We do not want to have semantics for all possible instructions, only for those that can be represented in C reasonably well (simple and clear code). At the moment, all unhandled instructions are ignored, but we are already working on an implementation that will generate pseudo asm calls for them.

What kind of instructions are you interested in? Would pseudo asm calls be enough for you, or do you want complete semantics?

MagnificentS commented 6 years ago

I was thinking of PS2 MIPS. I could have sworn I saw a paper by you guys that used ps2dev (a homebrew kit) to build and decompile a custom ELF.

s3rvac commented 6 years ago

You are right. For creating MIPS ELF binaries via GCC, we were using the Minimalist PSP homebrew SDK. However, we were interested only in regular MIPS instructions and not in PS2-specific extensions.

MagnificentS commented 6 years ago

My memory is a bit muddled lol. Thanks

PeterMatula commented 6 years ago

Here is the relevant map of instructions that we translate, and those we don't. I'm sure there are many that are simple enough to reasonably represent in LLVM IR and C. As I wrote above, all the others should not be ignored; pseudo asm intrinsic calls should be generated for them instead. If you want, you can add semantics for more instructions and send a pull request. But as I also wrote, we are not interested in semantics for complex instructions, so we will not accept PRs that implement such instructions. We can discuss here specific instructions (or families) that could be added. And if you are up to it, you can add them, or I can look into it when I have the time.

MagnificentS commented 6 years ago

Wow, thank you so much. I was going to start in a few weeks, but this might help me get started sooner.

PeterMatula commented 6 years ago

I just merged the branch solving #115 and wrote some info about the translation process on our wiki: https://github.com/avast-tl/retdec/wiki/Capstone2LlvmIr. The most important change is that unhandled assembly instructions are not ignored anymore. Calls to assembly pseudo functions are auto-generated based on info provided by Capstone. This does not have to be 100% precise, but it is better than nothing. If you have a sample that contains such instructions, try to decompile it with the current master and compare it with the old output. It should be better now, but if you encounter any problems, please report them. I would be very interested in how it works on real binaries other than x86 - I did not test it on MIPS much.
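
To illustrate the behaviour described above (modeled instructions get real semantics, everything else falls back to an auto-generated pseudo call), here is a minimal, library-free sketch. It is not RetDec's actual code: `Insn`, `modeledInstructions()`, `translate()`, and the `__asm_` naming are hypothetical stand-ins, and the routines emit C-like text where the real translator would emit LLVM IR from a Capstone instruction.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a decoded Capstone instruction.
struct Insn {
    std::string mnemonic;
    std::vector<std::string> operands;
};

using Translator = std::function<std::string(const Insn&)>;

// Illustrative subset of instructions with hand-written semantics.
// Real routines would build LLVM IR; these just produce C-like text.
const std::map<std::string, Translator>& modeledInstructions() {
    static const std::map<std::string, Translator> table = {
        {"addu", [](const Insn& i) {
             return i.operands.at(0) + " = " + i.operands.at(1) + " + " +
                    i.operands.at(2) + ";";
         }},
        {"move", [](const Insn& i) {
             return i.operands.at(0) + " = " + i.operands.at(1) + ";";
         }},
    };
    return table;
}

// Fallback for unmodeled instructions: a generic pseudo call built only
// from the mnemonic and operand list (the "__asm_" prefix is an assumption).
std::string pseudoAsmCall(const Insn& insn) {
    std::string call = "__asm_" + insn.mnemonic + "(";
    for (std::size_t k = 0; k < insn.operands.size(); ++k) {
        if (k != 0) {
            call += ", ";
        }
        call += insn.operands[k];
    }
    return call + ");";
}

// Use the modeled semantics when available, otherwise the pseudo call.
std::string translate(const Insn& insn) {
    const auto& table = modeledInstructions();
    const auto it = table.find(insn.mnemonic);
    return it != table.end() ? it->second(insn) : pseudoAsmCall(insn);
}
```

With this sketch, `translate({"addu", {"v0", "a0", "a1"}})` goes through the modeled routine, while an instruction missing from the table, such as `ldl`, still produces a call instead of silently disappearing from the output.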

nihilus commented 5 years ago

I see that you handle unaligned stores (swr/swl) but not unaligned loads (ldl/ldr). They are pretty common in code, and missing them would definitely break the decompilation. Any thoughts on fixing it with an intrinsic or something like that?

nihilus commented 5 years ago

Feel free to check this as it is based on Capstone as well: https://github.com/nihilus/snowman/tree/master/src/nc/arch/mips

nihilus commented 5 years ago

For binaries to test against use https://github.com/nihilus/snowman-tests

PeterMatula commented 5 years ago

Well, even swr/swl are currently translated using translatePseudoAsmFncOp0Op1(), which generates a generic pseudo call; that is not ideal. Since loads and stores are pretty important, I will add these instructions (swr/swl, ldl/ldr) to my todo list and look into it - try to write proper translation routines for them if possible.

Thanks for the links, I will look at them.

PeterMatula commented 5 years ago

I went over that Snowman MIPS translation and added semantics for a few more MIPS instructions that we were missing. Now we should have pretty much everything that is in Snowman.

However, RetDec translates unaligned load/store instructions (MIPS_INS_LWL, MIPS_INS_LWR, MIPS_INS_SWL, MIPS_INS_SWR, MIPS_INS_LDL, MIPS_INS_LDR, MIPS_INS_SDL, MIPS_INS_SDR) using pseudo function calls (intrinsic-like functions) at the moment. Snowman has full semantics for these. I looked into it and:

Contributions extending the supported instructions are welcome, but keep in mind that we don't want to model instructions that are too complicated or whose LLVM IR representation would be too complex.

nihilus commented 5 years ago

It would be good to translate the pairs to a pseudo-function like "ulw" / "usw" for clarity and then implement that to start with.
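
As a rough illustration of why the pair reads better as one operation, here is a small sketch that emulates 32-bit `lwl`/`lwr` and combines them into a single unaligned load. It assumes a little-endian target and models memory as a plain byte array; the function names (`loadAlignedWordLE`, `lwl`, `lwr`, `ulw`) are hypothetical helpers for this example, not RetDec or Snowman code.

```cpp
#include <cstdint>

// Read the aligned 32-bit little-endian word that contains addr.
std::uint32_t loadAlignedWordLE(const std::uint8_t* mem, std::uint32_t addr) {
    const std::uint32_t base = addr & ~3u;
    return static_cast<std::uint32_t>(mem[base]) |
           (static_cast<std::uint32_t>(mem[base + 1]) << 8) |
           (static_cast<std::uint32_t>(mem[base + 2]) << 16) |
           (static_cast<std::uint32_t>(mem[base + 3]) << 24);
}

// lwl (little-endian): merge the memory bytes up to and including addr
// into the high-order part of rt, keeping the remaining low bits of rt.
std::uint32_t lwl(std::uint32_t rt, const std::uint8_t* mem, std::uint32_t addr) {
    const std::uint32_t shift = 8u * (3u - (addr & 3u));
    const std::uint32_t keep = (1u << shift) - 1u;  // 0 when shift == 0
    return (loadAlignedWordLE(mem, addr) << shift) | (rt & keep);
}

// lwr (little-endian): merge the memory bytes from addr to the end of the
// aligned word into the low-order part of rt, keeping the high bits of rt.
std::uint32_t lwr(std::uint32_t rt, const std::uint8_t* mem, std::uint32_t addr) {
    const std::uint32_t shift = 8u * (addr & 3u);
    const std::uint32_t keep = ~(0xFFFFFFFFu >> shift);  // 0 when shift == 0
    return (loadAlignedWordLE(mem, addr) >> shift) | (rt & keep);
}

// The usual little-endian pairing "lwl rt, 3(p); lwr rt, 0(p)" is
// equivalent to one unaligned 32-bit load from p.
std::uint32_t ulw(const std::uint8_t* mem, std::uint32_t addr) {
    std::uint32_t rt = 0;
    rt = lwl(rt, mem, addr + 3u);
    rt = lwr(rt, mem, addr);
    return rt;
}
```

Each half on its own needs shifts and masks that depend on the low address bits, while the combined `ulw` is just "read four bytes starting at `addr`", which is the form that maps cleanly to C. On a big-endian target the offsets of the two instructions swap and the shift amounts mirror.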