Open MagnificentS opened 6 years ago
Take a look at the capstone2llvmir library, the MIPS module in this case. To support more instructions, their semantics need to be modeled in LLVM IR, i.e. you write a routine that takes a Capstone instruction and generates an LLVM IR sequence with the same semantics. Also see this discussion. We do not want to have semantics for all possible instructions, only for those that can be represented in C reasonably well (simple and clear code). At the moment, all unhandled instructions are ignored, but we are already working on an implementation that will generate pseudo asm calls for them.
What kind of instructions are you interested in? Would pseudo asm calls be enough for you, or do you want complete semantics?
I was thinking of PS2 MIPS. I could have sworn I saw a paper by you guys that used ps2dev (a homebrew kit) to build and decompile custom ELF files.
You are right. For creating MIPS ELF binaries via GCC, we were using Minimalist PSP homebrew SDK. However, we were interested only in regular MIPS instructions and not in PS2-specific extensions.
My memory is a bit muddled lol. Thanks
Here is the relevant map of instructions that we translate, and those we don't. I'm sure there are many that are simple enough to reasonably represent in LLVM IR and C. As I wrote above, all the others should not be ignored; pseudo asm intrinsic calls should be generated for them instead. If you want, you can add semantics for more instructions and send a pull request. But as I also wrote, we are not interested in semantics for complex instructions, so we will not accept a PR that implements such instructions. We can discuss specific instructions (families) that could be added here. And if you are up to it, you can add them, or I can look into it when I have the time.
Wow thank you so much. I was going to start in a few weeks but this might help me get started sooner
I just merged the branch solving #115 and wrote some info about the translation process on our wiki: https://github.com/avast-tl/retdec/wiki/Capstone2LlvmIr.
The most important change is that unhandled assembly instructions are not ignored anymore. Calls to assembly pseudo functions are auto-generated based on info provided by Capstone. This does not have to be 100% precise, but it is better than nothing. If you have a sample that contains such instructions, try to decompile it with the current master and compare it with the old output. It should be better now, but if you encounter any problems, please report them. I would be very interested in how it works on real binaries other than x86, since I did not test it on MIPS much.
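To illustrate the idea of falling back to a pseudo call for unhandled instructions, here is a minimal sketch. It is not the capstone2llvmir code: `DecodedInsn`, `translate`, `translatePseudoAsm`, and the `__asm_` name prefix are assumptions made up for this example, and it emits strings instead of LLVM IR to stay self-contained.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a decoded instruction; real code would read
// these fields from Capstone's cs_insn instead.
struct DecodedInsn {
    std::string mnemonic;
    std::vector<std::string> operands;
};

// A translation routine turns one instruction into its modeled semantics
// (a string here, an LLVM IR sequence in a real translator).
using Translator = std::function<std::string(const DecodedInsn&)>;

// Fallback: build a generic pseudo call from the mnemonic and operands.
// The "__asm_" prefix is an assumption made for this sketch.
std::string translatePseudoAsm(const DecodedInsn& insn) {
    std::string call = "__asm_" + insn.mnemonic + "(";
    for (std::size_t i = 0; i < insn.operands.size(); ++i) {
        if (i != 0) {
            call += ", ";
        }
        call += insn.operands[i];
    }
    return call + ")";
}

// Use the dedicated routine when one is registered, else the fallback.
std::string translate(const std::map<std::string, Translator>& table,
                      const DecodedInsn& insn) {
    auto it = table.find(insn.mnemonic);
    return it != table.end() ? it->second(insn) : translatePseudoAsm(insn);
}
```

With a table that only knows `addu`, an `ldl` instruction would come out as a generic call such as `__asm_ldl(t0, 0(a0))` rather than being dropped.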
I see that you handle unaligned stores (swr/swl) but not unaligned loads (ldl/ldr). They are pretty common in code, and missing them would definitely break the decompilation. Any thoughts on fixing it with an intrinsic or something like that?
Feel free to check this as it is based on Capstone as well: https://github.com/nihilus/snowman/tree/master/src/nc/arch/mips
For binaries to test against use https://github.com/nihilus/snowman-tests
Well, even swr/swl are translated using translatePseudoAsmFncOp0Op1() at the moment, which generates a generic pseudo call. That is not ideal. Since loads and stores are pretty important, I will add these instructions (swr/swl, ldl/ldr) to my todo list and look into it, i.e. try to write proper translation routines for them if possible.
Thanks for the links, I will look at them.
I went over that Snowman MIPS translation and added semantics for a few more MIPS instructions that we were missing. Now we should have pretty much everything that is in Snowman.
However, RetDec translates unaligned load/store instructions (MIPS_INS_LWL, MIPS_INS_LWR, MIPS_INS_SWL, MIPS_INS_SWR, MIPS_INS_LDL, MIPS_INS_LDR, MIPS_INS_SDL, MIPS_INS_SDR) using pseudo function calls (intrinsic-like functions) at the moment. Snowman has full semantics for these. I looked into it and:
Contributions extending the supported instructions are welcome, but keep in mind that we don't want to model instructions that are so complicated that their LLVM IR representation would be too complex.
It would be good to translate the pairs to a pseudo-function like "ulw" / "usw" for clarity and then implement that to start with.
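For reference, here is a rough sketch of what an lwr/lwl pair computes and why it collapses into a single unaligned word load. It assumes a 32-bit little-endian target, and the helper names (`loadAlignedWord`, `lwr`, `lwl`, `ulw`) are made up for this example; none of this is RetDec or Snowman code.

```cpp
#include <cstdint>

// Read the aligned 32-bit little-endian word that contains `addr`.
static std::uint32_t loadAlignedWord(const std::uint8_t* mem, std::uint32_t addr) {
    const std::uint8_t* p = mem + (addr & ~3u);
    return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
           (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
}

// LWR (little-endian): bytes from `addr` to the end of its aligned word
// go into the low-order bytes of rt; the remaining bytes of rt are kept.
std::uint32_t lwr(std::uint32_t rt, const std::uint8_t* mem, std::uint32_t addr) {
    const std::uint32_t shift = 8 * (addr & 3u);
    const std::uint32_t loaded = 0xFFFFFFFFu >> shift;  // bits LWR overwrites
    return (rt & ~loaded) | (loadAlignedWord(mem, addr) >> shift);
}

// LWL (little-endian): bytes from the start of the aligned word up to
// `addr` go into the high-order bytes of rt; the remaining bytes are kept.
std::uint32_t lwl(std::uint32_t rt, const std::uint8_t* mem, std::uint32_t addr) {
    const std::uint32_t b = addr & 3u;
    const std::uint32_t kept = 0x00FFFFFFu >> (8 * b);  // bits LWL preserves
    return (rt & kept) | (loadAlignedWord(mem, addr) << (8 * (3 - b)));
}

// The usual pair "lwr rt, 0(a); lwl rt, 3(a)" behaves like one unaligned
// 32-bit load, which is what a "ulw" pseudo-function would stand for.
std::uint32_t ulw(const std::uint8_t* mem, std::uint32_t addr) {
    const std::uint32_t rt = lwr(0, mem, addr);
    return lwl(rt, mem, addr + 3);
}
```

Each half on its own needs masking and shifting that depends on the low address bits, which is what makes the individual instructions awkward to express in C. The combined `ulw` is just a plain load from an unaligned address, so recognizing the pair gives much cleaner output. A big-endian target would swap the roles of the two instructions.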
How do I extend the current instruction set in MIPS?