Open 0xBEEEF opened 6 years ago
We already detect and eliminate statically linked code, although the matching algorithm does not yet use all the information available in our signatures (see #113).
The problem is that these functions differ between compilers and their versions, so it is not possible for us to have signatures for all the compilers and versions out there. RetDec comes with signatures for the compilers we use to generate testing binaries for all the supported architectures. These signatures will not match binaries created by other compilers.
However, there is a mechanism that allows you to pass the archives that your compiler uses to the decompilation process. See the --static-code-archive option. It can be used multiple times to pass multiple archives. Function signatures will be extracted from these archives prior to the decompilation and, if everything goes ok, they should be used to eliminate statically linked code.
For example, if your compiler takes functions from the libc.a library and links them into the binary, you would run the decompilation like this:
./retdec-decompiler.sh binary --static-code-archive /usr/lib32/libc.a
This should generate a YARA file with signatures for your version of libc.a. If you are interested, you can look inside to see what such rules look like. This YARA file does not have to be generated each time. You can reuse existing YARA files with the --static-code-sigfile option.
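To make the two options concrete, here is a minimal sketch of a helper that assembles the command line from a mix of archives and already generated YARA files. It only prints the command (a dry run). The helper name build_retdec_command, the file name ./libc.yara, and the extension-based dispatch (.a/.lib versus .yara/.yar) are assumptions made for illustration; it also assumes that --static-code-sigfile may be repeated in the same way as --static-code-archive.

```shell
#!/usr/bin/env bash
# Sketch: build a retdec-decompiler.sh command line from static archives
# and previously generated YARA signature files.
# The command is only printed (dry run), never executed.

build_retdec_command() {
    local binary="$1"; shift
    local cmd=(./retdec-decompiler.sh "$binary")
    local f
    for f in "$@"; do
        case "$f" in
            # Assumption: static archives end in .a (or .lib on Windows).
            *.a|*.lib)    cmd+=(--static-code-archive "$f") ;;
            # Assumption: generated signature files end in .yara or .yar.
            *.yara|*.yar) cmd+=(--static-code-sigfile "$f") ;;
            *) echo "skipping unrecognized file: $f" >&2 ;;
        esac
    done
    # Join all words with single spaces and print the resulting command.
    printf '%s\n' "${cmd[*]}"
}

# Example: one archive plus a hypothetical, previously generated YARA file.
build_retdec_command binary /usr/lib32/libc.a ./libc.yara
```

Once the printed command looks right, it can be run by hand; where the generated YARA file actually ends up depends on your RetDec setup.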
This is a fairly new feature and I'm not sure how many people are actually using it - it may not be well tested. So if you find some problems, e.g. signatures that are not matched, open a new issue and attach everything that is needed to reproduce the problem.
Thank you very much for the quick and above all detailed answer!
I am of course aware that you have to assume that there are a lot of combinations. Would there be any kind of database in which you can collect all the signatures of the methods? If I remember correctly, IDA had only one database with signatures and could easily recognize many of the static methods. It would be really great if there was a central knowledge base somewhere, maintained by the community, for example, and constantly growing. In that case, the number of unrecognized combinations is likely to decrease in the future. But since I currently only work with my own examples, it is of course no problem to work with the method you suggest. Does this actually work with the Windows version? The example you wrote seems to apply only to Linux.
Yes, the mentioned options can handle the COFF (Windows) format too, only the paths will be different. For more info, check the newly created wiki page.
One way of approaching this would be creating a script that would automatically generate signatures for standard and third-party libraries on the user's system and install the signatures somewhere into the share/retdec/support directory. After that, the signatures would (or should) be automatically picked up by retdec-decompiler.sh. In this way, each user would have signatures specifically tailored to his or her system.
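As a rough illustration of the discovery half of such a script, the sketch below collects the static archives found in a few library directories and turns them into repeated --static-code-archive options. It does not generate or install anything into share/retdec/support; it is a simpler variant that passes the archives on each run, and it only prints the resulting command. The function name archive_options and the example directories are assumptions, and the script relies on bash 4+ (mapfile) and a sort that supports -z.

```shell
#!/usr/bin/env bash
# Sketch: find static archives in library directories and emit one
# --static-code-archive option per archive. Dry run: prints only.

archive_options() {
    local dir archive
    for dir in "$@"; do
        # Silently skip directories that do not exist on this system.
        [ -d "$dir" ] || continue
        # NUL-delimited output keeps paths containing spaces intact.
        while IFS= read -r -d '' archive; do
            printf -- '--static-code-archive\n%s\n' "$archive"
        done < <(find "$dir" -maxdepth 1 -type f -name '*.a' -print0 | sort -z)
    done
}

# Example directories (assumptions); adjust them to your toolchain.
mapfile -t opts < <(archive_options /usr/lib32 /usr/local/lib)
echo ./retdec-decompiler.sh binary "${opts[@]}"
```

Passing every archive on the system can make signature extraction slow, so in practice it is probably better to restrict the directories to the libraries your compiler actually links against.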
Of course, this will not help if the decompiled binary contains statically linked code from a library that is not present in the user's system. For this, we would need to create and maintain a database of signatures for various compilers, libraries, and their versions. However, this database would be huge and possibly hard to download, install, and use.
@s3rvac Yes, that would be enough as a workaround. One could automate this to find all the libraries on the system. To my knowledge, IDA already maintains such a large database with this information. Isn't it easy to connect them and use them somehow? Then you wouldn't have to do this work all over again. After all, you don't have to reinvent the wheel.
@mbandzi By the way, thanks for the new wiki entry. This makes things much clearer, and one understands the interrelationships much better! Thank you for all the time you invest in this product!
We discussed it and came up with this design #224.
Would something like the https://godbolt.org/ project help with this? Allows generation of code via an immense selection of compilers; even if the current web ui doesn't work for this purpose, they might be able to help out.
@lokkju Very interesting project. I might use it for the experiments I often do. However, I'm not sure it would be of any use in the case of statically linked code. We need to generate signatures from the libraries these compilers use, so we would need to have all these compiler packages in order to do it. Moreover, we probably don't even want to pregenerate signatures for all the possible compilers, just make it easier and more reliable for users to generate and then use their own signatures.
@PeterMatula As my interest is more on the reversing side, I wasn't thinking as much about users generating their own signatures; but seeing the design in #244, there is no reason a user couldn't just offer repositories of signatures. Anyway, I'll be happy to see progress on this!
This is a very interesting project. It is a lot of fun and joy to experiment with the program. Especially if you work with examples from before and after. I only noticed the following:
When I create a small program, an incredible number of static functions are integrated and linked. With GCC, for example, this is really a lot, and so a tiny program quickly turns into one of several megabytes. As a result, the decompilation output suffers, of course, as does memory usage, as described in other issues. But there are actually only a few classes and program parts that I built myself and want to analyze.
Shouldn't the analysis phase be optimized? The standard functions would have to be filtered out, and the result would have to be significantly reduced. Is it planned to be made available at some point? Apart from that, there are already problems with the crazy little test programs with the GNU Compiler Collection, which even lead to the fact that you can't decompile your own program. I didn't really manage to analyze bigger programs > 1 MB, although they didn't really contain much in the core....