Vector35 / sigkit

Function signature matching and signature generation plugin for Binary Ninja
https://binary.ninja/2020/03/11/signature-libraries.html
MIT License

Merge multiple .sig files #28

Open Luca1991 opened 1 week ago

Luca1991 commented 1 week ago

I'm writing a script that, given a directory with .LIB files, will extract all the .obj files and automatically create a .sig file for each of them. The idea is to be able to easily produce signatures for various SDKs (like the old MSVC++6.0 and so on), automatically.
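A minimal sketch of the extraction step, assuming the .LIB files are ordinary COFF archives (they use the same `!<arch>` container as Unix `ar` files). The function name `iter_archive_members` is hypothetical and not part of sigkit; it reads the archive once in memory rather than invoking an external tool per object, which may help with extraction speed.

```python
from typing import Iterator, Tuple

AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60  # fixed-size ar member header


def iter_archive_members(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (member_name, member_bytes) for each object in an ar/COFF archive.

    Hypothetical helper for illustration; not part of sigkit.
    """
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar/COFF archive")
    offset = len(AR_MAGIC)
    longnames = b""
    while offset + HEADER_SIZE <= len(data):
        header = data[offset:offset + HEADER_SIZE]
        if header[58:60] != b"`\n":
            raise ValueError("bad member header at offset %d" % offset)
        raw_name = header[0:16].rstrip(b" ")
        size = int(header[48:58].decode("ascii").strip())
        body_start = offset + HEADER_SIZE
        body = data[body_start:body_start + size]
        # Members are aligned to even offsets.
        offset = body_start + size + (size & 1)

        if raw_name == b"/":
            # Linker member (symbol index); not an object file.
            continue
        if raw_name == b"//":
            # Long-names table, referenced by later "/<offset>" names.
            longnames = body
            continue
        if raw_name.startswith(b"/") and raw_name[1:].isdigit():
            start = int(raw_name[1:])
            end = start
            # Names end at NUL (MSVC) or newline (GNU ar).
            while end < len(longnames) and longnames[end] not in (0, 0x0A):
                end += 1
            name = longnames[start:end]
        else:
            name = raw_name
        yield name.rstrip(b"/").decode("latin-1"), body
```

Each yielded body could be written to disk as an individual .obj and then processed one at a time.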

Now the question is: how can I merge all the individual .sig files obtained from each .obj into a single file (e.g. msvcpp6.sig)? I see that there is a script called merge_multiple_versions.py in the example directory, but it looks like its purpose is to "merge the signature libraries generated for different versions of the same library".

Once I figure this out, I'll release the script :)

stong commented 1 week ago

Hi there,

It's been several years since I worked on this (and I am no longer an employee of Vector35, and I am no longer a professional reverse engineer) but from my memory, I believe this may be helpful to you:

Essentially, since you are working with an entire MSVCRT SDK, you also need to do some form of linking. This is because the functions in the static libraries (.obj files) get linked with each other at link time, when the actual program binary that uses the library objects is produced. For example, if CompilationUnit1.obj references a function in CompilationUnit2.obj, this shows up as a relocation or something similar in the .obj, but in an actual .exe the reference is turned into a real function call. To account for this difference between .obj files and what we see in actual binaries, we need to do linking when generating signatures from .lib and .obj files (.a and .o, respectively, on Linux).

The script linked above is basically this but for .a and .o on Linux for ubuntu binaries, so the overall principle is the same for Windows .lib and .obj.
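The linking idea described above can be modeled with a toy example. This is only a sketch under stated assumptions, not sigkit's actual implementation: the input layout (object name mapped to functions mapped to callee names) and the function name `link_call_graph` are hypothetical, chosen purely for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input layout: {object_file: {function_name: [callee_name, ...]}}
ObjectMap = Dict[str, Dict[str, List[str]]]
Resolved = Dict[Tuple[str, str], List[Tuple[Optional[str], str]]]


def link_call_graph(objects: ObjectMap) -> Resolved:
    """Resolve by-name callee references across object files.

    In an .obj, a call into another compilation unit is only a symbol
    reference (a relocation). Here each reference is mapped to the object
    that defines it, or to None when no object in the set defines it.
    """
    # Global symbol table: first definition of a name wins.
    owner: Dict[str, str] = {}
    for obj_name, funcs in objects.items():
        for func_name in funcs:
            owner.setdefault(func_name, obj_name)

    resolved: Resolved = {}
    for obj_name, funcs in objects.items():
        for func_name, callees in funcs.items():
            resolved[(obj_name, func_name)] = [
                (owner.get(callee), callee) for callee in callees
            ]
    return resolved


if __name__ == "__main__":
    demo = {
        "CompilationUnit1.obj": {"foo": ["bar", "printf"]},
        "CompilationUnit2.obj": {"bar": []},
    }
    for key, callees in link_call_graph(demo).items():
        print(key, "->", callees)
```

Callees that resolve to None stay external, much as an unresolved import would in a real binary.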

Let me know how it goes!

stong commented 1 week ago

Also you probably want to enable guess relocations when generating the signatures, like this:

Luca1991 commented 6 days ago

I think I'm spending too much time on this and I'm not even sure I'm on the right track because of the "linking" problem mentioned earlier by @stong (not to mention the poor performance I'm having extracting each .obj file from the .LIB).

Let's try another approach...

@stong If I use this script: https://github.com/Vector35/sigkit/blob/master/examples/batch_process.py, should the code in the function process_bv(bv) fix the linking problem? If so, I think my best bet is to write an external tool to dump all the .obj files from the LIBs and then process them with an updated version of this script (for example, I don't think 'PDB\Load (BETA)' exists anymore). Then I'll try to adapt merge_multiple_versions.py to merge all the individual signatures into a single signature file.

I don't know if that makes sense...

Also, should I really merge ALL MSVCPP6.0 signatures into a single .sig file or should I split them up by category? (if so, which one?)