Open ceesb opened 4 years ago
Hi @ceesb, thanks for trying ddisasm! It will be much easier for me to diagnose the problems if I have access to the binaries. Email will work.
-no-pie? I think in Ubuntu 18 gcc generates PIE by default but clang does not.
Thanks for the reply!
Indeed issue 2 disappears with -no-pie. When I add -no-pie for issue 3 I'm left with undefined refs. I'll mail you the bins for issue 1 and 3.
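A quick way to confirm whether a given binary was linked as PIE is to look at the ELF header type reported by readelf. The sketch below is only an illustration: the helper name classify_elf_type is hypothetical, and it assumes binutils' readelf is available. A type of DYN indicates a position-independent executable (or a shared object), while EXEC indicates a fixed-address, non-PIE executable.

```shell
#!/bin/sh
# Hypothetical helper: classify the "Type:" line printed by `readelf -h`.
#   DYN  -> position-independent executable (or shared object)
#   EXEC -> non-PIE executable
classify_elf_type() {
    case "$1" in
        *DYN*)  echo "pie" ;;
        *EXEC*) echo "non-pie" ;;
        *)      echo "unknown" ;;
    esac
}

# Example usage (the binary path is a placeholder):
#   classify_elf_type "$(readelf -h ./my_binary | grep 'Type:')"
```

If the result is "pie", reassembling the ddisasm output with -no-pie changes how the binary is linked compared to the original, which may matter when comparing the two.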
Hi @ceesb, issue 1 is a performance issue. It is a combination of a relatively large data section with the fact that there are few data accesses detected. That Datalog rule that you pointed out is behaving very badly. Fortunately, I have a fix almost ready! I will let you know once it makes its way to master.
I can't reproduce issue 3 with the latest ddisasm version (commit 39c696e). Is it possible that it got fixed? Let me know if that is not the case.
Alright, https://github.com/GrammaTech/ddisasm/commit/07793370907bbc41df2d3c223683ad3b22e9ec81 should fix issue 1!
Nice! Indeed I see issue 1 is solved. Can you tell me what was the key difference between the gcc binary that caused the hang and the clang binary that didn't?
Issue 3 is still open for 07793370907bbc41df2d3c223683ad3b22e9ec81 as pasted below; maybe I'm linking to an older dependency when building ddisasm?
$ ddisasm --version
1.0.0 (0779337 2020-06-01)
$ ddisasm --self-diagnose --ir retargeted_wbc_clang_relocs.gtirb --asm retargeted_wbc_clang_relocs.s wbc_clang_relocs
Building the initial gtirb representation (25ms)
Decoding the binary (1s)
Disassembling (21s)
Populating gtirb representation (86s)
Computing intra-procedural SCCs (1ms)
Computing no return analysis (195ms)
Detecting additional functions (808ms)
Printing assembler (39s)
Perfoming self diagnose (this will only give the right results if the target program contains all the relocation information)
Self diagnose completed: No errors found
$ gcc -o retargeted_wbc_clang_relocs retargeted_wbc_clang_relocs.s -no-pie
/tmp/cc2zMVWX.o: In function `AES_encrypt':
(.text+0x147): undefined reference to `_disambig_94828702669104'
(.text+0x158): undefined reference to `_disambig_94828702669104'
(.text+0x191): undefined reference to `_disambig_94828702669104'
(.text+0x1a2): undefined reference to `_disambig_94828702669104'
(.text+0x1d2): undefined reference to `_disambig_94828702669104'
/tmp/cc2zMVWX.o:(.text+0x1e3): more undefined references to `_disambig_94828702669104' follow
collect2: error: ld returned 1 exit status
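To narrow down a link error like this one, it can help to check whether the generated assembly references a symbol without ever defining it as a label. A minimal sketch using grep follows. The helper names are hypothetical and the check is purely textual: it assumes a label definition appears as the symbol name followed by a colon at the start of a line, and that the symbol contains no regex metacharacters.

```shell
#!/bin/sh
# count_symbol_uses SYMBOL FILE
# Prints the number of lines in FILE that mention SYMBOL (fixed-string match).
count_symbol_uses() {
    grep -c -F -- "$1" "$2" || true
}

# count_symbol_defs SYMBOL FILE
# Prints the number of lines in FILE that look like a label definition
# "SYMBOL:" (optionally indented). SYMBOL is used as a regex here.
count_symbol_defs() {
    grep -c -E -- "^[[:space:]]*$1:" "$2" || true
}

# Example usage with the names from the log above:
#   count_symbol_uses _disambig_94828702669104 retargeted_wbc_clang_relocs.s
#   count_symbol_defs _disambig_94828702669104 retargeted_wbc_clang_relocs.s
```

A nonzero use count together with a definition count of zero would be consistent with the undefined references reported by ld.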
Regarding the difference between gcc and clang: it has to do with the specific patterns in which data is accessed. You can get an idea if you generate assembly files with the --debug option. That will print assembly with extra annotations (this assembly won't be reassembleable, but it is useful for understanding). In the gcc binary, ddisasm detects a few access patterns (e.g. at address L_3600: or at .L_1df40). These patterns are propagated through the data section (see the preferred_data_access annotations). Because the data section is big and the accesses are sparse, the propagation was taking a long time (the propagation has since been changed to be more efficient). In the clang binary, none of these accesses are detected, so no propagation takes place.
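As a rough illustration of the above, the annotated listing can be generated and then searched for the propagation annotations. This is a sketch under assumptions: the file names are placeholders, the helper name is hypothetical, and it assumes --debug can be combined with --asm in the same way as the invocation shown earlier in this thread. Only the --debug option and the preferred_data_access annotation name come from the discussion itself.

```shell
#!/bin/sh
# Produce an annotated (non-reassembleable) listing. File names are placeholders:
#   ddisasm --debug --asm wbc_gcc_debug.s wbc_gcc

# count_preferred_accesses FILE
# Prints the number of lines in an annotated listing that carry the
# preferred_data_access annotation.
count_preferred_accesses() {
    grep -c -F -- "preferred_data_access" "$1" || true
}

# Example usage:
#   count_preferred_accesses wbc_gcc_debug.s
```

Comparing this count between the gcc and clang listings gives a quick sense of how much propagation takes place in each.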
As for why no accesses are detected, it probably has to do with how addresses are computed in the clang binary. From what I can tell, the address computation used to access these tables in the data section is quite intricate, and even the data accesses detected for the gcc binary are inaccurate (though that does not necessarily break the assembly).
Regarding issue 3, I finally managed to reproduce it using the master version of LIEF. We tend to stick to stable releases, that is why I couldn't reproduce it. Nonetheless, I believe this is still a ddisasm bug. I will look into it further.
I'm experimenting with C code that has a large data segment and relatively little code. I'm happy to share the binaries with you, but I prefer not to post them publicly. If you want them, send me a ping, and I can email them. Hope this helps!
Issue 1, for gcc binaries (-O2 or not) ddisasm hangs
If I compile with gcc (Ubuntu 7.5.0-3ubuntu1~18.04), ddisasm will hang in the disassembling phase (with or without "-O2"). When I interrupt I see this:
Issue 2, for clang -O2 ddisasm produces a broken asm
If I compile with clang (6.0.0-1ubuntu2) and "-O2", ddisasm finishes but the assembly will not compile.
Issue 3, for clang -O2 with relocations ddisasm finds no errors
On the clang with "-O2" binary for which ddisasm produces broken asm, I tried to run ddisasm with --self-diagnose, but it reports everything is fine. Attempting to compile the resulting assembly still leads to an error.
Clang without explicit optimization works!
If I compile with clang (6.0.0-1ubuntu2) and omit "-O2", all is well!
Ddisasm version is git commit 15f97c87a423b5e7fe69f863f6804ce3dc5724bb.