avast / retdec

RetDec is a retargetable machine-code decompiler based on LLVM.
https://retdec.com/
MIT License

"Error: Decompilation to LLVM IR failed" During Conditional branch optimization #552

Open TheElementalOfDestruction opened 5 years ago

TheElementalOfDestruction commented 5 years ago

Command Line Output:

C:\[REDACTED]>py retdec-decompiler.py --no-memory-limit C:\[REDACTED]
##### Checking if file is a Mach-O Universal static library...

##### Checking if file is an archive...
RUN: C:\[REDACTED]\retdec-ar-extractor C:\[REDACTED] --arch-magic
Not an archive, going to the next step.

##### Gathering file information...
RUN: C:\[REDACTED]\retdec-fileinfo -c C:\[REDACTED] --similarity C:\[REDACTED] --no-hashes=all --crypto C:\[REDACTED]
Input file               : C:\[REDACTED]
File format              : ELF
File class               : 32-bit
File type                : DLL
Architecture             : ARM
Endianness               : Little endian
Detected tool            : gold (1.11) (linker), .note section heuristic
Detected tool            : GCC (4.9) (compiler), .comment section heuristic
Detected tool            : GCC (4.8) (compiler), .comment section heuristic
Detected tool            : GCC (4.4.3) (compiler), .comment section heuristic
Original language        : C++

##### Trying to unpack C:\[REDACTED] into C:\[REDACTED]-unpacked.tmp by using generic unpacker...
RUN: C:\[REDACTED]\retdec-unpacker C:\[REDACTED] -o C:\[REDACTED]-unpacked.tmp
No matching plugins found for 'gold 1.11'
No matching plugins found for 'GCC 4.9'
No matching plugins found for 'GCC 4.8'
No matching plugins found for 'GCC 4.4.3'
##### Unpacking by using generic unpacker: nothing to do
##### 'upx' not available: nothing to do

##### Decompiling C:\[REDACTED] into C:\[REDACTED].c.backend.bc...
RUN: C:\[REDACTED]\retdec-bin2llvmir -provider-init -decoder -verify -main-detection -idioms-libgcc -inst-opt -register -cond-branch-opt -syscalls -stack -constants -param-return -local-vars -inst-opt -simple-types -generate-dsm -remove-asm-instrs -class-hierarchy -select-fncs -unreachable-funcs -inst-opt -value-protect -instcombine -tbaa -targetlibinfo -basicaa -domtree -simplifycfg -domtree -early-cse -lower-expect -targetlibinfo -tbaa -basicaa -globalopt -mem2reg -instcombine -simplifycfg -basiccg -domtree -early-cse -lazy-value-info -jump-threading -correlated-propagation -simplifycfg -instcombine -simplifycfg -reassociate -domtree -loops -loop-simplify -lcssa -loop-rotate -licm -lcssa -instcombine -scalar-evolution -loop-simplifycfg -loop-simplify -aa -loop-accesses -loop-load-elim -lcssa -indvars -loop-idiom -loop-deletion -memdep -gvn -memdep -sccp -instcombine -lazy-value-info -jump-threading -correlated-propagation -domtree -memdep -dse -dce -bdce -adce -die -simplifycfg -instcombine -strip-dead-prototypes -globaldce -constmerge -constprop -instnamer -domtree -instcombine -instcombine -tbaa -targetlibinfo -basicaa -domtree -simplifycfg -domtree -early-cse -lower-expect -targetlibinfo -tbaa -basicaa -globalopt -mem2reg -instcombine -simplifycfg -basiccg -domtree -early-cse -lazy-value-info -jump-threading -correlated-propagation -simplifycfg -instcombine -simplifycfg -reassociate -domtree -loops -loop-simplify -lcssa -loop-rotate -licm -lcssa -instcombine -scalar-evolution -loop-simplifycfg -loop-simplify -aa -loop-accesses -loop-load-elim -lcssa -indvars -loop-idiom -loop-deletion -memdep -gvn -memdep -sccp -instcombine -lazy-value-info -jump-threading -correlated-propagation -domtree -memdep -dse -dce -bdce -adce -die -simplifycfg -instcombine -strip-dead-prototypes -globaldce -constmerge -constprop -instnamer -domtree -instcombine -simple-types -stack-ptr-op-remove -inst-opt -idioms -global-to-local -dead-global-assign -instcombine -phi2seq 
-value-protect -disable-inlining -disable-simplify-libcalls -config-path C:\[REDACTED].c.json -o C:\[REDACTED].c.backend.bc
Running phase: Initialization ( 0.00s )
Running phase: LLVM ( 0.01s )
Running phase: Providers initialization ( 0.01s )
Running phase: Input binary to LLVM IR decoding ( 1.08s )
Running phase: LLVM ( 258.01s )
Running phase: Main function identification optimization ( 260.14s )
Running phase: Libgcc idioms optimization ( 260.17s )
Running phase: LLVM instruction optimization ( 261.18s )
Running phase: Assembly register optimization ( 262.11s )
Running phase: Conditional branch optimization ( 262.11s )
Wrote crash dump file "C:\[REDACTED]\AppData\Local\Temp\retdec-bin2llvmir.exe-2c8ddd.dmp"
0x00007FF9F17A9149 (0x00007FF6D97B3E77 0x000000EB1FF6E038 0x0000000000000100 0x00007FF9F52DD997), RaiseException() + 0x69 bytes(s)
0x00007FF9EEE33361 (0x000002E300000000 0x000000EB1FF6F200 0x0000000000000000 0x0000000000000000), _is_exception_typeof() + 0x1081 bytes(s)
0x00007FF9F5373BD6 (0x000002F69519B9B0 0x000002F69512A750 0x000002E476544B20 0x000002E4765298A0), RtlCaptureContext() + 0x566 bytes(s)
0x00007FF6D9025911 (0x000002E4764CDD40 0x000002F694F66D80 0x000002E4D8948CE0 0x000002E476544A80)
0x00007FF6D902A309 (0x000002E30CE934A8 0x000002E4739B8770 0x0000000000000000 0x000002F2F62742B0)
0x00007FF6D902B4D8 (0x000002F2F6298188 0x000002E464B6B2A0 0x000002E478349C98 0x000002E464B6AD60)
0x00007FF6D902BA5C (0x0000000000000000 0x0000000000000000 0x000002E3055B7FB0 0x000000EB1FF6F810)
0x00007FF6D900972F (0x00007FF6D9C87DC0 0x0000000000000000 0x0000000000000001 0x0000000000000000)
0x00007FF6D96F1F6F (0x000002E300000011 0x000002E305652300 0x0000000000000000 0x0000000000000000)
0x00007FF6D96F1660 (0x000002E306E73A10 0x000000EB1FF6FA50 0x000002E3055B63F0 0x000002E306E73A10)
0x00007FF6D8F67B2D (0x0000000000000404 0x00007FF6D9793339 0x00007FF9F244B570 0x0000000000000000)
0x00007FF6D8F6AEB2 (0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000)
0x00007FF6D9793AFD (0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000)
0x00007FF9F25881F4 (0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000)
0x00007FF9F533A251 (0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000), RtlUserThreadStart() + 0x21 bytes(s)
Error: Decompilation to LLVM IR failed

My dump file: retdec-bin2llvmir.exe-2c8ddd.dmp.zip

s3rvac commented 5 years ago

Thank you for the report. Would it be possible to provide the input file that you are trying to decompile so we can try to reproduce the issue? Either publicly here or privately via email.

TheElementalOfDestruction commented 5 years ago

What email should I send it to?

s3rvac commented 5 years ago

You can send it to e.g. the email that is in my profile.

s3rvac commented 5 years ago

Thank you for the input binary. I tried decompiling it with the current master build, and the decompilation passed through Conditional branch optimization successfully. However, it consumed around 140 GB of RAM. @PeterMatula, could you please check what is taking that much memory?

s3rvac commented 5 years ago

Here is the output of valgrind --tool=massif (heap profiler), taken during the Conditional branch optimization. It can be opened in e.g. massif-visualizer. Below is also a screenshot from the tool:

[Screenshot: massif-visualizer]

@PeterMatula please take a look at the output. There are some data structures that take a lot of memory.

TheElementalOfDestruction commented 5 years ago

@s3rvac Would it be possible for you to send me the final output? I can't get my machine to accept using that much memory even through paging files.

s3rvac commented 5 years ago

Unfortunately, I do not have it (AFAIK I have not run the whole decompilation; I have only checked that the decompilation can pass the Conditional branch optimization phase). Let's wait until @PeterMatula analyzes the memory requirements from my report above.

TheElementalOfDestruction commented 5 years ago

So I finally managed to get one of my machines to accept a 200 GB paging file and have now run the program. Unfortunately, it has been stuck on the Conditional branch optimization phase for the past 16 hours. I'm going to let it keep running overnight and see what happens, but now I'm concerned that it may never finish...

s3rvac commented 5 years ago

I really suggest waiting until @PeterMatula analyzes the memory requirements. A decompilation taking that much time probably won't produce any meaningful results.

TheElementalOfDestruction commented 5 years ago

@s3rvac So I stopped that decompilation after it had remained at that step for a few days. I then started modifying the command to skip certain passes. First I removed Conditional branch optimization. The program then started using massive amounts of memory on Stack optimization, and took so long that I stopped it manually. I've now taken that out as well, just to see what the rest of the process is like in terms of memory and time. Perhaps the excessive memory usage at later steps is caused by Conditional branch optimization and Stack optimization not running, but those two also use massive amounts of memory and time themselves. Here is the current output from the console (I forgot to stop it and only looked at it just now):

C:\Users\Elemental Creation\Desktop\install\bin>"C:\Users\Elemental Creation\Desktop\install\bin\retdec-bin2llvmir" -provider-init -decoder -verify -x87-fpu -main-detection -idioms-libgcc -inst-opt -syscalls -constants -param-return -local-vars -inst-opt -simple-types -generate-dsm -remove-asm-instrs -class-hierarchy -select-fncs -unreachable-funcs -inst-opt -x86-addr-spaces -value-protect -instcombine -tbaa -targetlibinfo -basicaa -domtree -simplifycfg -domtree -early-cse -lower-expect -targetlibinfo -tbaa -basicaa -globalopt -mem2reg -instcombine -simplifycfg -basiccg -domtree -early-cse -lazy-value-info -jump-threading -correlated-propagation -simplifycfg -instcombine -simplifycfg -reassociate -domtree -loops -loop-simplify -lcssa -loop-rotate -licm -lcssa -instcombine -scalar-evolution -loop-simplifycfg -loop-simplify -aa -loop-accesses -loop-load-elim -lcssa -indvars -loop-idiom -loop-deletion -memdep -gvn -memdep -sccp -instcombine -lazy-value-info -jump-threading -correlated-propagation -domtree -memdep -dse -dce -bdce -adce -die -simplifycfg -instcombine -strip-dead-prototypes -globaldce -constmerge -constprop -instnamer -domtree -instcombine -instcombine -tbaa -targetlibinfo -basicaa -domtree -simplifycfg -domtree -early-cse -lower-expect -targetlibinfo -tbaa -basicaa -globalopt -mem2reg -instcombine -simplifycfg -basiccg -domtree -early-cse -lazy-value-info -jump-threading -correlated-propagation -simplifycfg -instcombine -simplifycfg -reassociate -domtree -loops -loop-simplify -lcssa -loop-rotate -licm -lcssa -instcombine -scalar-evolution -loop-simplifycfg -loop-simplify -aa -loop-accesses -loop-load-elim -lcssa -indvars -loop-idiom -loop-deletion -memdep -gvn -memdep -sccp -instcombine -lazy-value-info -jump-threading -correlated-propagation -domtree -memdep -dse -dce -bdce -adce -die -simplifycfg -instcombine -strip-dead-prototypes -globaldce -constmerge -constprop -instnamer -domtree -instcombine -inst-opt -simple-types -stack-ptr-op-remove -idioms 
-global-to-local -dead-global-assign -instcombine -inst-opt -idioms -phi2seq -value-protect -disable-inlining -disable-simplify-libcalls -config-path "C:\Users\Elemental Creation\Desktop\libchoicesapp.so.json" -o "C:\Users\Elemental Creation\Desktop\libchoicesapp.so.bc"
Running phase: Initialization ( 0.40s )
Running phase: LLVM ( 0.62s )
Running phase: Providers initialization ( 0.62s )
Running phase: Input binary to LLVM IR decoding ( 7.02s )
Running phase: LLVM ( 636.25s )
Running phase: x87 fpu register analysis ( 642.74s )
Running phase: Main function identification optimization ( 642.74s )
Running phase: Libgcc idioms optimization ( 642.81s )
Running phase: LLVM instruction optimization ( 644.59s )
Running phase: Syscalls optimization ( 647.71s )
Running phase: Constants optimization ( 648.64s )
Running phase: Function parameters and returns optimization ( 50079.44s )
Running phase: Register localization optimization ( 150145.98s )
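
Editing such a long command by hand is error-prone, so the pass removal described in the comment above could be scripted. The following is only a minimal sketch: the helper name `drop_passes` is hypothetical, and the mapping of `-cond-branch-opt` to Conditional branch optimization and of `-stack` to Stack optimization is an assumption inferred from the flag names in the logged commands. Whether the remaining pipeline still produces meaningful output without those passes is not something the sketch can tell.

```python
import shlex
from typing import Iterable


def drop_passes(command: str, passes_to_drop: Iterable[str]) -> str:
    """Return the command line with the given pass flags removed.

    Only whole tokens are compared, so dropping "-stack" leaves
    "-stack-ptr-op-remove" untouched.
    """
    unwanted = set(passes_to_drop)
    # posix=False keeps backslashes and quoted Windows paths as they are.
    tokens = shlex.split(command, posix=False)
    return " ".join(tok for tok in tokens if tok not in unwanted)


# Shortened, illustrative command; the real one has many more passes.
original = (
    "retdec-bin2llvmir -inst-opt -register -cond-branch-opt -syscalls "
    "-stack -constants -stack-ptr-op-remove -o out.bc"
)
print(drop_passes(original, ["-cond-branch-opt", "-stack"]))
```

Matching whole tokens rather than substrings is the main design choice here, since several pass names share a prefix.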

(EDIT: Quick note: I did the math and 50079 seconds is about 14 hours and 150145 seconds is about 42 hours. Therefore, Function parameters and returns optimization took nearly 28 hours.)
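
The arithmetic in the EDIT note can be reproduced for every phase with a short script. This is a sketch under one assumption: that the number in parentheses is the elapsed time at which a phase starts, so a phase's duration is the difference to the next phase's timestamp. That reading matches the 28-hour figure above. The function name `phase_durations` is made up for illustration.

```python
import re

# Matches lines such as "Running phase: LLVM ( 0.01s )".
PHASE_RE = re.compile(r"^Running phase: (.+?) \( ([0-9.]+)s \)$")


def phase_durations(log: str):
    """Return a list of (phase, start_seconds, duration_seconds or None)."""
    starts = []
    for line in log.splitlines():
        match = PHASE_RE.match(line.strip())
        if match:
            starts.append((match.group(1), float(match.group(2))))
    result = []
    for i, (name, start) in enumerate(starts):
        # The last phase has no successor, so its duration is unknown.
        nxt = starts[i + 1][1] if i + 1 < len(starts) else None
        result.append((name, start, None if nxt is None else nxt - start))
    return result


LOG = """\
Running phase: Syscalls optimization ( 647.71s )
Running phase: Constants optimization ( 648.64s )
Running phase: Function parameters and returns optimization ( 50079.44s )
Running phase: Register localization optimization ( 150145.98s )
"""

for name, start, duration in phase_durations(LOG):
    if duration is None:
        print(f"{name}: started at {start / 3600:.1f} h, no end time logged")
    else:
        print(f"{name}: {duration / 3600:.1f} h")
```

Under the same assumption, the log also implies that Constants optimization ran for roughly 13.7 hours before Function parameters and returns optimization started.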

Perhaps @PeterMatula should look into this as well?

Also, @s3rvac, in other decompiled files I have gotten a JSON file containing a huge amount of information, most of which I would want to look through for this specific file. Is there a way I can just get the JSON file without having to go through all the hassle that is occurring during the execution of retdec-bin2llvmir?

s3rvac commented 5 years ago

> Is there a way I can just get the JSON file without having to go through all the hassle that is occurring during the execution of retdec-bin2llvmir?

Unfortunately, there is not. The JSON file is a configuration file that tools which are part of the RetDec toolchain use to communicate with each other. The reason is that representing some pieces of information via LLVM IR is really cumbersome, which is why we have decided to use a separate config file.
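
For a run that did produce the config file, a quick way to see what it contains is to list its top-level keys. The sketch below deliberately assumes nothing about the file's schema; the function name `summarize_config` and the path in the usage comment are placeholders, not part of RetDec.

```python
import json


def summarize_config(path):
    """Return {top-level key: short description of its value} for a JSON file."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, dict):
        # Not an object at the top level; just report the type.
        return {"<root>": type(data).__name__}
    summary = {}
    for key, value in data.items():
        if isinstance(value, (list, dict)):
            summary[key] = f"{type(value).__name__} with {len(value)} entries"
        else:
            summary[key] = repr(value)
    return summary


# Usage (placeholder path):
#   for key, description in summarize_config("output.c.json").items():
#       print(f"{key}: {description}")
```

Since the file is only written as part of a decompilation run, this helps with inspecting existing output and does not avoid running retdec-bin2llvmir.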

TheElementalOfDestruction commented 5 years ago

@s3rvac Ah, I was hoping I might be able to see some of the class hierarchies in this one, as I have in other projects. Being able to do so would be extremely useful for what I am trying to do.