albertan017 / LLM4Decompile

Reverse Engineering: Decompiling Binary Code with Large Language Models
MIT License
2.92k stars, 209 forks

Will there be support for assembly soon? #21

Open mbrmaker opened 1 month ago

mbrmaker commented 1 month ago

I would like to know if it will soon be possible to convert a .bin (from NASM) to .asm source code with your AI.

albertan017 commented 1 month ago

Thanks for your interest. However, our work focuses exclusively on decompilation that starts from assembly. Future projects will build on the same basis, taking as input assembly code that has already been disassembled by objdump or another disassembler.
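As a rough illustration of the "disassemble first, then decompile" workflow described above, here is a minimal Python sketch. It assumes GNU `objdump` is on the PATH and that its `-d` output follows the usual tab-separated `address: bytes mnemonic` layout. The helper names `disassemble` and `extract_function` are hypothetical and are not taken from the project.

```python
import re
import subprocess


def disassemble(binary_path: str) -> str:
    """Run `objdump -d` on a compiled file and return its text output.

    Assumes GNU objdump is installed; raises CalledProcessError on failure.
    """
    result = subprocess.run(
        ["objdump", "-d", binary_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def extract_function(objdump_text: str, name: str) -> str:
    """Return only the instructions of one function from `objdump -d` text.

    Addresses and raw bytes are dropped, leaving one instruction per line.
    """
    header = re.compile(r"^[0-9a-fA-F]+ <(.+)>:$")
    instructions = []
    inside = False
    for line in objdump_text.splitlines():
        match = header.match(line.strip())
        if match:
            # A new symbol starts; track whether it is the one we want.
            inside = match.group(1) == name
            continue
        if inside:
            if not line.strip():
                break  # a blank line ends the function listing
            # Typical layout: "  addr:\tbytes\tmnemonic operands"
            parts = line.split("\t")
            if len(parts) >= 3:
                instructions.append(parts[2].strip())
    return "\n".join(instructions)
```

The resulting text (for example `extract_function(disassemble("a.out"), "main")`) is the kind of assembly listing a decompilation model would take as input; the exact prompt format a given model expects is not shown here.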

mbrmaker commented 1 month ago

but is it really possible?

albertan017 commented 1 month ago

Disassembly is a more established field than decompilation, and existing tools already handle it well, so we focus only on decompilation. The current LLM4Decompile has demonstrated impressive performance on the HumanEval benchmark and shows promise on real-world data. Our objective is to make it more practical through methods such as scaling up training, retrieval-augmented generation, and other techniques.
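For the original question (a flat `.bin` produced by NASM), an existing disassembler is usually enough. The sketch below assumes `ndisasm`, the disassembler shipped with NASM, is installed, and uses its `-b` (bit mode) and `-o` (load origin) options. The function names and the example file name `boot.bin` are hypothetical.

```python
import subprocess
from typing import List


def build_ndisasm_command(bin_path: str, bits: int = 16, origin: int = 0) -> List[str]:
    """Build an ndisasm command line for a flat binary.

    bits   -- 16, 32 or 64, matching the BITS mode the code was assembled for
    origin -- load address assumed for the first byte (e.g. 0x7C00 for a boot sector)
    """
    if bits not in (16, 32, 64):
        raise ValueError("bits must be 16, 32 or 64")
    return ["ndisasm", "-b", str(bits), "-o", str(origin), bin_path]


def disassemble_flat_binary(bin_path: str, bits: int = 16, origin: int = 0) -> str:
    """Run ndisasm and return its listing (assumes ndisasm is on the PATH)."""
    command = build_ndisasm_command(bin_path, bits, origin)
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `disassemble_flat_binary("boot.bin", bits=16, origin=0x7C00)` would list a 16-bit boot sector. A flat binary carries no symbols or section information, so the output is a raw instruction listing and not the original `.asm` source with its labels and comments.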

For obfuscated or protected binaries, we do not consider large language models an effective solution: given the wide range of possible obfuscation techniques, the cost of training typically outweighs the benefit gained from decompiling or disassembling the code.