XinyuShe opened this issue 6 months ago
No, currently we only consider a single function.
Gathering data and developing a workable approach for decompiling complex files with multiple functions and structures is quite demanding. Therefore, this initial version of LLM4Decompile is limited to decompilation of individual functions.
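In a function-level setup, each input is the assembly of one function rather than a whole binary. Below is a minimal sketch of slicing a single function's block out of `objdump -d` output. It assumes GNU objdump's default text layout (a `<name>:` header line, then indented instruction lines, with a blank line between functions). `extract_function` is a hypothetical helper for illustration, not part of LLM4Decompile.

```python
import re
from typing import Optional

# Matches a function header in `objdump -d` output, e.g. "0000000000001139 <add>:".
_HEADER_RE = re.compile(r"^[0-9a-fA-F]+ <(?P<name>[^>]+)>:\s*$")


def extract_function(objdump_text: str, name: str) -> Optional[str]:
    """Return the disassembly block for `name`, or None if it is absent.

    Assumes (hypothetically) GNU objdump's default layout: one header line per
    function, indented instruction lines, and a blank line between functions.
    """
    block = []
    inside = False
    for line in objdump_text.splitlines():
        header = _HEADER_RE.match(line)
        if header:
            if inside:
                break  # the next function starts, so our block is complete
            inside = header.group("name") == name
            if inside:
                block.append(line)
            continue
        if inside:
            if not line.strip():
                break  # a blank line ends the current function's block
            block.append(line)
    return "\n".join(block) if block else None


if __name__ == "__main__":
    listing = """0000000000001139 <add>:
    1139:\t55\tpush   %rbp
    113a:\tc3\tret

000000000000113b <main>:
    113b:\tc3\tret
"""
    print(extract_function(listing, "main"))
```

Stopping at the first blank line keeps the sketch simple. Output from other disassemblers, or objdump runs with extra flags such as `-S`, would likely need a different boundary rule.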
Addressing the complexities posed by external functions and struct definitions is a primary focus of our future decompilation efforts, and our team is actively working on strategies to handle them. Although the problem may be ill-posed, a larger and more varied training dataset should allow the model to make statistical guesses about the functions and types that correspond to the missing pieces. We'll report the results as soon as possible!
@albertan017 Thanks for your reply! I'm also wondering where you found those C file datasets without structs and long functions?
We removed those parts from AnghaBench for simplicity. The original dataset is available here. However, that dataset is only compilable, not linkable, so we are looking for other benchmarks and collecting our own data.
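The thread does not say exactly how structs and long functions were filtered out of AnghaBench. As a purely hypothetical illustration, a filter could drop any sample whose C source uses the `struct` keyword and any sample whose assembly exceeds a length budget. `Sample`, `keep_sample`, `filter_samples`, and the 1024-token default are all assumptions. Whitespace-separated tokens are only a crude stand-in for a model tokenizer.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List

# Whole-word match, so identifiers such as "mystruct_count" are not rejected.
STRUCT_RE = re.compile(r"\bstruct\b")


@dataclass
class Sample:
    c_source: str  # source of a single C function
    asm: str       # its compiled assembly


def keep_sample(sample: Sample, max_asm_tokens: int = 1024) -> bool:
    """Hypothetical filter: reject struct usage and over-long assembly."""
    if STRUCT_RE.search(sample.c_source):
        return False
    # Crude length proxy: count whitespace-separated assembly tokens.
    return len(sample.asm.split()) <= max_asm_tokens


def filter_samples(samples: Iterable[Sample], max_asm_tokens: int = 1024) -> List[Sample]:
    """Keep only the samples that pass keep_sample."""
    return [s for s in samples if keep_sample(s, max_asm_tokens)]
```

A keyword check is deliberately conservative. It also rejects functions that mention `struct` only in a comment, and it misses struct types hidden behind a `typedef`, so a real pipeline would probably need a parser-based check.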
Do you take structs into consideration? And how do you handle excessively long functions in assembly code?