danielplohmann / smda

SMDA is a minimalist recursive disassembler library that is optimized for accurate Control Flow Graph (CFG) recovery from memory dumps.
BSD 2-Clause "Simplified" License

SMDA incorrectly maps sections/segments from ELF files #23

Closed: williballenthin closed this issue 3 years ago

williballenthin commented 3 years ago

The variable min_raw_segment_offset in the ElfFileLoader (see https://github.com/danielplohmann/smda/blob/bb9cdab2279f255f5516750160c3a1ff424f8882/smda/utility/ElfFileLoader.py#L61 and https://github.com/danielplohmann/smda/blob/bb9cdab2279f255f5516750160c3a1ff424f8882/smda/utility/ElfFileLoader.py#L77) is used incorrectly, which leads to sections/segments being truncated.

This variable contains a virtual address; however, it is used as a raw file offset (the raw file offset of the first section/segment). When the virtual address is something like 0x401000, the assignment tries to copy 0x401000 bytes from the source binary. It ends up with much less data (only as much as the file actually contains), which truncates the mapped data to the size of the file. Subsequent assignments to the mapped data somehow don't throw exceptions, even though they fail to write beyond the end of the truncated mapping.
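The following is a minimal, hypothetical reproduction of why no exception shows up. It assumes the loader builds its mapping by slicing a bytes/bytearray buffer; the linked lines are the authoritative code. The file size and variable names are made up for illustration. Python clamps out-of-range slice bounds instead of raising. As a result, an oversized read silently returns fewer bytes, and a slice assignment that starts past the end appends the data instead of placing it at the intended offset.

```python
# Hypothetical stand-ins: an 8 KiB "file" and a typical x86-64 virtual address.
# Neither value is taken from SMDA itself.
raw_file = bytes(0x2000)
first_segment_vaddr = 0x401000

# Bug pattern: the virtual address is used as a raw file offset/length.
# Slicing past the end of the buffer is clamped, so no IndexError is raised.
mapped = bytearray(raw_file[:first_segment_vaddr])
truncated_len = len(mapped)
print(f"expected {first_segment_vaddr:#x} bytes, got {truncated_len:#x}")

# A later write "at" a virtual address also succeeds silently: the start index
# is clamped to len(mapped), so the data is appended at offset 0x2000 instead.
patch = b"\xcc" * 0x10
mapped[first_segment_vaddr:first_segment_vaddr + len(patch)] = patch
print(f"mapping is now {len(mapped):#x} bytes")
```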

A reasonable fix is to operate on physical/raw file offsets rather than virtual addresses.
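Below is a minimal sketch of that idea. It assumes segments are described by the standard ELF program-header fields (p_type, p_offset, p_vaddr, p_filesz, p_memsz). The Segment class and the map_segments function are hypothetical illustrations, not SMDA's ElfFileLoader API. The sketch reads bytes from the file at the raw offset p_offset and writes them into the image relative to p_vaddr, so a virtual address is never used to index the raw file.

```python
from dataclasses import dataclass
from typing import List, Tuple

PT_LOAD = 1  # ELF program header type for loadable segments


@dataclass
class Segment:
    # Hypothetical container; field names mirror the ELF program header.
    p_type: int
    p_offset: int  # raw file offset of the segment's bytes
    p_vaddr: int   # virtual address the segment is loaded at
    p_filesz: int  # number of bytes stored in the file
    p_memsz: int   # size in memory; bytes beyond p_filesz stay zero-filled


def map_segments(raw: bytes, segments: List[Segment]) -> Tuple[int, bytearray]:
    """Return (base_address, image) built from the PT_LOAD segments."""
    loadable = [s for s in segments if s.p_type == PT_LOAD and s.p_memsz > 0]
    if not loadable:
        return 0, bytearray()
    base = min(s.p_vaddr for s in loadable)
    end = max(s.p_vaddr + s.p_memsz for s in loadable)
    image = bytearray(end - base)  # sized by the virtual extent, zero-filled
    for seg in loadable:
        # Read using the raw file offset...
        data = raw[seg.p_offset:seg.p_offset + seg.p_filesz]
        # ...and write relative to the virtual address.
        start = seg.p_vaddr - base
        image[start:start + len(data)] = data
    return base, image


# Usage with made-up values: a 0x2000-byte file and two segments.
raw = bytes(range(256)) * 32
segments = [
    Segment(PT_LOAD, 0x0000, 0x400000, 0x1000, 0x1000),
    Segment(PT_LOAD, 0x1000, 0x401000, 0x0800, 0x1000),  # tail is .bss-like
]
base, image = map_segments(raw, segments)
print(hex(base), hex(len(image)))
```

The image is sized by the virtual extent of the segments rather than by the file size, which keeps the mapping from being truncated. The zero-filled tail covers regions where p_memsz exceeds p_filesz, such as .bss.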

williballenthin commented 3 years ago

I'll wrap this fix into my PR for #22.

williballenthin commented 3 years ago

Code is pending; it needs a bit more documentation and will likely be submitted tomorrow.

williballenthin commented 3 years ago

Incidentally, should we be using both sections and segments, rather than just one or the other? Maybe it's good enough as is. I'll certainly open an issue if I find this breaking in the real world.

danielplohmann commented 3 years ago

Oh, that's a good catch! I must have overlooked that when I previously added/fixed the simulated loading of the ELF files. At least I didn't notice such significant cases of code truncation when running SMDA against my ground truth in the evaluation. The results were within the expected margin, so I didn't even consider that the loading was this broken. As my previous attempt at loading showed, I don't understand ELF files deeply enough to propose a solid solution. :) The "segments first, then sections" approach sounds really good, though, and appears to fix the problems you encountered!
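Here is one possible reading of the "segments first, then sections" approach, sketched under stated assumptions. The loadable segments (PT_LOAD) are gathered first, and the allocated sections (SHF_ALLOC) are appended after them. A mapping step that copies regions in list order would then let sections fill in or overwrite what the segments provided. The Region type, the collect_regions function, and the plain-dict headers are all hypothetical stand-ins; libraries such as pyelftools or LIEF may expose these fields differently. This is not the code that was actually merged.

```python
from typing import List, NamedTuple

PT_LOAD = 1      # program header type: loadable segment
SHT_NOBITS = 8   # section occupies no file space (e.g. .bss)
SHF_ALLOC = 0x2  # section occupies memory at run time


class Region(NamedTuple):
    source: str      # "segment" or "section", kept for debugging
    raw_offset: int  # where the bytes live in the file
    vaddr: int       # where the bytes live in memory
    file_size: int   # how many bytes to copy from the file
    mem_size: int    # how many bytes the region spans in memory


def collect_regions(phdrs: List[dict], shdrs: List[dict]) -> List[Region]:
    """Hypothetical helper: segments first, then allocated sections."""
    regions = [
        Region("segment", p["p_offset"], p["p_vaddr"], p["p_filesz"], p["p_memsz"])
        for p in phdrs
        if p["p_type"] == PT_LOAD
    ]
    for s in shdrs:
        if not s["sh_flags"] & SHF_ALLOC:
            continue  # e.g. .symtab or .strtab: not part of the memory image
        file_size = 0 if s["sh_type"] == SHT_NOBITS else s["sh_size"]
        regions.append(
            Region("section", s["sh_offset"], s["sh_addr"], file_size, s["sh_size"])
        )
    return regions


# Usage with made-up header values.
phdrs = [{"p_type": PT_LOAD, "p_offset": 0x0, "p_vaddr": 0x400000,
          "p_filesz": 0x1000, "p_memsz": 0x1000}]
shdrs = [
    # .text-like: SHT_PROGBITS (1), SHF_ALLOC | SHF_EXECINSTR (0x4)
    {"sh_type": 1, "sh_flags": SHF_ALLOC | 0x4, "sh_offset": 0x400,
     "sh_addr": 0x400400, "sh_size": 0x200},
    # .bss-like: SHT_NOBITS, SHF_ALLOC | SHF_WRITE (0x1)
    {"sh_type": SHT_NOBITS, "sh_flags": SHF_ALLOC | 0x1, "sh_offset": 0x1000,
     "sh_addr": 0x401000, "sh_size": 0x100},
    # .symtab-like: SHT_SYMTAB (2), not allocated
    {"sh_type": 2, "sh_flags": 0, "sh_offset": 0x1100, "sh_addr": 0, "sh_size": 0x300},
]
regions = collect_regions(phdrs, shdrs)
for r in regions:
    print(r.source, hex(r.vaddr), hex(r.file_size))
```

Keeping the two header tables in a fixed order makes the precedence explicit: anything described by both a segment and a section ends up with whatever the section says.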