eliben / pyelftools

Parsing ELF and DWARF in Python

Add Optional Symbol Cache for Fast Symbol Access in ELF Files #577

Open Sababoni opened 6 days ago

Sababoni commented 6 days ago

I would like to propose adding an optional symbol caching feature to pyelftools that allows users to quickly access symbols and Debugging Information Entries (DIEs) from ELF files. This would be particularly useful for users working with large ELF files or those who need frequent access to specific symbols.

Currently, finding a DIE by name requires iterating through Compilation Units (CUs) and DIEs, which can be slow for ELF files containing thousands of symbols or extensive debugging data. An optional caching mechanism would let pyelftools pay that traversal cost once and serve repeated lookups from memory.
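For reference, the on-demand traversal described above looks roughly like the sketch below. The helper name `find_die_by_name` is hypothetical and not part of pyelftools; it only relies on the existing `iter_CUs()`, `iter_DIEs()` and `attributes` interfaces, and it assumes names are matched as UTF-8 bytes.

```python
def find_die_by_name(dwarfinfo, name):
    """Return the first DIE whose DW_AT_name equals `name`, or None.

    `dwarfinfo` is the object returned by ELFFile.get_dwarf_info().
    Every call walks the CUs and DIEs again, which is the cost the
    proposed cache is meant to avoid.
    """
    wanted = name.encode("utf-8")  # pyelftools exposes DW_AT_name as bytes
    for cu in dwarfinfo.iter_CUs():
        for die in cu.iter_DIEs():
            # Null DIEs and anonymous entries have no DW_AT_name.
            attr = die.attributes.get("DW_AT_name")
            if attr is not None and attr.value == wanted:
                return die
    return None


# Usage (requires pyelftools and a file built with debug info):
#   from elftools.elf.elffile import ELFFile
#   with open("large.elf", "rb") as f:
#       elf = ELFFile(f)
#       if elf.has_dwarf_info():
#           die = find_die_by_name(elf.get_dwarf_info(), "main")
```

Each lookup is linear in the number of DIEs, so N lookups cost N full traversals in the worst case.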

Feature Details:

Caching Symbol Information:

Add an optional parameter (index_symbols=True) to ELFFile that, when enabled, builds an in-memory cache of all relevant symbols and their corresponding DIEs. The cache could be implemented as a dictionary ({symbol_name: DIE_info}) to enable constant-time (O(1) on average) lookups of symbols such as functions, variables, and types.

Opt-In for Flexibility:

This feature would be opt-in to provide flexibility and maintain backward compatibility. By default, pyelftools would behave as it currently does, iterating through each CU and DIE on demand, without caching.

Usage Example:

When opening an ELF file, users could enable the symbol cache:

    from elftools.elf.elffile import ELFFile

    with open('large.elf', 'rb') as f:
        # index_symbols and get_symbol are the proposed additions, not existing API
        elf = ELFFile(f, index_symbols=True)

        die = elf.get_symbol('main')
        if die:
            print(f"Found symbol 'main' at address: 0x{die.attributes['DW_AT_low_pc'].value:x}")
        else:
            print("Symbol 'main' not found.")

This API would allow users to easily access symbols after the initial caching step, making repeated queries much more efficient.
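To make the idea concrete, the index could be approximated today outside the library. The sketch below is only an illustration under assumptions: `build_die_index` is a hypothetical helper, the default tag filter is an arbitrary choice, and names map to lists because the same name can appear in several CUs (for example static functions, or a declaration and its definition).

```python
def build_die_index(dwarfinfo, tags=("DW_TAG_subprogram", "DW_TAG_variable")):
    """Walk all CUs once and return {name: [DIE, ...]} for the given tags.

    `dwarfinfo` is the object returned by ELFFile.get_dwarf_info().
    """
    index = {}
    for cu in dwarfinfo.iter_CUs():
        for die in cu.iter_DIEs():
            if die.tag not in tags:  # also skips null DIEs (tag is None)
                continue
            attr = die.attributes.get("DW_AT_name")
            if attr is None:  # anonymous entry, nothing to index
                continue
            name = attr.value.decode("utf-8", errors="replace")
            index.setdefault(name, []).append(die)
    return index


# Usage (requires pyelftools; keep the file open while DIEs are in use):
#   from elftools.elf.elffile import ELFFile
#   with open("large.elf", "rb") as f:
#       index = build_die_index(ELFFile(f).get_dwarf_info())
#       for die in index.get("main", []):
#           print(die.tag, die.offset)
```

The trade-off is one full traversal up front plus the memory needed to keep every indexed DIE alive, in exchange for dictionary lookups afterwards.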

sevaa commented 6 days ago

By symbols, do you mean exported symbols?