gimli-rs / object

A unified interface for reading and writing object file formats
https://docs.rs/object/
Apache License 2.0

Add MiniDebugInfo #695

Closed: Evian-Zhang closed this issue 3 months ago

Evian-Zhang commented 4 months ago

Some ELF files ship with MiniDebugInfo, which provides a minimal set of symbols for backtraces. Some symbols reside only in this section. Debuggers and static analyzers such as GDB, LLDB, and IDA support this feature.

Getting the symbols from this section is straightforward (this requires adding the dependency lzma-rs = "0.3"):

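use std::io::Cursor;
use object::{Object, ObjectSection};

fn read_minidebug_symbols(obj_file: &object::File<'_>) -> Result<(), Box<dyn std::error::Error>> {
    if let Some(minidebug_section) = obj_file.section_by_name(".gnu_debugdata") {
        let data = minidebug_section.data()?;
        if !data.is_empty() {
            let mut output = vec![];
            // .gnu_debugdata holds an XZ stream; decompress it into an owned buffer.
            lzma_rs::xz_decompress(&mut Cursor::new(data), &mut output)?;
            // The decompressed bytes are a complete ELF file; the parsed object borrows `output`.
            let minidebug_object = object::File::parse(&*output)?;
            // symbols can be found in minidebug_object.symbols() and minidebug_object.dynamic_symbols()
        }
    }
    Ok(())
}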

The MiniDebugInfo resides in a section called ".gnu_debugdata" and is compressed with XZ (LZMA). After decompression, the content is an ELF file that contains the symbols needed for debugging.
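Before decompressing, a caller may want a cheap check that the section really holds an XZ stream. The sketch below tests for the six-byte XZ stream header magic (`FD 37 7A 58 5A 00`, as defined by the XZ file format specification). The helper name `looks_like_xz` is hypothetical and is not part of the object crate or lzma-rs.

```rust
/// Magic bytes that open every XZ stream header: 0xFD, "7zXZ", 0x00.
const XZ_MAGIC: [u8; 6] = [0xFD, b'7', b'z', b'X', b'Z', 0x00];

/// Hypothetical helper: returns true if `data` starts with the XZ magic.
/// This is only a sanity check; a real XZ decoder still validates the
/// rest of the stream (flags, CRC, blocks) during decompression.
fn looks_like_xz(data: &[u8]) -> bool {
    data.starts_with(&XZ_MAGIC)
}
```

A caller could run this on the bytes returned by `minidebug_section.data()` and skip the section (or report an error) when it returns false, instead of relying on the decoder's error.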

I was going to write a PR, but I got lost in the numerous struct and trait definitions and didn't know where this implementation should go. So I opened this issue with the implementation, hoping you could add this method in an appropriate place. :)

philipc commented 4 months ago

Can you first convince me that this crate needs to do anything. The code snippet you gave contains significant memory management decisions that cannot be made by this crate. minidebug_object references output, and so any function we add cannot return minidebug_object.

What are you trying to achieve?

Evian-Zhang commented 3 months ago

@philipc Thank you for your reply :)

Can you first convince me that this crate needs to do anything.

Since the object crate has provided methods including gnu_hash, gnu_debuglink, gnu_debugaltlink, etc., it is consistent to add a method for .gnu_debugdata section. Moreover, at least for me, the object crate is the best library for extracting symbols from binaries. MiniDebugInfo is a standard symbol section that GDB, LLDB, IDA, etc. already support. It is great if object crate supports this as well.

so any function we add cannot return minidebug_object.

I would propose two methods: decompressed_gnu_debugdata, which returns a Vec<u8> containing the decompressed data of .gnu_debugdata; and minidebug_object, which takes reference of previous decompressed data as parameter, and returns a Elf whose lifetime is bound to that data. This solves the lifetime problem. Users could then call symbols() and dynamic_symbols() on the returned Elf struct to get the symbols embedded in the MiniDebugInfo.
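The two-method proposal can be sketched without external crates to show why splitting it resolves the lifetime issue. `decompressed_gnu_debugdata` and `minidebug_object` are the names proposed in the comment, not existing object APIs; `SymbolView` is a hypothetical stand-in for the parsed ELF file, and both function bodies are placeholders (a byte copy instead of XZ decompression, NUL-separated names instead of ELF parsing).

```rust
/// Hypothetical borrowed view, standing in for `object::File<'data>`:
/// it holds string slices that point into the caller's buffer.
struct SymbolView<'a> {
    names: Vec<&'a str>,
}

/// Proposed step 1: return an owned buffer. The copy is a placeholder
/// for real XZ decompression (e.g. with lzma-rs).
fn decompressed_gnu_debugdata(section_data: &[u8]) -> Vec<u8> {
    section_data.to_vec()
}

/// Proposed step 2: borrow the caller's buffer, so the returned view's
/// lifetime `'a` is tied to `data` rather than to a local variable.
fn minidebug_object<'a>(data: &'a [u8]) -> Option<SymbolView<'a>> {
    let text = std::str::from_utf8(data).ok()?;
    // Placeholder "parse": treat the data as NUL-separated symbol names.
    let names = text.split('\0').filter(|s| !s.is_empty()).collect();
    Some(SymbolView { names })
}
```

Because the view borrows the buffer returned by step 1, the caller decides how long the decompressed data lives. With the real crate, step 2 would amount to a single parse call on that buffer.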

philipc commented 3 months ago

It is great if object crate supports this as well.

Why? It's adding complexity to object for something that you can already do in a handful of lines, as you have shown. What's the downside to you adding that code to your crate instead?
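One way to keep this logic in a downstream crate without fighting lifetimes is to lend the decompressed buffer to a closure, so nothing borrowed from it escapes. This is a minimal std-only sketch under stated assumptions: `decompress` is a hypothetical placeholder that only copies bytes (in real code it would wrap `lzma_rs::xz_decompress`), and `with_minidebug_data` is a hypothetical helper name, not an API of object or lzma-rs.

```rust
/// Hypothetical placeholder for `lzma_rs::xz_decompress`; it only copies
/// the input so this sketch has no external dependencies.
fn decompress(input: &[u8]) -> Vec<u8> {
    input.to_vec()
}

/// Decompresses `section_data` into a buffer owned by this function and
/// lends it to `f`. Anything that borrows the buffer must stay inside the
/// closure; only `f`'s (owned) result is returned to the caller.
fn with_minidebug_data<R>(section_data: &[u8], f: impl FnOnce(&[u8]) -> R) -> R {
    let output = decompress(section_data);
    f(&output)
}
```

In real code the closure would parse the buffer (for example with `object::File::parse`) and collect owned symbol names into a `Vec<String>`, which can then outlive the buffer.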

Since the object crate has provided methods including gnu_hash, gnu_debuglink, gnu_debugaltlink, etc., it is consistent to add a method for .gnu_debugdata section.

Those sections are a bit different because extra code is required to parse their contents, but for gnu_debugdata that code already exists: decompress and call Object::parse.

minidebug_object, which takes reference of previous decompressed data as parameter, and returns a Elf whose lifetime is bound to that data

This method would simply be a call to Object::parse, so it wouldn't add any value.

Evian-Zhang commented 3 months ago

Oh, I see. Thank you for your patience. :)