gimli-rs / gimli

A library for reading and writing the DWARF debugging format
https://docs.rs/gimli/
Apache License 2.0

Write example #707

Closed Cr0a3 closed 7 months ago

Cr0a3 commented 8 months ago

Hi, I wanted to ask how I can write an object file with debugging symbols using this library and the object library. I saw a simple example in your docs for the write module (https://docs.rs/gimli/latest/gimli/write/index.html), but I don't understand how to add debugging information and write all of it into one object file. A quick Google search doesn't reveal much either.

Bye

philipc commented 8 months ago

You can refer to https://github.com/rust-lang/rustc_codegen_cranelift/blob/master/src/debuginfo/ for a usage example.

Cr0a3 commented 8 months ago

I don't understand the code

philipc commented 8 months ago

For the gimli side of things, I can't give any better example than what is at https://docs.rs/gimli/latest/gimli/write/index.html. This library only implements the DWARF file format. It doesn't cover the semantics of the DWARF information, so if that part doesn't make sense to you then you need to learn more about DWARF, and there are plenty of resources on the internet for that.

To write the DWARF sections to a file using object, you need to change this part:

    // Create a `Vec` for each DWARF section.
    let mut sections = Sections::new(EndianVec::new(gimli::LittleEndian));
    // Finally, write the DWARF data to the sections.
    dwarf.write(&mut sections)?;
    sections.for_each(|id, data| {
        // Here you can add the data to the output object file.
        Ok(())
    })

First, instead of using EndianVec you need a writer to record the relocations. That's done here using WriterRelocate, and the important part of WriterRelocate is the Writer implementation which records the relocations.

After that, you need to fill in the sections.for_each closure to add the sections and relocations to the object file. You already know how to add sections and relocations to object files. You can obtain the section name with id.name().
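To make the idea of a relocation-recording writer more concrete, here is a minimal std-only sketch. It is not gimli's WriterRelocate: the names Addr, Reloc, and RelocWriter are hypothetical stand-ins, and the little-endian layout is an assumption. It only shows the principle: a constant address is written directly, while a symbol-relative address is written as zero bytes and remembered so that a relocation can be added to the object file later.

```rust
/// A relocation to hand to the object writer later (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Reloc {
    pub offset: usize, // byte offset of the address field within the section
    pub size: u8,      // width of the address field in bytes
    pub symbol: usize, // index of the target symbol
    pub addend: i64,   // value added to the symbol's final address
}

/// Hypothetical stand-in for an address that is either known or symbolic.
#[derive(Debug, Clone, Copy)]
pub enum Addr {
    Constant(u64),
    Symbol { symbol: usize, addend: i64 },
}

/// Section buffer that records a relocation for every symbolic address.
#[derive(Default)]
pub struct RelocWriter {
    pub data: Vec<u8>,
    pub relocs: Vec<Reloc>,
}

impl RelocWriter {
    /// Write an address field of `size` bytes (at most 8), little-endian.
    pub fn write_address(&mut self, addr: Addr, size: u8) {
        assert!(size <= 8, "address fields wider than 8 bytes are not modelled");
        let offset = self.data.len();
        match addr {
            // The value is already final, so emit it directly.
            Addr::Constant(value) => {
                self.data
                    .extend_from_slice(&value.to_le_bytes()[..size as usize]);
            }
            // The value is only known at link time: emit zero bytes as a
            // placeholder and remember where the linker has to patch.
            Addr::Symbol { symbol, addend } => {
                self.relocs.push(Reloc { offset, size, symbol, addend });
                self.data
                    .extend(std::iter::repeat(0u8).take(size as usize));
            }
        }
    }
}
```

After all sections are written, each recorded entry would be turned into one relocation on the matching section of the object file.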

Cr0a3 commented 8 months ago

Thank you for your response

I don't really understand how the DWARF data is formatted. I found out that there are multiple debugging sections, one of which is debug_info, and that the debugging information is stored in a format called DWARF.

But what I don't understand is how I can map machine code to source code.

E.g., I want to map the machine code 0xC3 to the source statement "return;". How can I do that?

Bye

bjorn3 commented 8 months ago

That would be line debuginfo. You will have to create a DwarfUnit (let's call it dwarf) whose root has DW_AT_low_pc and DW_AT_ranges attributes indicating the full range of all functions to which this debuginfo applies. Then dwarf.unit.line_program holds a line program which maps from bytes in the code to source locations.

For that, you first call line_program.begin_sequence with the start address of the function. Then, for every location in the machine code, you fill in the fields of line_program.row() (address_offset with the offset from the start of the function, file with the source file, line with the line in the source file, ...) and call line_program.generate_row(). You need to do this in ascending order. At the end, you call line_program.end_sequence() with the size of the function.

See for example https://github.com/rust-lang/rustc_codegen_cranelift/blob/fbda869b4e230c788b6bce426038ba8419956f2d/src/debuginfo/line_info.rs#L137-L163

Cr0a3 commented 8 months ago

@bjorn3 Thanks a lot

So would this code generate DWARF information which says: file test.asm, line 1, column 0, at offset 3 in the function's data?

    // Begins sequence
    dwarf.unit.line_program.begin_sequence(Some(5));

    // Sets all fields
    dwarf.unit.line_program.row().file = "test.asm";
    dwarf.unit.line_program.row().line = 1;
    dwarf.unit.line_program.row().column = 0;
    dwarf.unit.line_program.row().address_offset = 3;

    dwarf.unit.line_program.generate_row(Some(5));

    // Ends sequence
    dwarf.unit.line_program.end_sequence();

Is the width of what this debugging symbol describes 1 byte, or a specific value (and if so, how can I set it)?

How can I set the code this debugging symbol refers to?

bjorn3 commented 8 months ago

> Is the width of what this debugging symbol describes 1 byte, or a specific value (and if so, how can I set it)?

It is from the given address_offset until the next row (or the end of the function if there is no row after it).

> dwarf.unit.line_program.generate_row(Some(5));

This function doesn't accept any arguments.

> dwarf.unit.line_program.end_sequence();

And this one accepts the total length of the function.
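The coverage rule described above (a row applies from its address_offset up to the next row, or up to the end of the function) can be sketched in plain Rust. This is only an illustration of the rule, not gimli code; row_ranges is a hypothetical helper, and it assumes the offsets are already sorted in ascending order.

```rust
/// For each row offset (ascending, relative to the function start), return
/// the half-open byte range `(start, end)` that the row describes.
/// `function_len` plays the role of the length passed to `end_sequence`.
pub fn row_ranges(row_offsets: &[u64], function_len: u64) -> Vec<(u64, u64)> {
    row_offsets
        .iter()
        .enumerate()
        .map(|(i, &start)| {
            // A row ends where the next row starts; the last row runs to
            // the end of the function.
            let end = row_offsets.get(i + 1).copied().unwrap_or(function_len);
            (start, end)
        })
        .collect()
}
```

For example, rows at offsets 0, 3, and 4 in a 6-byte function cover the byte ranges 0..3, 3..4, and 4..6, so a one-byte instruction such as 0xC3 gets its own row only if the next row starts one byte later.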

Cr0a3 commented 8 months ago

Thanks, how can I specify which code is related to that row? @bjorn3

Cr0a3 commented 8 months ago

I want to try something like this:

use std::fs::File;

use gimli::{write::{Address, DwarfUnit, EndianVec, Error, LineString, Sections}, Encoding};

use object::{
    write::{Object, StandardSection, Symbol, SymbolSection}, Architecture, BinaryFormat, Endianness, SymbolFlags, SymbolKind, SymbolScope
};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut dwarf = DwarfUnit::new(Encoding {
        format: gimli::Format::Dwarf32,
        version: 5,
        address_size: 8,
    });

    let mut obj = Object::new(BinaryFormat::Coff, Architecture::X86_64, Endianness::Little);

    let directory_id = dwarf.unit.line_program.add_directory(LineString::String(b"./".into()));
    let file_id = dwarf.unit.line_program.add_file(LineString::String(b"test.abc".into()), directory_id, None);

    let data = vec![
        0xff // bad
    ];

    let (section, offset) =
        obj.add_subsection(StandardSection::Text, b"main", &data, 1);
    obj.add_symbol(Symbol {
        name: b"main".into(),
        value: offset,
        size: data.len() as u64,
        kind: SymbolKind::Text,
        scope: SymbolScope::Linkage,
        weak: false,
        section: SymbolSection::Section(section),
        flags: SymbolFlags::None,
    });

    let address = Some(Address::Constant(offset));

    dwarf.unit.line_program.begin_sequence(address);
    dwarf.unit.line_program.row().file = file_id;
    dwarf.unit.line_program.row().line = 2;
    dwarf.unit.line_program.row().column = 1;
    dwarf.unit.line_program.row().address_offset = 0x01;
    dwarf.unit.line_program.end_sequence(2);

    let mut sections = Sections::new(EndianVec::new(gimli::LittleEndian));
    dwarf.write(&mut sections)?;
    sections.for_each(|id, data| {
        let data = data.clone().into_vec();
        let section = obj.add_section(vec![], id.name().as_bytes().into(), object::SectionKind::Debug);
        let sec = obj.section_mut(section);
        sec.append_data(&data, 1);

        Ok::<(), Error>(())
    })?;

    let file = File::create("test.o")?;
    obj.write_stream(file)?;

    Ok(())

}

But sadly it doesn't work. It doesn't read test.abc or relocate the debug info to the 0xff byte of my main function.

Bye

bjorn3 commented 8 months ago

Looks like you are missing a dwarf.unit.line_program.generate_row() call before the end_sequence call. Also, you need to make sure that address is a relocation pointing to the main symbol, not a constant value. Linking will change the offset at which main is placed, and a relocation is necessary for the linker to fix up the address stored in the debuginfo.

In addition you need to add the DW_AT_low_pc attribute with a value of 0 and DW_AT_ranges attribute referencing the ranges of all covered functions to the root of the unit for the debugger to even consider looking at the associated line program.
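As a rough illustration of why a constant address is not enough, the sketch below shows what the linker conceptually does with an absolute 64-bit relocation: once it knows where the symbol finally lives, it overwrites the placeholder bytes in the debug section with the symbol address plus the addend. The function name apply_abs64 and the little-endian 8-byte layout are assumptions made for this example; real linkers support many relocation kinds.

```rust
/// Patch an 8-byte little-endian absolute address at `offset` in `section`.
/// `symbol_address` is the final address the linker chose for the symbol.
pub fn apply_abs64(section: &mut [u8], offset: usize, symbol_address: u64, addend: i64) {
    // Two's-complement wrapping add handles negative addends correctly.
    let value = symbol_address.wrapping_add(addend as u64);
    section[offset..offset + 8].copy_from_slice(&value.to_le_bytes());
}
```

If main ends up at 0x401000 in one link and at 0x402000 in another, the same relocation yields the correct address both times, whereas a constant written at object-creation time would stay at its original value.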

Cr0a3 commented 8 months ago

@bjorn3 Thanks. How can I make it a relocation to my symbol?

> In addition you need to add the DW_AT_low_pc attribute with a value of 0 and DW_AT_ranges attribute referencing the ranges of all covered functions to the root of the unit for the debugger to even consider looking at the associated line program.

Where do I need to add them?

bjorn3 commented 8 months ago

> Where do I need to add them?

For DW_AT_low_pc see https://github.com/rust-lang/rustc_codegen_cranelift/blob/0328ee571bbe3a1d21b5c01e816cf4224193a2fc/src/debuginfo/mod.rs#L131 (and https://github.com/rust-lang/rustc_codegen_cranelift/blob/0328ee571bbe3a1d21b5c01e816cf4224193a2fc/src/debuginfo/mod.rs#L119-L120 for the root variable).

For DW_AT_ranges see https://github.com/rust-lang/rustc_codegen_cranelift/blob/0328ee571bbe3a1d21b5c01e816cf4224193a2fc/src/debuginfo/emit.rs#L26-L29 where self.unit_range_list is extended every time a function is added at https://github.com/rust-lang/rustc_codegen_cranelift/blob/0328ee571bbe3a1d21b5c01e816cf4224193a2fc/src/debuginfo/mod.rs#L326-L329.

> How can I make it a relocation to my symbol?

In cg_clif this is implemented in https://github.com/rust-lang/rustc_codegen_cranelift/blob/master/src/debuginfo/emit.rs and https://github.com/rust-lang/rustc_codegen_cranelift/blob/master/src/debuginfo/object.rs together.

Cr0a3 commented 8 months ago

Thx

Cr0a3 commented 8 months ago

@bjorn3 So I now have this code:

use std::fs::File;

use gimli::{write::{Address, AttributeValue, DwarfUnit, EndianVec, Error, LineString, Sections}, Encoding};

use object::{
    write::{Object, Relocation, StandardSection, Symbol, SymbolId, SymbolSection}, Architecture, BinaryFormat, Endianness, SymbolFlags, SymbolKind, SymbolScope
};

fn address_for_func(obj: &Object, id: &SymbolId) -> Address {
    let sym = obj.symbol(id.to_owned());
    Address::Symbol { symbol: sym.value as usize, addend: 0 }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut dwarf = DwarfUnit::new(Encoding {
        format: gimli::Format::Dwarf32,
        version: 5,
        address_size: 8,
    });

    let root = dwarf.unit.get_mut(dwarf.unit.root());
    root.set(gimli::DW_AT_low_pc, AttributeValue::Address(Address::Constant(0)));
    root.set(gimli::DW_AT_ranges, AttributeValue::RangeListRef(unit_range_list_id));

    let mut obj = Object::new(BinaryFormat::Coff, Architecture::X86_64, Endianness::Little);

    let directory_id = dwarf.unit.line_program.add_directory(LineString::String(b"./".into()));
    let file_id = dwarf.unit.line_program.add_file(LineString::String(b"test.abc".into()), directory_id, None);

    let data = vec![
        0xff // bad
    ];

    let (section, offset) =
        obj.add_subsection(StandardSection::Text, b"main", &data, 1);
    let sym = obj.add_symbol(Symbol {
        name: b"main".into(),
        value: offset,
        size: data.len() as u64,
        kind: SymbolKind::Text,
        scope: SymbolScope::Linkage,
        weak: false,
        section: SymbolSection::Section(section),
        flags: SymbolFlags::None,
    });

    let address = Some(address_for_func(&obj, &sym));

    dwarf.unit.line_program.begin_sequence(address);
    dwarf.unit.line_program.row().file = file_id;
    dwarf.unit.line_program.row().line = 2;
    dwarf.unit.line_program.row().column = 1;
    dwarf.unit.line_program.row().address_offset = 0x01;
    dwarf.unit.line_program.generate_row();
    dwarf.unit.line_program.end_sequence(2);

    obj.add_relocation(
        sect,
        Relocation {
            offset: reloc.offset,
            symbol,
            kind: reloc.kind,
            encoding: RelocationEncoding::Generic,
            size: reloc.size * 8,
            addend: i64::try_from(symbol_offset).unwrap() + reloc.addend,
            flags: todo!(),
        },
    );

    let mut sections = Sections::new(EndianVec::new(gimli::LittleEndian));
    dwarf.write(&mut sections)?;
    sections.for_each(|id, data| {
        let data = data.clone().into_vec();
        let section = obj.add_section(vec![], id.name().as_bytes().into(), object::SectionKind::Debug);
        let sec = obj.section_mut(section);
        sec.append_data(&data, 1);

        Ok::<(), Error>(())
    })?;

    let file = File::create("test.o")?;
    obj.write_stream(file)?;

    Ok(())

}

But I have a few errors and questions:

bjorn3 commented 8 months ago

> How can I define one of these unit_range_list_ids?

    let unit_range_list_id = self.dwarf.unit.ranges.add(unit_range_list.clone());

philipc commented 7 months ago

#709 adds a write example that creates a complete object file, plus some helpers to make relocation handling easier.

Cr0a3 commented 7 months ago

Cool, then I can close this issue now.