gimli-rs / gimli

A library for reading and writing the DWARF debugging format
https://docs.rs/gimli/
Apache License 2.0

Question: fuse units from Wasm files #697

Closed: yowl closed this issue 6 months ago

yowl commented 6 months ago

Hi,

Is it possible to fuse two Units, i.e. when working with split DWARF (Fission) and UnitType::Skeleton? I read both Units, one from the .wasm and one from the .dwp, and want a single Unit that reverses the split. Maybe I am being too simplistic?

philipc commented 6 months ago

I think that should be possible, but I'm not aware of any code that does it.

yowl commented 6 months ago

Thanks, I've been reading the spec at https://dwarfstd.org/doc/DWARF5.pdf and have a better idea of what's going on and how I can proceed. Given a DwarfPackage and a unit.dwo_id from the object wasm, is there an efficient way to find that unit in the DwarfPackage? I can iterate debug_info.units(), but with thousands of compilation units that is not going to be efficient.

philipc commented 6 months ago

debug_info.units() should be efficient enough, because it only needs to parse the header (and maybe first entry) of each unit. You can do one pass over that to build a hash map. This is what the dwarfdump example does.

Edit: this is doing something different, it's finding all the skeleton units in the parent, not the units in the package.
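The one-pass indexing idea can be sketched with standard-library types only. `ParsedUnit` and `index_by_dwo_id` below are hypothetical stand-ins, not gimli APIs; `ParsedUnit` only mimics the fact that a parsed unit may or may not carry a DWO id.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a parsed unit (not a gimli type). It only
// models the optional DWO id that a unit may carry.
#[derive(Debug, Clone, PartialEq)]
pub struct ParsedUnit {
    pub dwo_id: Option<u64>,
    pub name: String,
}

// Single pass over fallible items: stop at the first parse error, skip
// units that have no DWO id, and index the remaining units by that id.
pub fn index_by_dwo_id<I, E>(units: I) -> Result<HashMap<u64, ParsedUnit>, E>
where
    I: IntoIterator<Item = Result<ParsedUnit, E>>,
{
    let mut index = HashMap::new();
    for unit in units {
        let unit = unit?;
        if let Some(id) = unit.dwo_id {
            index.insert(id, unit);
        }
    }
    Ok(index)
}
```

After the pass, each lookup by id is a single hash map access, so the cost of walking all unit headers is paid once.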

philipc commented 6 months ago

Oh, there's also the .debug_cu_index section, see https://docs.rs/gimli/latest/gimli/read/struct.DwarfPackage.html#method.find_cu

yowl commented 6 months ago

Thanks for the link to the example. I've gone with that, with a minor adjustment to use a DwarfPackage:

let dwo_parent_units = if let Some(dwarf_package) = di.dwarf_package {
    Some(
        match dwarf_package
            .debug_info
            .units()
            .map(|unit_header| di.dwarf.unit(unit_header))
            .filter_map(|unit| Ok(unit.dwo_id.map(|dwo_id| (dwo_id, unit))))
            .collect()
        {
            Ok(units) => units,
            Err(err) => {
                eprintln!("Failed to process --dwo-parent units: {}", err);
                return Err(err.into());
            }
        },
    )
} else {
    None
};

However I'm not sure I'm picking up the correct collect as I'm getting an error:

error[E0277]: a value of type `()` cannot be built from an iterator over elements of type `(gimli::DwoId, gimli::Unit<EndianSlice<'_, gimli::LittleEndian>, usize>)`
   --> crates\cranelift\src\debug\transform\mod.rs:106:18
    |
106 |                 .collect()
    |                  ^^^^^^^ value of type `()` cannot be built from `std::iter::Iterator<Item=(gimli::DwoId, gimli::Unit<EndianSlice<'_, gimli::LittleEndian>, usize>)>`
    |
    = help: the trait `FromIterator<(gimli::DwoId, gimli::Unit<EndianSlice<'_, gimli::LittleEndian>, usize>)>` is not implemented for `()`
    = help: the trait `FromIterator<()>` is implemented for `()`
    = help: for that trait implementation, expected `()`, found `(gimli::DwoId, gimli::Unit<EndianSlice<'_, gimli::LittleEndian>, usize>)`
note: the method call chain might not have had the expected associated types
   --> crates\cranelift\src\debug\transform\mod.rs:105:18
    |
101 |               match dwarf_package
    |  ___________________-
102 | |                 .debug_info
    | |___________________________- this expression has type `DebugInfo<EndianSlice<'_, LittleEndian>>`
103 |                   .units()
    |                    ------- `FallibleIterator::Item` is `UnitHeader<EndianSlice<'_, LittleEndian>, usize>` here
104 |                   .map(|unit_header| di.dwarf.unit(unit_header))
    |                    --------------------------------------------- `FallibleIterator::Item` changed to `Unit<EndianSlice<'_, LittleEndian>, usize>` here
105 |                   .filter_map(|unit| Ok(unit.dwo_id.map(|dwo_id| (dwo_id, unit))))
    |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `FallibleIterator::Item` changed to `(DwoId, Unit<EndianSlice<'_, LittleEndian>, usize>)` here
note: required by a bound in `fallible_iterator::FallibleIterator::collect`
   --> C:\Users\scott\.cargo\registry\src\index.crates.io-6f17d22bba15001f\fallible-iterator-0.3.0\src\lib.rs:424:12
    |
422 |     fn collect<T>(self) -> Result<T, Self::Error>
    |        ------- required by a bound in this associated function
423 |     where
424 |         T: iter::FromIterator<Self::Item>,
    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ required by this bound in `FallibleIterator::collect`

For more information about this error, try `rustc --explain E0277`.
warning: `wasmtime-cranelift` (lib) generated 1 warning
error: could not compile `wasmtime-cranelift` (lib) due to previous error; 1 warning emitted

I apologise for being a Rust beginner; maybe this is obvious.
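As a general note on E0277 from `collect`: the method is generic over what it builds, so one way to rule out inference surprises is to name the target type explicitly. The sketch below uses the standard `Iterator` rather than `FallibleIterator`, and `collect_index` is a hypothetical helper, so it illustrates the idea and does not diagnose the exact code above.

```rust
use std::collections::HashMap;

// Hypothetical helper: each input item is either an (id, name) pair or a
// parse error. The first error aborts the collection.
pub fn collect_index(
    items: Vec<Result<(u64, &'static str), String>>,
) -> Result<HashMap<u64, &'static str>, String> {
    // `collect` is generic over its output, so the target type is named
    // explicitly here rather than left to inference from later uses.
    let index: HashMap<u64, &'static str> = items
        .into_iter()
        .collect::<Result<_, _>>()?;
    Ok(index)
}
```

With the annotation in place, a mismatch between the iterator's items and the container is reported against the named type, which tends to make the compiler message easier to read.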

yowl commented 6 months ago

di is one of

pub struct DebugInfoData<'a> {
    pub dwarf: Dwarf<'a>,
    pub name_section: NameSection<'a>,
    pub wasm_file: WasmFileInfo,
    debug_loc: gimli::DebugLoc<Reader<'a>>,
    debug_loclists: gimli::DebugLocLists<Reader<'a>>,
    pub debug_ranges: gimli::DebugRanges<Reader<'a>>,
    pub debug_rnglists: gimli::DebugRngLists<Reader<'a>>,
    pub dwarf_package: Option<DwarfPackage<Reader<'a>>>,
}

I had a look at find_cu but it returns Result<Option<Dwarf>> and I couldn't see how to convert that to a Unit.

yowl commented 6 months ago

Ah, I was missing an & in front of di.dwarf_package, the error message was a clue but too subtle for me. Having added the explicit type to dwo_parent_units I got a different error message. Anyway I now have

let dwo_parent_units: Option<
    HashMap<DwoId, Unit<EndianSlice<'_, gimli::LittleEndian>, usize>>,
> = if let Some(dwarf_package) = &di.dwarf_package {
    Some(
        match dwarf_package
            .debug_info
            .units()
            .map(|unit_header| di.dwarf.unit(unit_header))
            .filter_map(|unit| Ok(unit.dwo_id.map(|dwo_id| (dwo_id, unit))))
            .collect()
        {
            Ok(units) => units,
            Err(err) => {
                eprintln!("Failed to process --dwo-parent units: {}", err);
                return Err(err.into());
            }
        },
    )
} else {
    None
};

which at least runs, however it errors with:

Failed to process --dwo-parent units: Hit the end of input before it was expected

It is as though it doesn't like my .dwp file (attached: 1.zip).

philipc commented 6 months ago

I haven't looked at the unit iteration, but it's almost certainly not what you want. Use find_cu instead. It will give you a Dwarf that holds the unit's contributions to all of the sections within the package; the debug_info within that Dwarf will contain a single unit, which is the one you are looking for. You can obtain it with dwarf.units().next().
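The control flow of that lookup can be sketched as follows. Everything here is a hypothetical stand-in (`PackageStub`, `DwarfStub`, `UnitStub`, `split_unit`); the stubs only imitate the `Result<Option<...>>` shape mentioned earlier in the thread, and the real gimli signatures (for example, any extra arguments to find_cu) should be checked in the gimli docs.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins, not gimli types.
#[derive(Debug, Clone, PartialEq)]
pub struct UnitStub {
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct DwarfStub {
    pub units: Vec<UnitStub>,
}

pub struct PackageStub {
    pub cu_index: HashMap<u64, DwarfStub>,
}

impl PackageStub {
    // Imitates a lookup that can fail outright (Err) or succeed without
    // finding a match (Ok(None)). This stub never returns Err.
    pub fn find_cu(&self, id: u64) -> Result<Option<DwarfStub>, String> {
        Ok(self.cu_index.get(&id).cloned())
    }
}

// Look up the per-unit view by id, then take the first unit it contains.
pub fn split_unit(package: &PackageStub, id: u64) -> Result<Option<UnitStub>, String> {
    match package.find_cu(id)? {
        Some(dwarf) => Ok(dwarf.units.into_iter().next()),
        None => Ok(None),
    }
}
```

Keeping the "not found" case as `Ok(None)` rather than an error lets the caller decide whether a missing split unit is fatal or can be skipped.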

yowl commented 6 months ago

Thanks a lot for the help. I now have something which appears to be working in wasmtime.
