gimli-rs / gimli

A library for reading and writing the DWARF debugging format
https://docs.rs/gimli/
Apache License 2.0
831 stars 105 forks source link

UnwindContext allocation cannot be reused across multiple unwinds with changing data lifetimes #701

Closed mstange closed 5 months ago

mstange commented 6 months ago

The docs for UnwindContext say:

"To avoid re-allocating the context multiple times when evaluating multiple CFI programs, it can be reused."

However, reusing the context can be tricky: If your unwind data is stored in a slice with a limited lifetime, and you use EndianSlice<'mylifetime> as the Reader, then the unwind context cannot outlive that lifetime.

Example program (does not compile):

use gimli::{BaseAddresses, EhFrame, EndianSlice, NativeEndian, UnwindContext, UnwindSection};
use std::collections::HashMap;

#[derive(Default)]
struct Unwinder {
    eh_frame_data_for_module: HashMap<String, Vec<u8>>,
    unwind_context: UnwindContext<EndianSlice<'static, NativeEndian>>,
}

impl Unwinder {
    fn add_module(&mut self, module_name: impl Into<String>, eh_frame_data: Vec<u8>) {
        self.eh_frame_data_for_module
            .insert(module_name.into(), eh_frame_data);
    }
    fn remove_module(&mut self, module_name: &str) {
        self.eh_frame_data_for_module.remove(module_name);
    }
    fn unwind_in_module(&mut self, module_name: &str, address: u64) {
        let eh_frame_data = self.eh_frame_data_for_module.get(module_name).unwrap();
        let eh_frame = EhFrame::new(&eh_frame_data, NativeEndian);
        let bases = BaseAddresses::default();
        let _unwind_info = eh_frame
            .unwind_info_for_address(
                &bases,
                &mut self.unwind_context, // ERROR: can't be used due to eh_frame_data's lifetime
                address,
                EhFrame::cie_from_offset,
            )
            .unwrap();
    }
}

fn main() {
    let mut unwinder = Unwinder::default();
    unwinder.add_module("lib1.so", vec![0, 1, 2, 3]);
    unwinder.unwind_in_module("lib1.so", 0x1234);
    unwinder.add_module("lib2.so", vec![0, 1, 2, 3]);
    unwinder.unwind_in_module("lib2.so", 0x5678);
}

The two remedies I know of are:

  1. If you do not own the data, then you have to create a new UnwindContext every time. Ideally you have some scope inside which you can do as many batched unwinds as possible, so that you can at least reuse the context to some degree.
  2. If you have (shared or full) ownership over the data, then you can create an EndianRcSlice or your own Rc-based Reader implementation which makes the data live as long as necessary.

With the Rc solution, the UnwindContext may keep your data alive for longer than you anticipated because the context is not reset at the end of each use - it's only reset at the start of the next use.

Furthermore, if you're writing an unwinding library which wraps gimli and you allow the user to provide custom OwnedData types, and your library uses EndianReader to create a custom Reader implementation which wraps the user-provided OwnedData type, then you have to take extra care to make sure that the type you give to EndianReader implements StableDeref correctly (your type must guarantee that it derefs to the same address every time even if the OwnedData type does not, or alternatively you have to put a StableDeref trait bound on your library's OwnedData trait).


It would be great if we could find a way to reuse the allocations inside UnwindContext in such a way that the same allocation can be used with EndianSlice readers of different lifetimes.

Could we add a reset_and_recycle method on UnwindContext?

    /// Tries to reuse the allocation if the layouts of `R2` and `R` match
    fn reset_and_recycle<R2: Reader>(self) -> UnwindContext<R2, A> {
        let mut ctx = UnwindContext {
            stack: self.stack.clear_and_recycle(),
            initial_rule: None,
            is_initialized: false,
        };
        ctx.reset();
        ctx
    }

Then the example above would become:

struct Unwinder {
    eh_frame_data_for_module: HashMap<String, Vec<u8>>,
    unwind_context: Option<UnwindContext<EndianSlice<'static, NativeEndian>>>, // initialized to Some()
}

impl Unwinder {
    // ...
    fn unwind_in_module(&mut self, module_name: &str, address: u64) {
        let static_context = self.unwind_context.take().unwrap();
        let eh_frame_data: &[u8] = self.eh_frame_data_for_module.get(module_name).unwrap();
        let mut lifetime_limited_context = static_context.reset_and_recycle::<EndianSlice<_, NativeEndian>>();
        let eh_frame = EhFrame::new(&eh_frame_data, NativeEndian);
        let bases = BaseAddresses::default();
        let _unwind_info = eh_frame
            .unwind_info_for_address(
                &bases,
                &mut lifetime_limited_context,
                address,
                EhFrame::cie_from_offset,
            )
            .unwrap();
        let static_context = static_context.reset_and_recycle::<EndianSlice<'static, NativeEndian>>();
        self.unwind_context = Some(static_context);
    }
}
philipc commented 6 months ago

Another idea would be to stop storing R in CallFrameInstruction and RegisterRule, and instead store the offset and length of the expression.