Smithay / smithay

A smithy for rusty wayland compositors
MIT License
1.9k stars 168 forks source link

Unsafe drop order of `GbmDevice` and `GbmBufferedSurface` #1102

Open PolyMeilex opened 1 year ago

PolyMeilex commented 1 year ago

This order of drop causes segfault:

    gbm: gbm::GbmDevice<DrmDeviceFd>,
    surfaces: HashMap<crtc::Handle, GbmBufferedSurface<GbmAllocator<DrmDeviceFd>, ()>>,

This one works fine:

    surfaces: HashMap<crtc::Handle, GbmBufferedSurface<GbmAllocator<DrmDeviceFd>, ()>>,
    gbm: gbm::GbmDevice<DrmDeviceFd>,

Backtrace of said segfault

#0  0x00007fd67b4b7510 in  ()
#1  0x00007fd67dd08d88 in gbm_dri_bo_destroy () at /lib64/libgbm.so.1
#2  0x000055da0563e52f in gbm::buffer_object::{impl#6}::new::{closure#0}<()> (ptr=0x55da06bdc7e0) at /home/poly/.cargo/registry/src/index.crates.io-6f17d22bba15001f/gbm-0.12.0/src/buffer_object.rs:493
#3  0x000055da05654218 in core::ops::function::FnOnce::call_once<gbm::buffer_object::{impl#6}::new::{closure_env#0}<()>, (*mut gbm_sys::gbm_bo)> () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ops/function.rs:250
#4  0x000055da0574c3f7 in alloc::boxed::{impl#47}::call_once<(*mut gbm_sys::gbm_bo), (dyn core::ops::function::FnOnce<(*mut gbm_sys::gbm_bo), Output=()> + core::marker::Send), alloc::alloc::Global> (self=..., args=...)
    at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/alloc/src/boxed.rs:1985
#5  0x000055da057faa54 in gbm::{impl#0}::drop<gbm_sys::gbm_bo> (self=0x55da06bc7930) at /home/poly/.cargo/registry/src/index.crates.io-6f17d22bba15001f/gbm-0.12.0/src/lib.rs:133
#6  0x000055da057f60e7 in core::ptr::drop_in_place<gbm::PtrDrop<gbm_sys::gbm_bo>> () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#7  0x000055da057497c0 in alloc::sync::Arc<gbm::PtrDrop<gbm_sys::gbm_bo>>::drop_slow<gbm::PtrDrop<gbm_sys::gbm_bo>> (self=0x55da06bdc6f0) at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/alloc/src/sync.rs:1263
#8  0x000055da057fbfa2 in alloc::sync::{impl#27}::drop<gbm::PtrDrop<gbm_sys::gbm_bo>> (self=0x55da06bdc6f0) at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/alloc/src/sync.rs:1899
#9  0x000055da057f913b in core::ptr::drop_in_place<alloc::sync::Arc<gbm::PtrDrop<gbm_sys::gbm_bo>>> () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#10 0x000055da057f5d5b in core::ptr::drop_in_place<gbm::Ptr<gbm_sys::gbm_bo>> () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#11 0x000055da057f7927 in core::ptr::drop_in_place<gbm::buffer_object::BufferObject<()>> () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#12 0x000055da057fa196 in core::ptr::drop_in_place<core::option::Option<gbm::buffer_object::BufferObject<()>>> () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#13 0x000055da05655ef7 in core::ptr::drop_in_place<smithay::backend::allocator::swapchain::InternalSlot<gbm::buffer_object::BufferObject<()>>> () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#14 0x000055da056853df in alloc::sync::Arc<smithay::backend::allocator::swapchain::InternalSlot<gbm::buffer_object::BufferObject<()>>>::drop_slow<smithay::backend::allocator::swapchain::InternalSlot<gbm::buffer_object::BufferObject<()>>>
    (self=0x55da06d36628) at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/alloc/src/sync.rs:1263
#15 0x000055da0565efc1 in alloc::sync::{impl#27}::drop<smithay::backend::allocator::swapchain::InternalSlot<gbm::buffer_object::BufferObject<()>>> (self=0x55da06d36628)
    at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/alloc/src/sync.rs:1899
#16 0x000055da0565689a in core::ptr::drop_in_place<alloc::sync::Arc<smithay::backend::allocator::swapchain::InternalSlot<gbm::buffer_object::BufferObject<()>>>> ()
    at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#17 0x000055da05656e18 in core::ptr::drop_in_place<[alloc::sync::Arc<smithay::backend::allocator::swapchain::InternalSlot<gbm::buffer_object::BufferObject<()>>>; 4]> ()
    at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#18 0x000055da05657062 in core::ptr::drop_in_place<smithay::backend::allocator::swapchain::Swapchain<smithay::backend::allocator::gbm::GbmAllocator<smithay::backend::drm::device::fd::DrmDeviceFd>>> ()
    at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#19 0x000055da05657499 in core::ptr::drop_in_place<smithay::backend::drm::surface::gbm::GbmBufferedSurface<smithay::backend::allocator::gbm::GbmAllocator<smithay::backend::drm::device::fd::DrmDeviceFd>, ()>> ()
    at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#20 0x000055da05658c53 in core::ptr::drop_in_place<rendering::surface::OutputSurface> () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#21 0x000055da0565b35e in core::ptr::drop_in_place<(drm::control::crtc::Handle, rendering::surface::OutputSurface)> () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#22 0x000055da056f0484 in core::ptr::mut_ptr::{impl#0}::drop_in_place<(drm::control::crtc::Handle, rendering::surface::OutputSurface)> (self=0x55da06d365c0)
    at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mut_ptr.rs:1449
#23 hashbrown::raw::Bucket<(drm::control::crtc::Handle, rendering::surface::OutputSurface)>::drop<(drm::control::crtc::Handle, rendering::surface::OutputSurface)> (self=0x7ffdf6eb80c0)
    at /cargo/registry/src/index.crates.io-6f17d22bba15001f/hashbrown-0.13.1/src/raw/mod.rs:344
#24 0x000055da056f1e4f in hashbrown::raw::RawTable<(drm::control::crtc::Handle, rendering::surface::OutputSurface), alloc::alloc::Global>::drop_elements<(drm::control::crtc::Handle, rendering::surface::OutputSurface), alloc::alloc::Global> (self=0x55da06481640) at /cargo/registry/src/index.crates.io-6f17d22bba15001f/hashbrown-0.13.1/src/raw/mod.rs:620
#25 0x000055da0566050e in hashbrown::raw::{impl#17}::drop<(drm::control::crtc::Handle, rendering::surface::OutputSurface), alloc::alloc::Global> (self=0x55da06481640)
    at /cargo/registry/src/index.crates.io-6f17d22bba15001f/hashbrown-0.13.1/src/raw/mod.rs:1848
#26 0x000055da05655dfa in core::ptr::drop_in_place<hashbrown::raw::RawTable<(drm::control::crtc::Handle, rendering::surface::OutputSurface), alloc::alloc::Global>> ()
    at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#27 0x000055da05656a0a in core::ptr::drop_in_place<hashbrown::map::HashMap<drm::control::crtc::Handle, rendering::surface::OutputSurface, std::collections::hash::map::RandomState, alloc::alloc::Global>> ()
    at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#28 0x000055da05655eba in core::ptr::drop_in_place<std::collections::hash::map::HashMap<drm::control::crtc::Handle, rendering::surface::OutputSurface, std::collections::hash::map::RandomState>> ()
    at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#29 0x000055da05658319 in core::ptr::drop_in_place<rendering::Device> () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#30 0x000055da0565aece in core::ptr::drop_in_place<(smithay::backend::drm::node::DrmNode, rendering::Device)> () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#31 0x000055da056f0404 in core::ptr::mut_ptr::{impl#0}::drop_in_place<(smithay::backend::drm::node::DrmNode, rendering::Device)> (self=0x55da064815d0) at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mut_ptr.rs:1449
#32 hashbrown::raw::Bucket<(smithay::backend::drm::node::DrmNode, rendering::Device)>::drop<(smithay::backend::drm::node::DrmNode, rendering::Device)> (self=0x7ffdf6eb8260)
    at /cargo/registry/src/index.crates.io-6f17d22bba15001f/hashbrown-0.13.1/src/raw/mod.rs:344
#33 0x000055da056f1d6f in hashbrown::raw::RawTable<(smithay::backend::drm::node::DrmNode, rendering::Device), alloc::alloc::Global>::drop_elements<(smithay::backend::drm::node::DrmNode, rendering::Device), alloc::alloc::Global>
    (self=0x7ffdf6eb8970) at /cargo/registry/src/index.crates.io-6f17d22bba15001f/hashbrown-0.13.1/src/raw/mod.rs:620
#34 0x000055da0566048e in hashbrown::raw::{impl#17}::drop<(smithay::backend::drm::node::DrmNode, rendering::Device), alloc::alloc::Global> (self=0x7ffdf6eb8970)
    at /cargo/registry/src/index.crates.io-6f17d22bba15001f/hashbrown-0.13.1/src/raw/mod.rs:1848
#35 0x000055da05655b3a in core::ptr::drop_in_place<hashbrown::raw::RawTable<(smithay::backend::drm::node::DrmNode, rendering::Device), alloc::alloc::Global>> ()
    at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
--Type <RET> for more, q to quit, c to continue without paging--jjjjjjjjjjjjjjj
#36 0x000055da056567ca in core::ptr::drop_in_place<hashbrown::map::HashMap<smithay::backend::drm::node::DrmNode, rendering::Device, std::collections::hash::map::RandomState, alloc::alloc::Global>> ()
    at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#37 0x000055da05655dea in core::ptr::drop_in_place<std::collections::hash::map::HashMap<smithay::backend::drm::node::DrmNode, rendering::Device, std::collections::hash::map::RandomState>> ()
    at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#38 0x000055da05658133 in core::ptr::drop_in_place<rendering::State> () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ptr/mod.rs:497
#39 0x000055da056b1067 in rendering::main () at smithay-drm-extras/examples/rendering/main.rs:120
#40 0x000055da05654c8b in core::ops::function::FnOnce::call_once<fn() -> core::result::Result<(), alloc::boxed::Box<dyn core::error::Error, alloc::alloc::Global>>, ()> ()
    at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/core/src/ops/function.rs:250
#41 0x000055da05634a4e in std::sys_common::backtrace::__rust_begin_short_backtrace<fn() -> core::result::Result<(), alloc::boxed::Box<dyn core::error::Error, alloc::alloc::Global>>, core::result::Result<(), alloc::boxed::Box<dyn core::error::Error, alloc::alloc::Global>>> (f=0x55da056b08e0 <rendering::main>) at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/sys_common/backtrace.rs:135
#42 0x000055da0571c1e1 in std::rt::lang_start::{closure#0}<core::result::Result<(), alloc::boxed::Box<dyn core::error::Error, alloc::alloc::Global>>> () at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/rt.rs:166
#43 0x000055da05989b25 in core::ops::function::impls::{impl#2}::call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> () at library/core/src/ops/function.rs:284
#44 std::panicking::try::do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> () at library/std/src/panicking.rs:500
#45 std::panicking::try<i32, &(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> () at library/std/src/panicking.rs:464
#46 std::panic::catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> () at library/std/src/panic.rs:142
#47 std::rt::lang_start_internal::{closure#2} () at library/std/src/rt.rs:148
#48 std::panicking::try::do_call<std::rt::lang_start_internal::{closure_env#2}, isize> () at library/std/src/panicking.rs:500
#49 std::panicking::try<isize, std::rt::lang_start_internal::{closure_env#2}> () at library/std/src/panicking.rs:464
#50 std::panic::catch_unwind<std::rt::lang_start_internal::{closure_env#2}, isize> () at library/std/src/panic.rs:142
#51 std::rt::lang_start_internal () at library/std/src/rt.rs:148
#52 0x000055da0571c1ba in std::rt::lang_start<core::result::Result<(), alloc::boxed::Box<dyn core::error::Error, alloc::alloc::Global>>> (main=0x55da056b08e0 <rendering::main>, argc=1, argv=0x7ffdf6eb9018, sigpipe=0)
    at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26/library/std/src/rt.rs:165
#53 0x000055da056b16ce in main ()
#54 0x00007fd67d954b4a in __libc_start_call_main () at /lib64/libc.so.6
#55 0x00007fd67d954c0b in __libc_start_main_impl () at /lib64/libc.so.6
#56 0x000055da05631ff5 in _start ()
PolyMeilex commented 1 year ago

Full code that reproduces this can be found here: https://github.com/PolyMeilex/smithay/commit/39c7066194bfe5c472464b5fae1de4ebabc365fc

Just run the smithay-drm-extras rendering example from this commit, and wait 5s for the process to auto exit, when exiting it will drop those types and cause segfault.

Drakulix commented 1 year ago

Oh interesting. Given that this happens in gbm_bo_destroy, this looks like it is a gbm.rs issue.

ids1024 commented 1 year ago
/** Destroy the gbm device and free all resources associated with it.
 *
 * Prior to calling this function all buffers and surfaces created with the
 * gbm device need to be destroyed.
 *
 * \param gbm The device created using gbm_create_device()   
 */
GBM_EXPORT void
gbm_device_destroy(struct gbm_device *gbm)

I guess to offer a safe API, gbm.rs needs to reference count the struct gbm_device.

ids1024 commented 1 year ago

Oh, I see it's already using WeakPtr. Using a Ptr there instead should fix this. Either that or some more complicated mechanism that would free all buffers and surfaces when the device is freed. (Probably better to stick to a strong reference, since weak isn't needed to break a reference cycle here.)

Drakulix commented 1 year ago

Having buffers keep devices open seems potentially very leaky to me.

Closing the device invalidates surfaces and buffers. So what we need to do is only conditionally call the destructor, if the device ptr is still valid.

ids1024 commented 1 year ago

The documentation for gbm_device_destroy says "Prior to calling this function all buffers and surfaces created with the gbm device need to be destroyed."

Taken literally, simply not running the buffer and surface destructors wouldn't be valid. Though I don't know how it's implemented. I'd expect that may at least leak resources.

Drakulix commented 1 year ago

The documentation for gbm_device_destroy says "Prior to calling this function all buffers and surfaces created with the gbm device need to be destroyed."

Oh no. I wonder if we should then instead track the buffers and surfaces in the device struct (just by tracking the raw pointers and destroying still registered ones on drop). Then surfaces and buffers could still have Weak references to the device and use those to remove their respective pointers from the lists of surfaces and buffers on drop, if the weak reference is still valid.

ids1024 commented 1 year ago

Yeah, storing Mutex<Vec<_>>s of the surfaces and buffers in the Device could also handle this. Fixing the issue while maintaining the current API and its (intended) behavior.

Personally my inclination for APIs like this is to have a strong reference. As long as there's no reference cycle nothing is leaked if the application doesn't leak anything, and the API is easier to use that way. It seems intuitive enough that the device is kept open as long as any buffers or surfaces using it exist.

But either would fix this.

Drakulix commented 1 year ago

The problem for me is that unintentionally leaving the device open for gbm can cause higher resource usage, e.g. higher power states of the gpu, and block unloading kernel modules. Compositors wishing the free a device should be able to do so easily and buffers might be cached at multiple levels, so cleaning up every reference might be hard, while handling invalid buffers might be much easier.

ids1024 commented 1 year ago

Ah, I wasn't thinking of that. Yeah, it could make sense for gbm, if it isn't easy to ensure the surfaces and buffers are freed when the compositor wants to free the device.

Drakulix commented 6 days ago

Given all these types are contained in a DrmCompositor now for most compositors (and certainly for cosmic-comp, which also implements the whole device power down infrastructure now), I'd say strong references are not as bad as I initially expected.

@ids1024 If you ever find the time to update https://github.com/Smithay/gbm.rs/pull/25 and remove the Draft state, we can do a strong counted 0.17 release for gbm.rs. (And then carefully update smithay and cosmic-comp.)