Open PolyMeilex opened 1 year ago
Full code that reproduces this can be found here: https://github.com/PolyMeilex/smithay/commit/39c7066194bfe5c472464b5fae1de4ebabc365fc
Just run the smithay-drm-extras rendering example from this commit, and wait 5s for the process to auto exit, when exiting it will drop those types and cause segfault.
Oh interesting. Given that this happens in gbm_bo_destroy, this looks like it is a gbm.rs issue.
/** Destroy the gbm device and free all resources associated with it.
*
* Prior to calling this function all buffers and surfaces created with the
* gbm device need to be destroyed.
*
* \param gbm The device created using gbm_create_device()
*/
GBM_EXPORT void
gbm_device_destroy(struct gbm_device *gbm)
I guess to offer a safe API, gbm.rs needs to reference count the struct gbm_device
.
Oh, I see it's already using WeakPtr
. Using a Ptr
there instead should fix this. Either that or some more complicated mechanism that would free all buffers and surfaces when the device is freed. (Probably better to stick to a strong reference, since weak isn't needed to break a reference cycle here.)
Having buffers keep devices open seems potentially very leaky to me.
Closing the device invalidates surfaces and buffers. So what we need to do is only conditionally call the destructor, if the device ptr is still valid.
The documentation for gbm_device_destroy
says "Prior to calling this function all buffers and surfaces created with the gbm device need to be destroyed."
Taken literally, simply not running the buffer and surface destructors wouldn't be valid. Though I don't know how it's implemented. I'd expect that may at least leak resources.
The documentation for gbm_device_destroy says "Prior to calling this function all buffers and surfaces created with the gbm device need to be destroyed."
Oh no. I wonder if we should then instead track the buffers and surfaces in the device struct (just by tracking the raw pointers and destroying still registered ones on drop). Then surfaces and buffers could still have Weak references to the device and use those to remove their respective pointers from the lists of surfaces and buffers on drop, if the weak reference is still valid.
Yeah, storing Mutex<Vec<_>>
s of the surfaces and buffers in the Device
could also handle this. Fixing the issue while maintaining the current API and its (intended) behavior.
Personally my inclination for APIs like this is to have a strong reference. As long as there's no reference cycle nothing is leaked if the application doesn't leak anything, and the API is easier to use that way. It seems intuitive enough that the device is kept open as long as any buffers or surfaces using it exist.
But either would fix this.
The problem for me is that unintentionally leaving the device open for gbm can cause higher resource usage, e.g. higher power states of the gpu, and block unloading kernel modules. Compositors wishing the free a device should be able to do so easily and buffers might be cached at multiple levels, so cleaning up every reference might be hard, while handling invalid buffers might be much easier.
Ah, I wasn't thinking of that. Yeah, it could make sense for gbm, if it isn't easy to ensure the surfaces and buffers are freed when the compositor wants to free the device.
Given all these types are contained in a DrmCompositor
now for most compositors (and certainly for cosmic-comp, which also implements the whole device power down infrastructure now), I'd say strong references are not as bad as I initially expected.
@ids1024 If you ever find the time to update https://github.com/Smithay/gbm.rs/pull/25 and remove the Draft
state, we can do a strong counted 0.17 release for gbm.rs. (And then carefully update smithay and cosmic-comp.)
This order of drop causes segfault:
This one works fine:
Backtrace of said segfault