Improve lifetime management of command encoders and buffers

teoxoy commented 1 week ago

This PR improves the lifetime management of command encoders (it's now automatic on drop, no more manual destruction), it also removes duplicate invalidity state from command encodes and with the last commit, we now invalidate the command encoder if there have been any errors while recording (per spec).

teoxoy commented 1 week ago

I couldn't repro the issue in CI, thanks for pairing @ErichDonGubler and helping to debug it!

teoxoy commented 1 week ago

@ErichDonGubler I added the comment on the Queue.device field.

jimblandy commented 5 days ago

If it's acceptable for unexpected operations in the Finished state to transition the command buffer to the Error state, then this commit shows that, even aside from deleting the macro itself, the code can actually be shorter without the macro.

teoxoy commented 4 days ago

If it's acceptable for unexpected operations in the Finished state to transition the command buffer to the Error state, then this commit shows that, even aside from deleting the macro itself, the code can actually be shorter without the macro.

That's how the code used to look in a previous iteration of the patch, the issue was that we can't replace Finished with Error. We have tests that check for this, which is how I discovered it, I then looked at the spec and it also doesn't say that we can invalidate the encoder, only that a validation error is raised.

I don't like the macro either but it seemed appropriate to hide the ugliness. Something else we could try is to match twice in the methods, will put up a commit for that to see if we like it more.

teoxoy commented 4 days ago

I guess we could also just reassign the finished state back.

teoxoy commented 4 days ago

I think I addressed all the comments.

jimblandy commented 3 days ago

One interesting wrinkle in this is that, since struct fields are dropped in order, dropping a wgpu_core::Device will drop Device::raw before Device::command_allocator. This means that wgpu_hal requires wgpu_hal::CommandEncoder implementations to be droppable even after the wgpu_hal::Device is gone.

jimblandy commented 3 days ago

Sorry, I'm very close to done with this review. Will finish tomorrow morning.

teoxoy commented 3 days ago

One interesting wrinkle in this is that, since struct fields are dropped in order, dropping a wgpu_core::Device will drop Device::raw before Device::command_allocator. This means that wgpu_hal requires wgpu_hal::CommandEncoder implementations to be droppable even after the wgpu_hal::Device is gone.

This is the case right now but is a bit problematic that we now have Drop implementations for some objects in HAL but also the destroy_* methods that require callers to pass a &HalDevice (implicitly proving that they still have a HAL device around).

I'm not sure what the best solution is but I'm leaning towards adding Drop implementations for all HAL objects and moving the burden of keeping device backrefs to HAL backends. Interestingly (I think) only Vulkan would need device backrefs since Metal and D3D12 objects already have backrefs. Also, the current implementations of the destroy_* methods in Metal and D3D12 are almost always empty.

jimblandy commented 3 days ago

I'm not sure what the best solution is but I'm leaning towards adding Drop implementations for all HAL objects and moving the burden of keeping device backrefs to HAL backends. Interestingly (I think) only Vulkan would need device backrefs since Metal and D3D12 objects already have backrefs. Also, the current implementations of the destroy_* methods in Metal and D3D12 are almost always empty.

Right now, the wgpu_hal documentation says, "A CommandEncoder must not outlive its Device." If wgpu_hal relaxes this rule, and allows CommandEncoders to outlive Devices, then we need to decide whether:

it is safe to call methods on a CommandEncoder after its Device has been dropped, or
once the Device has been dropped, the only thing you can do with a CommandEncoder is drop it.

jimblandy commented 3 days ago

In any case, this PR as it stands doesn't respect wgpu_hal's rule that "A CommandEncoder must not outlive its Device." So either the code or the wgpu_hal documentation needs to be updated. Suggestion:

modified   wgpu-hal/src/lib.rs
@@ -1108,7 +1108,9 @@ pub trait Queue: WasmNotSendSync {
 /// - A `CommandBuffer` must not outlive the `CommandEncoder` that
 ///   built it.
 ///
-/// - A `CommandEncoder` must not outlive its `Device`.
+/// - A `CommandEncoder` may outlive the `Device` that created it. But once the
+///   `Device` has been dropped, the only operation permitted on the
+///   `CommandEncoder` is to drop it.
 ///
 /// It is the user's responsibility to meet these requirements. This
 /// allows `CommandEncoder` implementations to keep their state

jimblandy commented 1 day ago

Wow, so dropping a CommandBufferMutable never recycled the wgpu_hal encoder in Device::command_allocator.

jimblandy commented 1 day ago

Right now, the wgpu_hal documentation says, "A CommandEncoder must not outlive its Device." If wgpu_hal relaxes this rule, and allows CommandEncoders to outlive Devices, then we need to decide whether:

it is safe to call methods on a CommandEncoder after its Device has been dropped, or

once the Device has been dropped, the only thing you can do with a CommandEncoder is drop it.

There's a third alternative:

Keep the wgpu_hal requirement that CommandEncoders must not outlive their Device, and change the order of fields in wgpu_core::device::Device so that the device is dropped last.

This wouldn't be a wgpu_hal property that any of the backends would actually exploit. But having a logical but unnecessary rule that's not hard to follow seems better than having strange rules. So I think this would be my first choice.

teoxoy commented 1 day ago

That sounds good to me too. I can make the change tomorrow but let me know if there is anything else I should address.

gfx-rs / wgpu

Improve lifetime management of command encoders and buffers #6544