`wgpu-core` submission tracking fails to avoid resource management overhead

jimblandy commented 5 months ago

wgpu-core's submission tracking goes to great pains to avoid refcount traffic for resources used by commands in flight, but those efforts are ineffective, and the traffic occurs anyway. Since the complexity fails to deliver, it should be removed.

In more detail:

Our backend APIs require that resources (buffers, textures, etc.) used by commands in flight not be freed until the commands have finished execution. (Quite reasonable.)

One of the striking things about wgpu_core::device::life::ActiveSubmission is that it does not hold strong references to the resources used by the commands it represents. Rather, the ActiveSubmission::last_resources set is initially empty (ignore Device::pending_writes for now), and resources are only added to it if they are destroyed while their most recent queue submission is still in flight, as a way of deferring their actual destruction until after the command has finished.

As far as I understand, it's done this way because it's rare in practice for applications to free resources while they are still in use by the GPU, and thus any time spent fiddling with ownership on each queue submission is almost always wasted. With the approach above, in the common case, most ActiveSubmission::last_resources sets remain empty throughout their lifetime, and no resources need to be touched on completion.

However, this strategy is ineffective.

Compare this approach to a simpler one where ActiveSubmission simply holds strong references to the resources its commands use. Queue submission would move strong references from CommandBuffers and PendingWrites into the new ActiveSubmission. When execution is complete, those strong references would be dropped, and any unreferenced resources would be freed.

First, note that, in the best case where nothing is actually freed, the only difference between the two approaches is the amount of refcount traffic at command completion. Dropping strong references to resources that are still alive is a pointer dereference and an atomic decrement.

Second, note that wgpu_core::command::CommandBuffer holds strong references to the resources its commands use. On queue submission, these strong references are packed into a BakedCommands value, which is then dropped. This incurs exactly the same pointer dereference and refcount traffic, on every resource directly used by the command buffer, that would occur at execution completion if ActiveSubmission held strong references to all resources. The only effect of the current approach is to move the pointer dereferences and refcount fiddling from queue draining time to submission time.

Moving the traffic from later to earlier doesn't seem worth the complexity required to support it.

Instead, queue submission should move the owning references from the command buffers and pending writes into the ActiveSubmission, along with the encoders and command buffers. This will move the refcount traffic from queue submission time to queue draining time. We can then remove the delicate and complex code that populates ActiveSubmission::last_resources. LifetimeTracker::triage_suspected and everything that supports it could go away, replaced by idiomatic use of Arc.

cc @kvark

cwfitzgerald commented 5 months ago

I agree with the conclusions here, and think we should go ahead.

jimblandy commented 5 months ago

Here's another problem which would be fixed by taking a more straightforward ownership approach:

Check my work, but it seems to me that, once a resource gets added to LifetimeTracker::suspected_resources, it will remain there for the rest of its or the Device's lifetime, whichever is shorter. It will get checked on every maintain, until it actually dies or the Device is closed.

The only thing that I can see that ever removes something from a LifetimeTracker::suspected_resources field is LifetimeTracker::triage_resources, and that only does so if the resource has been abandoned. ResourceMaps::clear is only called at Device shutdown.

This seems very bad. A resource that belongs to a bind group will become suspected when the bind group is dropped, and then it'll just stay that way, getting checked over and over for no reason.

cwfitzgerald commented 4 months ago

I just wrote a benchmark for wgpu that uses a fair amount of resources (to stress our infra as much as possible) and this problem is actually a very large chunk of time. With 90k resources, submit takes 8ms or so. If we can move away from this, it would bring some pretty incredible wins.

gfx-rs / wgpu

`wgpu-core` submission tracking fails to avoid resource management overhead #5560