Closed jimblandy closed 2 months ago
I agree with the conclusions here, and think we should go ahead.
Here's another problem which would be fixed by taking a more straightforward ownership approach:
Check my work, but it seems to me that, once a resource gets added to LifetimeTracker::suspected_resources
, it will remain there for the rest of its or the Device
's lifetime, whichever is shorter. It will get checked on every maintain
, until it actually dies or the Device
is closed.
The only thing that I can see that ever removes something from a LifetimeTracker::suspected_resources
field is LifetimeTracker::triage_resources
, and that only does so if the resource has been abandoned. ResourceMaps::clear
is only called at Device
shutdown.
This seems very bad. A resource that belongs to a bind group will become suspected when the bind group is dropped, and then it'll just stay that way, getting checked over and over for no reason.
I just wrote a benchmark for wgpu that uses a fair amount of resources (to stress our infra as much as possible) and this problem is actually a very large chunk of time. With 90k resources, submit takes 8ms or so. If we can move away from this, it would bring some pretty incredible wins.
wgpu-core
's submission tracking goes to great pains to avoid refcount traffic for resources used by commands in flight, but those efforts are ineffective, and the traffic occurs anyway. Since the complexity fails to deliver, it should be removed.In more detail:
Our backend APIs require that resources (buffers, textures, etc.) used by commands in flight not be freed until the commands have finished execution. (Quite reasonable.)
One of the striking things about
wgpu_core::device::life::ActiveSubmission
is that it does not hold strong references to the resources used by the commands it represents. Rather, theActiveSubmission::last_resources
set is initially empty (ignoreDevice::pending_writes
for now), and resources are only added to it if they are destroyed while their most recent queue submission is still in flight, as a way of deferring their actual destruction until after the command has finished.As far as I understand, it's done this way because it's rare in practice for applications to free resources while they are still in use by the GPU, and thus any time spent fiddling with ownership on each queue submission is almost always wasted. With the approach above, in the common case, most
ActiveSubmission::last_resources
sets remain empty throughout their lifetime, and no resources need to be touched on completion.However, this strategy is ineffective.
Compare this approach to a simpler one where
ActiveSubmission
simply holds strong references to the resources its commands use. Queue submission would move strong references fromCommandBuffer
s andPendingWrites
into the newActiveSubmission
. When execution is complete, those strong references would be dropped, and any unreferenced resources would be freed.First, note that, in the best case where nothing is actually freed, the only difference between the two approaches is the amount of refcount traffic at command completion. Dropping strong references to resources that are still alive is a pointer dereference and an atomic decrement.
Second, note that
wgpu_core::command::CommandBuffer
holds strong references to the resources its commands use. On queue submission, these strong references are packed into aBakedCommands
value, which is then dropped. This incurs exactly the same pointer dereference and refcount traffic, on every resource directly used by the command buffer, that would occur at execution completion ifActiveSubmission
held strong references to all resources. The only effect of the current approach is to move the pointer dereferences and refcount fiddling from queue draining time to submission time.Moving the traffic from later to earlier doesn't seem worth the complexity required to support it.
Instead, queue submission should move the owning references from the command buffers and pending writes into the
ActiveSubmission
, along with the encoders and command buffers. This will move the refcount traffic from queue submission time to queue draining time. We can then remove the delicate and complex code that populatesActiveSubmission::last_resources
.LifetimeTracker::triage_suspected
and everything that supports it could go away, replaced by idiomatic use ofArc
.cc @kvark