SludgePhD opened this issue 2 months ago
CC @jimblandy, who's been working on issues like this recently.
Could you pull out from those stacks the locks each thread is holding, if any?
Actually, it might suffice simply to know which lock each thread is trying to acquire, and I could figure out which other ones it must be holding.
The deadlock appears to be caused by:

- command_encoder_end_compute_pass acquires the buffer read lock before the bind group read lock here: https://github.com/gfx-rs/wgpu/blob/edf1a86148d1a62da857633fb224aa569f21ce4e/wgpu-core/src/command/compute_command.rs#L82-L83
- command_encoder_end_render_pass acquires the bind group read lock before the buffer read lock here: https://github.com/gfx-rs/wgpu/blob/ad6774f7bb9c327238322d9e5beeb1c9a0c6e89d/wgpu-core/src/command/render.rs#L1385-L1389

In the backtraces above, there is one thread in the first location holding the buffers lock and trying to acquire the bind_groups lock, and one thread in the second location holding most locks (including the bind_groups one) and trying to acquire the buffers lock.
While these are all RwLocks, and both of the acquisitions above are read locks, there are also several threads trying to acquire write locks on both the bind_groups and buffers storages. Because parking_lot's RwLock implementation is fair, a pending write lock blocks any new attempts to acquire read locks until the writer gets its turn, which completes the deadlock.
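To illustrate, here is a minimal, self-contained sketch of that interleaving. This is not wgpu code: the buffers and bind_groups locks below are stand-ins for the Hub storages, and the sleeps just make the race reliable enough to reproduce. Running it is expected to hang, since each reader's second read() waits behind a queued writer, and each writer waits on the other reader's first guard.

```rust
use std::sync::Arc;
use std::thread;
use std::time::Duration;

use parking_lot::RwLock;

fn main() {
    // Stand-ins for the buffer and bind group storages.
    let buffers = Arc::new(RwLock::new(()));
    let bind_groups = Arc::new(RwLock::new(()));

    // Thread 1: compute-pass order (buffers first, then bind_groups).
    let (b, g) = (Arc::clone(&buffers), Arc::clone(&bind_groups));
    let t1 = thread::spawn(move || {
        let _buf = b.read();
        thread::sleep(Duration::from_millis(100)); // let the writers queue up
        let _bg = g.read(); // blocks behind the queued bind_groups writer
    });

    // Thread 2: render-pass order (bind_groups first, then buffers).
    let (b, g) = (Arc::clone(&buffers), Arc::clone(&bind_groups));
    let t2 = thread::spawn(move || {
        let _bg = g.read();
        thread::sleep(Duration::from_millis(100)); // let the writers queue up
        let _buf = b.read(); // blocks behind the queued buffers writer
    });

    // Give both readers time to take their first read guard.
    thread::sleep(Duration::from_millis(20));

    // Writers now queue on both locks while the first read guards are
    // held. parking_lot's fair policy makes any *new* read() wait behind
    // these writers, closing the cycle:
    // t1 -> bind_groups writer -> t2 -> buffers writer -> t1.
    let b = Arc::clone(&buffers);
    thread::spawn(move || drop(b.write()));
    let g = Arc::clone(&bind_groups);
    thread::spawn(move || drop(g.write()));

    t1.join().unwrap(); // never returns: the program hangs here
    t2.join().unwrap();
}
```

Note that with an unfair RwLock the two read-only paths alone could not deadlock; it is the queued writers plus the fair policy that close the cycle.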
It sounds like rank::REGISTRY_STORAGE should be split into one rank per resource to catch mistakes like this, maybe?
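For comparison, here is a minimal sketch of what rank-checked locking could look like. RankedRwLock, HELD_RANK, and the rank constants are hypothetical names for illustration, not wgpu's actual rank machinery; the point is that an out-of-order acquisition panics immediately instead of deadlocking only under an unlucky schedule.

```rust
use std::cell::Cell;
use std::ops::Deref;

use parking_lot::{RwLock, RwLockReadGuard};

thread_local! {
    // Highest lock rank currently held by this thread (0 = none held).
    static HELD_RANK: Cell<u32> = Cell::new(0);
}

// Hypothetical per-resource ranks; a strict global order means every
// code path must take these locks in increasing rank.
pub const BUFFERS_RANK: u32 = 1;
pub const BIND_GROUPS_RANK: u32 = 2;

pub struct RankedRwLock<T> {
    rank: u32,
    inner: RwLock<T>,
}

pub struct RankedReadGuard<'a, T> {
    guard: RwLockReadGuard<'a, T>,
    prev_rank: u32,
}

impl<T> RankedRwLock<T> {
    pub fn new(rank: u32, value: T) -> Self {
        Self { rank, inner: RwLock::new(value) }
    }

    pub fn read(&self) -> RankedReadGuard<'_, T> {
        let prev = HELD_RANK.with(|r| r.get());
        // Panic on the first out-of-order acquisition, regardless of
        // whether the other thread happens to be racing us right now.
        assert!(
            self.rank > prev,
            "lock order violation: acquiring rank {} while holding rank {}",
            self.rank, prev
        );
        HELD_RANK.with(|r| r.set(self.rank));
        RankedReadGuard { guard: self.inner.read(), prev_rank: prev }
    }
}

impl<T> Deref for RankedReadGuard<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.guard
    }
}

impl<T> Drop for RankedReadGuard<'_, T> {
    // Simplification: assumes guards are dropped in reverse acquisition
    // order, which Rust scoping normally gives you.
    fn drop(&mut self) {
        HELD_RANK.with(|r| r.set(self.prev_rank));
    }
}
```

With one rank per resource, whichever of the two code paths disagrees with the global order panics on its very first run, independent of thread timing and of whether a writer happens to be queued.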
Description
Might be a duplicate of one of the known deadlock issues in https://github.com/gfx-rs/wgpu/issues/5572; I'm not sure yet.
Repro steps
Closed source project, so not available.

Expected vs observed behavior
Expected: no deadlock. Observed: a deadlock.
Platform
Linux, Vulkan. wgpu 0.20.0 is affected (and is where the backtraces are from), but trunk also deadlocks in a similar way.