gfx-rs / wgpu

A cross-platform, safe, pure-Rust graphics API.
https://wgpu.rs
Apache License 2.0

Took more than 10 minutes to create compute pipeline on Linux #6110

Open GopherJ opened 4 weeks ago

GopherJ commented 4 weeks ago

Description: Creating a compute pipeline becomes super slow once some changes have been made to the shader.

Repro steps: Ideally, a runnable example we can check out.

Expected vs observed behavior: faster

Extra materials: N/A

Platform: Linux Mint based on Ubuntu 22.04, i7-13gen, 64GB, GTX4090

ErichDonGubler commented 4 weeks ago

@GopherJ: This issue does not actually provide instructions for running an example of the behavior described. I don't think it's reasonable to expect contributors here to build and run binaries that are not trivial to validate for safety and security without instructions, so: Unless you can make your example small (i.e., everything in a pair of Cargo.toml and main.rs), determine what changes specifically are causing this behavior (perhaps as a diff), and quantify what "slow" means, I intend to close this issue.

GopherJ commented 4 weeks ago

Hi @ErichDonGubler, it's more a problem of running wgpu on Linux. If more context is needed, I can try to make an example later.

And it's not related to my code, because the bottleneck is in the create_compute_pipeline API.

Wumpf commented 4 weeks ago

Our CI regularly runs create_compute_pipeline with various inputs on all platforms without this issue, so it clearly is related to either the sample at hand or your setup. So yes, more context is needed!

GopherJ commented 4 weeks ago

I'll try to provide a reproducible example later.

cwfitzgerald commented 4 weeks ago

It would be useful to know how long the underlying vkCreateComputePipelines call is taking, as I suspect that the hang is entirely within that call.
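
A minimal sketch of one way to get a first measurement, using only `std::time::Instant`. The `timed` helper is hypothetical (it is not part of wgpu), and the closure in `main` is a stand-in workload for the real `create_compute_pipeline` call. Wrapping the wgpu call this way measures the whole call, including whatever work wgpu does around the driver, so it gives an upper bound on the time spent in vkCreateComputePipelines rather than the driver call alone.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper (not part of wgpu): runs `f` once and returns its
/// result together with the elapsed wall-clock time.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let value = f();
    (value, start.elapsed())
}

fn main() {
    // Stand-in workload. In the real program the closure would instead be
    // something like `|| device.create_compute_pipeline(&descriptor)`.
    let (sum, elapsed) = timed(|| (0..1_000_000u64).sum::<u64>());
    println!("result = {sum}, took {elapsed:?}");
}
```

If the wrapped call accounts for nearly all of the observed delay, that would support the suspicion that the time is spent inside pipeline creation rather than in the surrounding test code.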

GopherJ commented 3 weeks ago

https://github.com/GopherJ/webgpu-shaders

@ErichDonGubler @cwfitzgerald here is the reproduction repo

if you try to run:

cargo test --release --no-default-features --features std -- --test-threads 1 --nocapture

it basically hangs for a while, either compiling the shader or creating the compute pipeline. On macOS I don't observe this; things are super fast.

Environment: image

GopherJ commented 2 weeks ago

Any idea?

ErichDonGubler commented 2 weeks ago

I don't have a Linux environment handy at the moment, so I can't directly contribute debugging here. Inspecting the reproducible example repo, however, I suspect there is still some work to do to make the example smaller:

  1. If the issue is creating a compute pipeline, not executing one, we shouldn't need to keep any of the code past the stage of creating a compute pipeline. Things like constructing and executing compute passes should be unnecessary.
  2. I see multiple compute pipelines being created, but this bug has only mentioned a specific pipeline being slow. We should be able to limit the reproducible example to code that creates only that pipeline.
  3. Your cargo test reproduction steps don't note which #[test] entries are slow. Is it all of them? Only some of them? We should narrow the reproduction steps down to only one of these cases, if possible.
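
One way to narrow things down along the lines of the points above, as a rough sketch: time each candidate step separately and list them from slowest to fastest. The `time_steps` helper and the step names are hypothetical, and the sleeps are placeholders for the individual pipeline creations or test bodies in the repo.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: runs each named step once, in order, and returns
/// the (name, elapsed) pairs sorted from slowest to fastest.
fn time_steps(steps: Vec<(&'static str, Box<dyn FnOnce()>)>) -> Vec<(&'static str, Duration)> {
    let mut timings: Vec<(&'static str, Duration)> = steps
        .into_iter()
        .map(|(name, step)| {
            let start = Instant::now();
            step();
            (name, start.elapsed())
        })
        .collect();
    // Slowest first, so the likely culprit is at the top of the list.
    timings.sort_by(|a, b| b.1.cmp(&a.1));
    timings
}

fn main() {
    // Placeholder steps: the names and sleep durations are made up for
    // illustration and stand in for one pipeline creation each.
    let steps: Vec<(&'static str, Box<dyn FnOnce()>)> = vec![
        (
            "pipeline_a",
            Box::new(|| std::thread::sleep(Duration::from_millis(5))) as Box<dyn FnOnce()>,
        ),
        (
            "pipeline_b",
            Box::new(|| std::thread::sleep(Duration::from_millis(50))) as Box<dyn FnOnce()>,
        ),
    ];
    for (name, elapsed) in time_steps(steps) {
        println!("{name}: {elapsed:?}");
    }
}
```

Whichever step dominates the list is the one worth keeping in the reduced example; the rest can be deleted.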

jimblandy commented 2 weeks ago

The Mozilla folks aren't going to be prioritizing this for the moment. As always, others are welcome to investigate.