gfx-rs / wgpu

A cross-platform, safe, pure-Rust graphics API.
https://wgpu.rs
Apache License 2.0

[Metal/Intel] Shadow example rendering glitch #103

Open jasondavies opened 5 years ago

jasondavies commented 5 years ago

macOS 10.14.3
MacBook Pro (Retina, Mid 2012)
NVIDIA GeForce GT 650M 1024 MB
rustc 1.35.0-nightly (e68bf8ae1 2019-03-11)
Latest wgpu master at time of writing (9f70c2e59ffcee99d6a47d431094e7486076f3c0)

cargo run --release --bin shadow --features=metal

I'd be happy to dig deeper but I'm not sure of the best place to start.

[Screenshot 2019-03-12 at 20 28 26]

kvark commented 5 years ago

Thanks for filing! At first sight, it looks like a driver bug. The way to investigate this would be to create an Xcode project (with the "external build system" type), just so that you can start the application from Xcode and force-enable Metal validation. If no validation errors show up, then we need to take an Xcode GPU capture and analyze it. For what it's worth, I'm not seeing the problem on Intel; I'll check on AMD as well.

jasondavies commented 5 years ago

Thanks. I did as you suggested and ran with Metal validation enabled (it was on by default), but nothing appeared in the log.

If it is a driver issue, I assume there's not much I can do about it? Side note: I have experimented with MoltenVK in the past, but I'm not sure whether that would affect anything.

Running a GPU capture, the issue seems to crop up right near the last call; the call just before this one looks fine (see below):

[Screenshot 2019-03-12 at 20 54 08]

kvark commented 5 years ago

If Metal validation finds something, you get a crash/breakpoint when running from Xcode, so it's difficult to ignore :)

If we confirm this to be a driver issue, we can submit a bug to Apple. I've had some success getting them to address issues.

> Running a GPU capture, the issue seems to crop up right near the last call; the call just before this one looks fine (see below):

That is interesting. The artifacts are visible on the result of more than one draw call.

Could you save the capture to disk and send it over to me via, say, Firefox Send? Please share the link in our Gitter, where we generally discuss things.

jasondavies commented 5 years ago

In case it's relevant: under System Report the NVIDIA GeForce GT 650M shows Metal: Supported, feature set macOS GPUFamily1 v4.

kvark commented 5 years ago

Filed bug 48911699 on Apple's Radar for this.

parasyte commented 5 years ago

FWIW, I see the same behavior on macOS with the Intel chipset.

Intel Iris Pro:

  Chipset Model:    Intel Iris Pro
  Type: GPU
  Bus:  Built-In
  VRAM (Dynamic, Max):  1536 MB
  Vendor:   Intel
  Device ID:    0x0d26
  Revision ID:  0x0008
  Metal:    Supported, feature set macOS GPUFamily1 v4
  Displays:
    Color LCD:
      Display Type: Built-In Retina LCD
      Resolution:   2880 x 1800 Retina
      Framebuffer Depth:    24-Bit Color (ARGB8888)
      Main Display: Yes
      Mirror:   Off
      Online:   Yes
      Rotation: Supported
      Automatically Adjust Brightness:  No

seivan commented 5 years ago

Yeah, it happens on the integrated (motherboard) GPU, i.e. the Intel one, and not on the dedicated GPU, at least not with @jasondavies' GPU (I have the same one). The example runs with PowerPreference::LowPower, which means it will probably run on the Intel GPU, at least as of 9f70c2e.

So the mention of the NVIDIA GeForce GT 650M 1024 MB in the report is wrong.
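The mechanism seivan describes can be pictured with a toy model: on a dual-GPU Mac, a low-power request favors the integrated GPU and a high-performance request favors the discrete one. The sketch below is plain Rust with no dependencies; `PowerPreference` mirrors only the two variant names mentioned in this thread, while `GpuKind`, `Adapter`, `pick_adapter`, and the fallback policy are hypothetical helpers, not wgpu's actual adapter-selection code.

```rust
// Toy model of power-preference-based adapter selection.
// Hypothetical types: these are NOT wgpu's API, only an illustration.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PowerPreference {
    LowPower,
    HighPerformance,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GpuKind {
    Integrated, // e.g. Intel HD Graphics 630
    Discrete,   // e.g. Radeon Pro 560
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Adapter {
    name: &'static str,
    kind: GpuKind,
}

/// Returns the first adapter matching the preference, falling back to the
/// first adapter in the list if none matches (assumed policy).
fn pick_adapter(pref: PowerPreference, adapters: &[Adapter]) -> Option<&Adapter> {
    let wanted = match pref {
        PowerPreference::LowPower => GpuKind::Integrated,
        PowerPreference::HighPerformance => GpuKind::Discrete,
    };
    adapters
        .iter()
        .find(|a| a.kind == wanted)
        .or_else(|| adapters.first())
}

fn main() {
    // The two GPUs of parasyte's second laptop, as listed in this thread.
    let adapters = [
        Adapter { name: "Radeon Pro 560", kind: GpuKind::Discrete },
        Adapter { name: "Intel HD Graphics 630", kind: GpuKind::Integrated },
    ];
    for pref in [PowerPreference::LowPower, PowerPreference::HighPerformance] {
        let chosen = pick_adapter(pref, &adapters).expect("no adapters");
        // Prints "LowPower -> Intel HD Graphics 630", then
        // "HighPerformance -> Radeon Pro 560".
        println!("{:?} -> {}", pref, chosen.name);
    }
}
```

Under this assumed policy, switching the request to HighPerformance (as parasyte did by patching framework.rs) is what moves the example onto the discrete GPU.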

parasyte commented 5 years ago

AFAICT, this example was broken on dedicated GPUs (a Radeon Pro 560 4 GB in my case, with framework.rs patched to use PowerPreference::HighPerformance) by https://github.com/gfx-rs/wgpu-rs/pull/4, which suggests the culprit is somewhere in this range: https://github.com/gfx-rs/wgpu/compare/0edf927...dd61d12

When I bisected this, the bad commit was https://github.com/gfx-rs/wgpu/pull/172

I haven't tried to investigate further. It looks like something in the gfx-hal 0.1 -> 0.2 upgrade broke it, or possibly gfx-backend-metal 0.1 -> 0.2.

Commit    LowPower                         HighPerformance
cd9b7b8   [screenshot: cd9b7b8 LowPower]   [screenshot: cd9b7b8 HighPerformance]
965d242   [screenshot: 965d242 LowPower]   [screenshot: 965d242 HighPerformance]

More info: this is running on a different laptop from the one I reported on earlier. Here are its specs:

Intel HD Graphics 630:

  Chipset Model:    Intel HD Graphics 630
  Type: GPU
  Bus:  Built-In
  VRAM (Dynamic, Max):  1536 MB
  Vendor:   Intel
  Device ID:    0x591b
  Revision ID:  0x0004
  Automatic Graphics Switching: Supported
  gMux Version: 4.0.29 [3.2.8]
  Metal:    Supported, feature set macOS GPUFamily2 v1
Radeon Pro 560:

  Chipset Model:    Radeon Pro 560
  Type: GPU
  Bus:  PCIe
  PCIe Lane Width:  x8
  VRAM (Total): 4 GB
  Vendor:   AMD (0x1002)
  Device ID:    0x67ef
  Revision ID:  0x00c0
  ROM Revision: 113-C980AJ-927
  VBIOS Version:    113-C9801AU-A02
  EFI Driver Version:   01.A0.927
  Automatic Graphics Switching: Supported
  gMux Version: 4.0.29 [3.2.8]
  Metal:    Supported, feature set macOS GPUFamily2 v1
  Displays:
    Color LCD:
      Display Type: Built-In Retina LCD
      Resolution:   2880 x 1800 Retina
      Framebuffer Depth:    30-Bit Color (ARGB2101010)
      Main Display: Yes
      Mirror:   Off
      Online:   Yes
      Rotation: Supported
      Automatically Adjust Brightness:  No

kvark commented 5 years ago

@parasyte thank you for the data! I'm quite puzzled here. Looking at a GPU capture on Intel, I see that rendering into the layers of the shadow texture is entirely correct, but the texture looks wrong when it's being sampled from.

End of rendering (to layer 0): [image: shadow-depth0]
Sampling from layer 0: [Screen Shot 2019-09-02 at 21 35 34]

My guess is that the Apple runtime and/or driver doesn't insert proper transition barriers when rendering is done to layers of a 2D array depth texture. I'd expect this to be the driver's responsibility, in which case having it wrong across multiple IHVs is strange. Perhaps it's the Metal runtime that fails?
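kvark's hypothesis is about barriers tracked at the granularity of individual array layers. As a rough illustration, here is a toy per-layer state tracker, assuming a simple model where any change of usage on a layer that has already been used requires a transition. The `Usage` variants, `LayerTracker`, and the two-layer setup are hypothetical; this is not how Metal, the drivers, or gfx-backend-metal actually track resources.

```rust
// Toy model of per-layer usage tracking for a 2D array depth texture.
// Hypothetical: NOT how Metal, the driver, or gfx-backend-metal actually
// track resources; it only shows which transitions need a barrier.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Usage {
    Undefined,  // layer not used yet
    DepthWrite, // layer bound as the depth attachment of a shadow pass
    Sampled,    // layer read by a shader in the forward pass
}

struct LayerTracker {
    layers: Vec<Usage>, // last known usage of each array layer
}

impl LayerTracker {
    fn new(layer_count: usize) -> Self {
        LayerTracker { layers: vec![Usage::Undefined; layer_count] }
    }

    /// Records `usage` for `layer`. Returns the (from, to) transition that
    /// needs a barrier, or None if the layer was unused or keeps its usage.
    fn transition(&mut self, layer: usize, usage: Usage) -> Option<(Usage, Usage)> {
        let prev = std::mem::replace(&mut self.layers[layer], usage);
        if prev != Usage::Undefined && prev != usage {
            Some((prev, usage))
        } else {
            None
        }
    }
}

fn main() {
    let layer_count = 2; // assumed number of shadow-casting lights
    let mut shadow = LayerTracker::new(layer_count);

    // Shadow passes: render depth into each layer in turn.
    for layer in 0..layer_count {
        shadow.transition(layer, Usage::DepthWrite);
    }
    // Forward pass: sample every layer. Each layer needs its own
    // DepthWrite -> Sampled barrier; skipping one could let the shader
    // observe incomplete depth writes for that layer.
    for layer in 0..layer_count {
        if let Some((from, to)) = shadow.transition(layer, Usage::Sampled) {
            println!("layer {}: barrier {:?} -> {:?}", layer, from, to);
        }
    }
}
```

The point of the per-layer granularity is that a tracker which misattributes a write to one layer would skip a required barrier for it. Whether anything like that happens inside Apple's runtime or drivers is exactly what the thread could not confirm.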

@litherum FYI, this was filed on Radar as 48911699 (also assigned FB5398663) for NVIDIA, but it appears to affect other vendors as well. Please let us know if there is anything we can provide. Reproducing it today is easy:

git clone https://github.com/gfx-rs/wgpu-rs
cd wgpu-rs
cargo run --example shadow
tomgreen66 commented 3 years ago

What's the latest on this issue? I'm on the latest Big Sur with Intel Iris Pro graphics, and there seems to be a similar rendering bug. Were any fixes applied by Apple?

     Running unittests (target/debug/examples/shadow-2a69e77289900e7b)

running 1 test
test shadow ... FAILED

failures:

---- shadow stdout ----
thread 'shadow' panicked at 'Image data mismatch! Outlier count 1271063 over limit 500. Max difference 192', wgpu/examples/shadow/../../tests/common/image.rs:134:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'shadow' panicked at 'UNEXPECTED TEST FAILURE', wgpu/examples/shadow/../../tests/common/mod.rs:301:9

failures:
    shadow

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 5.84s

[image]
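The panic message hints at how the test decides pass or fail: it counts "outlier" pixels, fails when that count exceeds a limit (500 here), and reports the largest difference it saw (192). A minimal dependency-free sketch of that kind of check follows; the function names, the RGBA8 layout, and the per-channel `tolerance` parameter are assumptions, and wgpu's actual tests/common/image.rs may compute differences differently.

```rust
/// Result of comparing two RGBA8 images (hypothetical helper, not wgpu's code).
#[derive(Debug, PartialEq, Eq)]
struct Comparison {
    outliers: usize,
    max_difference: u8,
}

/// Compares two tightly packed RGBA8 buffers pixel by pixel.
/// A pixel is an outlier when any channel differs by more than `tolerance`.
fn compare_rgba8(expected: &[u8], actual: &[u8], tolerance: u8) -> Comparison {
    assert_eq!(expected.len(), actual.len(), "image sizes differ");
    assert_eq!(expected.len() % 4, 0, "buffer is not RGBA8");
    let mut outliers = 0;
    let mut max_difference = 0u8;
    for (e, a) in expected.chunks_exact(4).zip(actual.chunks_exact(4)) {
        // Largest per-channel absolute difference for this pixel.
        let diff = e.iter().zip(a).map(|(x, y)| x.abs_diff(*y)).max().unwrap_or(0);
        max_difference = max_difference.max(diff);
        if diff > tolerance {
            outliers += 1;
        }
    }
    Comparison { outliers, max_difference }
}

/// Panics with a message shaped like the failing test's when too many
/// pixels differ; otherwise returns normally.
fn check_image(expected: &[u8], actual: &[u8], tolerance: u8, outlier_limit: usize) {
    let c = compare_rgba8(expected, actual, tolerance);
    if c.outliers > outlier_limit {
        panic!(
            "Image data mismatch! Outlier count {} over limit {}. Max difference {}",
            c.outliers, outlier_limit, c.max_difference
        );
    }
}

fn main() {
    let reference = vec![0u8; 4 * 4]; // a 2x2 all-black RGBA8 image
    let mut rendered = reference.clone();
    rendered[0] = 192; // one pixel's red channel is off by 192
    println!("{:?}", compare_rgba8(&reference, &rendered, 1));
    check_image(&reference, &rendered, 1, 500); // 1 outlier, under the limit
}
```

With a limit of 500, an outlier count in the millions like the one in the log means most of the frame differs from the reference image, which fits the whole first frame showing the shadow artifacts rather than a handful of noisy pixels.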

kvark commented 3 years ago

@tomgreen66 I filed this issue with Apple years ago, and it got resolved. But apparently their fix only kicks in on the second frame, while the testing infrastructure captures the first frame, which still shows the same issue. So it's still a bug on Apple's side, and we need to report it again. Technically, though, it's harmless.