IGCIT / Intel-GPU-Community-Issue-Tracker-IGCIT

IGCIT is a Community-driven issue tracker for Intel GPUs.
GNU General Public License v3.0

SPIR-V compiler crash with yuzu ASTC decoder #159

Closed liamwhite closed 1 year ago

liamwhite commented 2 years ago

Checklist [README]

Application [Required]

yuzu

Processor / Processor Number [Required]

Core i5 12600K

Graphic Card [Required]

Arc A770 16GB

GPU Driver Version [Required]

Rendering API [Required]

Windows Build Number [Required]

Other Windows build number

No response

Intel System Support Utility report

ssu-info.txt

Description and steps to reproduce [Required]

The yuzu Nintendo Switch emulator crashes when loading any application with the Vulkan Arc driver due to a crash in the SPIR-V shader compiler. The file was produced with glslangValidator and the Vulkan application has no validation errors.

Integrated graphics drivers are not affected. Mesa is not affected. Other driver families (AMD, Nvidia) are not affected.

Here is a link to the issue report: https://github.com/yuzu-emu/yuzu/issues/9072

Here is a link to the latest version of yuzu: https://github.com/yuzu-emu/yuzu-mainline/releases

Here is a link to a zipped homebrew example game to run in the emulator which can trigger the crash: simple_triangle.zip (free software, compiled from here)


I have created a minimal application which reproduces the crash here: https://github.com/liamwhite/astc-crash/

Steps for the minimal application:

  1. mkdir build
  2. cd build
  3. cmake ..
  4. Open in VS and build main.cpp
  5. Run and it will immediately crash

[Screenshot: arc_crash]

Device / Platform

No response

Crash dumps [Required]

main.dmp.zip

Application / Windows logs

No response

IntelSupport-Rozilah commented 2 years ago

hi @liamwhite, Thank you for reaching out about this issue. Can you specify the published game/app that you used in Yuzu when the issue happened, so that it is easier to narrow down the issue?

liamwhite commented 2 years ago

Hi @IntelSupport-Rozilah ,

The crash can be reproduced with any game. I have provided an example zipped homebrew game in the issue report which you can use for testing. Ensure you are using the Vulkan backend to reproduce the crash.

Alternatively, compile and run the minimal example I provided here, which does not require the emulator to reproduce: https://github.com/liamwhite/astc-crash

IntelSupport-Rozilah commented 2 years ago

hi @liamwhite,

Thanks for the reply. We have verified the issue with your zipped homebrew game and it is reproducible, but can you also specify the published game/app that you used, so that we can escalate to the next level?

liamwhite commented 2 years ago

@IntelSupport-Rozilah

I initially tested and reproduced the crash with the game Super Mario 3D World + Bowser's Fury.

For your internal prioritization, however, here is a list of every game I have tested that is affected by this crash:

(Since every single game is affected by this issue, there would be thousands in a complete list.)

IntelSupport-Rozilah commented 2 years ago

hi @liamwhite , thanks for your reply, we will start to verify the published game once it is ready and will revert back to you soon. Stay tuned!

IntelSupport-Rozilah commented 2 years ago

hi @liamwhite, The case has been verified on our side and we are able to reproduce it with our configurations as below:

Processor: ALD-S
Graphic Driver version: 31.0.101.3490

We will be working on a fix. I want to give you a heads-up that the fix may take 3 to 6 months to be included.

liamwhite commented 2 years ago

Okay. I will add a workaround to yuzu for the issue for now, then.

goldenx86 commented 1 year ago

This issue affects more than just the ASTC compute shader we use. Basically, all games are prone to random SPIR-V compiler crashes; it is a very high priority for us that this is looked into.

Valkirie commented 1 year ago

@IntelSupport-Rozilah hey, just checking if this one is in the pipeline.

lucianobraga84 commented 1 year ago

Hey, Intel. Every Vulkan game crashes sometime because of this unacceptable problem. ARC is supposed to perform better with new generation APIs like DirectX 12 and VULKAN!

Can't believe you won't even tell us if this SPIR-V fix is in the pipeline.

(OpenGL on ARC is bad too, but...who cares?)

kunit1 commented 1 year ago

Hello @Karen-Intel would you be able to provide an update on this issue? It's a serious bug in the intel Vulkan driver and it's been verified by intel many months ago. Any update or prioritization would be appreciated.

liamwhite commented 1 year ago

If I take the original compute shader, and comment out two of the case statements in this switch-case block, it is able to compile the shader.

[Image: the switch-case block in the compute shader]

In fact, it seems like the compiler cannot cope with having six or more case labels in this particular switch block for whatever reason (doesn't really matter what they are, you just have to have at most five). This also applies when rewriting it as an if-elseif chain.

I found this by running a testcase reduction, which produced a file so fragile that changing almost anything in it causes the compiler to stop crashing: astc_decoder.zip. What's going on? Is the internal state of the compiler getting corrupted?

There haven't been any updates for ~6 months, so I would expect to at least see a report of some progress on the issue.
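
The testcase reduction mentioned above can be illustrated with a minimal sketch. This is not the tool that was actually used for astc_decoder.zip (the thread does not name it); it is a simplified greedy line-removal reducer. The names `reduce_lines` and `toy_crashes` are hypothetical. In real use the predicate would compile the candidate shader in a separate process and report whether the driver still crashes; here a toy predicate stands in for that, modelled on the "six or more case labels" observation.

```python
from typing import Callable, List


def reduce_lines(lines: List[str],
                 still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop chunks of lines while the failure keeps reproducing.

    Simplified relative of delta debugging: try removing large chunks
    first, then halve the chunk size down to single lines.
    """
    if not still_fails(lines):
        raise ValueError("input does not reproduce the failure")
    chunk = max(len(lines) // 2, 1)
    while chunk >= 1:
        i = 0
        while i < len(lines):
            candidate = lines[:i] + lines[i + chunk:]
            if candidate and still_fails(candidate):
                lines = candidate   # keep the smaller input, retry same index
            else:
                i += chunk          # this chunk is needed, move on
        chunk //= 2
    return lines


def toy_crashes(lines: List[str]) -> bool:
    """Stand-in predicate (assumption): 'crash' when 6+ case labels remain."""
    return sum("case " in line for line in lines) >= 6


if __name__ == "__main__":
    shader = (["switch (mode) {"]
              + [f"case {i}: break;" for i in range(8)]
              + ["}"])
    for line in reduce_lines(shader, toy_crashes):
        print(line)
```

A reducer like this only preserves "still crashes", not "still valid", which is one reason a reduced file can end up as fragile as the one described: every remaining line is there because removing it made the failure disappear.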

kunit1 commented 1 year ago

@Arturo-Intel are you able to help here? Thanks.

Karen-Intel commented 1 year ago

Hi all. EstebanIntel will be providing an update in this case soon

Karen

EstebanIntel commented 1 year ago

Hi @liamwhite & @kunit1,

Thank you for reporting this issue. Our priority is to target the most popular games and apps to focus our efforts on providing a high quality, stable experience for the broadest set of users. We will continue to improve our software performance and compatibility throughout 2023 and beyond. While we can’t accommodate your request at this time, please watch this article on our website for any possible changes to this situation.

goldenx86 commented 1 year ago

What a joke of a company. Fix RTX Remix and Portal RTX then, you still haven't noticed your driver crashes for the same reason there.

I will make this public in our next progress report.

EstebanIntel commented 1 year ago

Hi @goldenx86,

Have you reported the issues on Portal RTX? I couldn't find any report on it in this platform.

Our driver validation process does include several games. But given the number of PC games that exist, we cannot test every single one for every driver version. This is where you and other users formally reporting an issue is really helpful, so we can be made aware that the issue exists and look into it.

goldenx86 commented 1 year ago

And a report including a test case, and even pointing where the issue is doesn't merit 5 minutes, even after almost 7 months? I won't be reporting you anything else, it's clear driver quality is not your concern at all.

goldenx86 commented 1 year ago

I strongly recommend you re-structure how you deal with driver reports, your competition does way better.

lucianobraga84 commented 1 year ago

It's simple: it is broken, you know what the problem is, so you fix it. And since RTX Remix will be commonly used, you would be fixing AHEAD.

But "thank you, now we know but we won't fix" is...

Wait, it's Intel.

DarrenWarrenV commented 1 year ago

@IntelSupport-Rozilah Can you fix this after what 7-8 months? I am your customer and I demand better support with fixing/resolving driver issues. If not, I'll never buy a laptop with Intel CPU again, also I will tell everyone I know how you completely suck and how lazy you are. I will convince them to choose AMD.

BuyMyMojo commented 1 year ago

any update on this?

aviot commented 1 year ago

It's 5/11/2023 now. Will it be fixed in the next update?

nipkownix commented 1 year ago

Classic Intel.

goldenx86 commented 1 year ago

The Indie Developer Support team at Intel contacted me directly; the issue is being looked at.

peigongdsd commented 1 year ago

Any update on this?

sherief commented 1 year ago

goldenx86 wrote: "The Indie Developer Support team at Intel contacted me directly; the issue is being looked at."

You're welcome.

refractionpcsx2 commented 1 year ago

Maybe intel isn't thinking about fixing intel, because intel isn't very popular right now, especially on this issue 😎

RexSonic commented 1 year ago

Intel support team hard at work

Karen-Intel commented 1 year ago

Hey guys, just a quick update.

Just to clarify this situation, the help you need is quite clear and we have been working backstage to get traction on this. We have direct comms with @goldenx86 and are already working alongside him and his team. We're making quite good progress and I'm positive we will have updates.

RexSonic wrote: "Intel support team hard at work"

Also, be sure we appreciate all the comments and we read you, YOU help us improve.
I'm sure I'll be talking to you all soon. (We also moderate the Intel Game Dev Support Forum, so if you are a dev in need of help you're welcome to post your case here or in the forum.)

Karen

RexSonic commented 1 year ago

I’ll believe you once I see something actually delivered

Karen-Intel commented 1 year ago

Fair enough :) Keep you posted

B-Reif commented 1 year ago

If someone submits a detailed, reproducible driver error, with an executable example attached, why do you need to know about "the published game/app that you used" to actually fix the problem with your software?

BelleNottelling commented 1 year ago

Hello, @Karen-Intel. I have a few questions that I certainly would love some honest answers to, and I suspect everyone else following this thread would also appreciate some answers.

Karen-Intel wrote: "We have direct comms with @goldenx86 and working alongside with him and his team already."

Was the comment by @EstebanIntel simply because of poor internal communication, or was this the actual answer two weeks ago? Truthfully, your comment appears to be a direct result of the comments that were made in the yuzu progress report, where the post rightfully bashes the way this bug report has been handled by stating things like this:

"This is some kind of twisted joke. For comparison, when you do this with NVIDIA, they hire you."

And showing a screenshot of broken rendering with this caption:

"That’s what happens when we’re forced to remove an entire pipeline stage, Intel"

That progress report was posted two days ago. There is zero mention of anybody on Intel's team being in contact with @goldenx86 and the tone and wording of everything on the page suggests at the time the yuzu team had absolutely no reason to believe that even an ounce of effort was being put into the problem.

This to me clearly indicates that having "direct comms" started AFTER the report was posted and made public, which suggests that trying to fix the issue is less about Intel actually wanting to fix the issue and more about them just wanting to deal with the negativity surrounding it.

I can absolutely understand not wanting to devote debugging, engineering, and driver development time to a game that simply doesn't have a large user base, but to me this situation feels quite different from those for a few reasons.

  1. Yuzu is quite popular. It has 25k stars on GitHub and nearly 10 million downloads.
     a. The yuzu installer has 9,990,184 downloads.
  2. Unlike most reports you'll come across, this actually should be really easy to reproduce and debug.
     a. The original poster gave a GitHub repository with source code that could be used to easily reproduce the issue. This means Intel wouldn't have needed to go buy an obscure game then spend potentially hours trying to find a way to easily replicate the issue. Just clone that repository and compile it.
     b. Having readily available access to source code that can be used to reproduce this crash should actually help ease the troubleshooting step on your end. You'd be able to see exactly what the code is doing to cause the crash and even modify the application's behavior if needed. That also opens up the ability to use debugging utilities with the application.
  3. It's Vulkan! I know that Arc cards are supposed to struggle with older versions of DX and I know those are a low priority. That's fine, and to me that's the expectation for these GPUs, but Vulkan and DX12 are supposed to be the graphics APIs that Arc GPUs handle well. Why would you not want to fix an issue that significantly hinders the GPU's capability to handle something that it is supposed to be really good at?

I have no ill-will towards you or others who work at Intel. Creating a graphics card and providing support for it is far from a simple task, but a lot of the driver blunders that we see just feel foolish to an end user and it's hard to understand why they are even such big issues. I can't help but wonder if maybe the driver development team is just not big enough to keep up with the very significant number of issues that currently exist? If that is the case, what will that mean for future Arc GPUs?

I feel like I keep seeing issues like this where there are tasks and workloads that Arc GPUs should absolutely excel at, but instead they are often incapable of doing those things. For example:

  1. Topaz Gigapixel AI, which people have reported as completely broken and unusable on Intel Arc cards, despite coming in a bundle for a lot of people when they buy the graphics card!
     a. Here's a post on the Topaz Forums about it
     b. And a few on Reddit: Post 1, Post 2
  2. Topaz Video Enhance AI.
     a. Once again, the Arc GPUs should be quite capable when it comes to AI related tasks, but Topaz Video AI is completely broken for people. They say that you are working on it with them, but then they also have come back and stated that
  3. Video capturing and encode / decode.
     a. These GPUs have killer encoding and decoding hardware on them, but right now it's hardly functional. Trying to record something using Arc Control is pretty much just a giant waste of time, and AV1 encoding causes visual artifacts to be added to the video. There's some kind of encode / decode issue in the driver that's affecting some video editors and, from what I can tell, also appears to be related to the Topaz Video AI issue.

It's just.. sad. There are a lot of ways where these cards can easily outshine AMD's GPUs, but then they completely struggle to achieve those things because of the broken drivers.

Are these issues lasting so long because there aren't enough people on the team? Or.. does Intel just not really care that their GPUs can't even be stable and reliable when trying to perform tasks and take advantage of the features that make them actually stand out compared to the other options?

I'm fully expecting that if you respond to this, I will be told that those additional features are a lower priority than popular games. And I get that. I do. But there's a problem when a product has selling points that are completely unusable because its software is broken and it's just not a big priority. And if that is actually the case, then I really do think it would be good to bring on more people to the team.

I apologize for the long-winded comment, but the driver situations are frustrating. I bought an Arc GPU because I want Intel to succeed. I want there to properly be a 3rd competitor to AMD and NVIDIA and I would love to see the team at Intel releasing products that offer great value and some really unique features and selling points. But.. it's hard to envision that future and not feel pessimistic when seemingly baseline functionality is broken in the drivers & being written off as a low-priority.

sherief commented 1 year ago

@BelleNottelling I have it on good authority (a birdie told me) that direct comms started after an email thread referencing this tweet started internally at Intel: https://twitter.com/SheriefFYI/status/1656689822759792640

Edit: Check the timestamps between the tweet and the mention of the Indie Support Team's contact.

BelleNottelling commented 1 year ago

Interesting. Thank you for sharing that insight. I hope that just means someone important at Intel saw the tweet and realized that there's issues they need to resolve, but at the same time it's hard to not feel like they are just responding to the negativity around it rather than actually being concerned about the quality and stability of the drivers.

sherief commented 1 year ago

I share your sentiment.

Karen-Intel commented 1 year ago

Hey @BelleNottelling A few insights on your questions. Thank you for elaborating on many levels

  1. Arturo and I are the current GitHub moderators, that's why you see us very active here and that's why this case got on our plate. With Esteban and the rest of the team, we raise our hands and work together when there's anything needed.

  2. We contacted the Yuzu team about 2 weeks ago, which means we were actually working on this prior to the Twitter status. Like many of you, we also work on teams that have lots of things to do in the queue, so this response from the Indie Dev team (which is us) had been cooking for a while now, and @goldenx86 would not let me lie. My peer Arturo has been in touch with him.

  3. Just about every existing issue regarding ARC and Intel GFX Products (that has been posted here or in any other forum) has a case that we make sure to report. However, every case has to be analyzed by different teams that have been affected, like us, by many internal things you may already be aware of, plus the lifecycle of every bug that is reported at every company like ours.

  4. So yes, we need help, that's why sometimes we go back and forth, as we don't have 6 hands (tho I'd like to have them, believe me). Like you, we know the potential our products have to rock, however we are maturing them and it wouldn't be possible without your help. We know this situation won't last forever and like you, we're positive the awkward times will pass before we notice.

  5. Now about the evidence we already had, which helped us build a solid case: we needed to focus on the compute shader that is not working as expected. Again, justification is key. We are also developers and that's why our engagement in this case was crucial. We are pretty much bringing another perspective that we hope will be helpful and eventful for everyone in this thread.

I think I have covered at least a bit of your concerns. I heartily hope this brings a little peace of mind to whoever needs it. Again, we are here to help, especially "developers in distress". Talk to you soon

Sincerely Karen

BelleNottelling commented 1 year ago

Hi, @Karen-Intel I appreciate the quick response, but I don't think you've fully answered everything I was wondering about. Namely:

Once again, I wish no ill-will to you and the others on your team. I really do appreciate the quick and seemingly honest response. Thank you for your time and for responding to my questions.

Thank you, Belle.

goldenx86 commented 1 year ago

I can confirm what @Karen-Intel is saying. I was contacted by Arturo while working on yuzu's progress report (and I wouldn't change my tone there without results on hand), and I am still waiting for an official response. We explained the issue, its ramifications, and our discomfort with the current report system to him. After that, it's closed doors from the dev teams, so it's not something I can comment on. Next contact was yesterday, confirming that they managed to replicate all issues and are working on it.

What I can say is that the community as a whole is in a negative mood lately, hardware companies have been more disconnected from reality than ever before, so situations like this make all of us, users and consumers, very frustrated.

NVIDIA lying about the GPU shortage and how much mining affected it, then releasing a product series with negative value. Asus blowing up users' computers, trying to cover the problem up and not providing any warranty. Intel reinventing the wheel with Arc's driver, and the decisions taken that you can see here. The price of entry for Zen4 systems. Inflation affecting many previously stable markets.

The community is tired, angry. This is the moment to re-establish good will and trust with the hardware community, solve the mistakes of the past. Hopefully fixing this issue helps start a trend on the whole industry.

We're open to provide any more information on the issue and help with any testing required. We know what it feels to be short-staffed.

sherief commented 1 year ago

Then I owe an apology over my earlier statement about the reason, though I can still confirm I'm aware of at least two email threads circulating within the Intel graphics group referencing the tweet.

peigongdsd commented 1 year ago

Looks like ryujinx is having the same issue too.

https://github.com/Ryujinx/Ryujinx/issues/4922#issue-1708376119

Maybe they're too busy to look into this issue. Hope fixing the SPIR-V compiler will finally save both emulators.

INTEL you'd better understand the feeling of a player who bought your ARC A770 16G only because of the trust in your hardware and your advertisement about how ARC series is good for vulkan gaming only to find that he CANNOT EVEN ENTER AN EMULATED GAME.

Yuzu and ryujinx both have a large user base. I hope you are really serious about this.

Tiancxz commented 1 year ago

It's 5/25/2023 now. Will it be fixed in the next update? I can't wait. I have been a fan of Intel for a decade, and I hope Intel can fix this bug as soon as possible.

Tiancxz commented 1 year ago

I am so anxious. Can't the Intel developers step it up? At this rate I am not going to buy Arc again down the road. Everyone says AMD's drivers are garbage; I did not think Intel's would be the same. I hope the developers at Intel will pay attention to this problem.

EstebanIntel commented 1 year ago

Hi all,

We heard you loud and clear, and a fix for this issue is currently being worked on by our driver developers. However, I cannot promise the fix will be available in the next driver update, as it is still in the works. I will post an update when the fix is merged into a public driver.

Tiancxz commented 1 year ago

EstebanIntel

Dear EstebanIntel, as you know, this problem appeared a long time ago last year and the driver developers never fixed it. Maybe this bug is very hard to fix. I am not accusing them, but I am saying that fixing the bug should take priority over optimizing a certain popular game, because this kind of vicious bug can leave players directly unable to play. So I hope you can tell them to prioritize this kind of problem rather than keep putting it off. Thank you for your quick reply to our question!

BuyMyMojo commented 1 year ago

I’d assume there would be different teams working on optimisation and bug fixes. I am glad to hear they are working on it either way.

EstebanIntel commented 1 year ago

Hi all,

We have just posted driver version 31.0.101.4382, which has a possible fix for this issue. Can you all test it out?

liamwhite commented 1 year ago

I can confirm that the crash reported in this issue is fixed in 31.0.101.4382. Thank you. Should I close this now or wait for it to hit stable?
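
For anyone gating a workaround on the driver version, here is a minimal sketch of comparing Intel driver version strings. It is not yuzu's actual logic, and the function names are hypothetical. The thread only establishes two data points: 31.0.101.3490 reproduced the crash and 31.0.101.4382 is reported fixed. Treating every version below 31.0.101.4382 as affected is a conservative assumption, since the versions in between are not discussed here.

```python
from typing import Tuple


def parse_driver_version(version: str) -> Tuple[int, ...]:
    """Split a dotted version such as '31.0.101.4382' into integers."""
    return tuple(int(part) for part in version.strip().split("."))


# Versions taken from this thread.
KNOWN_AFFECTED = parse_driver_version("31.0.101.3490")
REPORTED_FIXED = parse_driver_version("31.0.101.4382")


def needs_workaround(installed: str) -> bool:
    """Assumption: anything older than the reported-fixed driver is affected."""
    return parse_driver_version(installed) < REPORTED_FIXED


if __name__ == "__main__":
    for candidate in ("31.0.101.3490", "31.0.101.4382"):
        print(candidate, needs_workaround(candidate))
```

Comparing integer tuples rather than raw strings matters once the last component gains a digit: as strings, "31.0.101.10000" would sort before "31.0.101.4382".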

EstebanIntel commented 1 year ago

Thanks for the quick confirmation @liamwhite!

Regarding our public Beta drivers, they are just as stable as the WHQL ones. The only difference is that the public Beta ones don't go through Microsoft's Windows Hardware Lab Kit (HLK) testing, and thus, can't be distributed via Windows Update. But both branches go through the same internal validation process within Intel.