PCSX2 / pcsx2

PCSX2 - The Playstation 2 Emulator
https://pcsx2.net
GNU General Public License v3.0

Meta: AMD Issues/Workarounds #1552

Closed mirh closed 3 years ago

mirh commented 8 years ago

Follows #1508 and hrydgard/ppsspp#8698. The GS dump is here.

Bisected to either 16c2baa0df2d7859619d51d3995b78f057a8e965 or 29c97a9bf21a985e1524e0b428ff97aa678adcc4. It happens in Ace Combat 5 after the "press start" screen, and only when Blending Unit Accuracy has been set to None in the OpenGL hardware renderer.

I'd just complain over at AMD, but I'd like to get a more straightforward testcase for them. It wouldn't be bad if somebody added an "Upstream | External" label.

List of AMD issues:

Links to AMD forum issue threads: https://community.amd.com/message/2748362 https://community.amd.com/message/2756964

Possible BSOD Citra workaround. Merged from issue #2362, as Gregory requested, so we don't forget about it.

Citra has added a workaround for the amdfail driver that fixes the crash caused by SSO. The commit is here: https://github.com/citra-emu/citra/pull/3499/commits/0cf6793622b01f3941fbc77fe04c3b68476004ca

Reddit post: https://www.reddit.com/r/emulation/comments/88vva4/citra_on_twitter_new_update_to_the_hardware/

Idea would be for this to be checked out and maybe implemented.

Some useful info

You unbind everything, so you pay for extra invalidation. What we need to do is create a pipeline per shader combination, so that each stage is bound to a pipeline only once, and then we just bind and rebind pipelines. But I'm not sure it will fix the crash. The Citra workaround might not work on our side.
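The "one pipeline per shader combination" idea can be sketched roughly as a cache keyed by the shader tuple. This is only an illustration in Python, not GSdx code: `PipelineCache`, `fake_create_pipeline` and the shader names are hypothetical, and the factory stands in for whatever driver calls would build a real program pipeline.

```python
class PipelineCache:
    """Hypothetical cache: one pipeline object per shader combination."""

    def __init__(self, create_pipeline):
        self._create_pipeline = create_pipeline  # stand-in for the driver calls
        self._pipelines = {}                     # (vs, gs, fs) -> pipeline
        self._bound = None                       # currently bound pipeline
        self.created = 0                         # pipelines built so far
        self.binds = 0                           # pipeline binds issued so far

    def bind(self, vs, gs, fs):
        key = (vs, gs, fs)
        pipeline = self._pipelines.get(key)
        if pipeline is None:
            # Stages are attached once, when the combination is first seen.
            pipeline = self._create_pipeline(vs, gs, fs)
            self._pipelines[key] = pipeline
            self.created += 1
        if pipeline != self._bound:
            # Afterwards only the whole pipeline is bound or rebound.
            self._bound = pipeline
            self.binds += 1
        return pipeline


def fake_create_pipeline(vs, gs, fs):
    # Assumption: a string is enough to represent a pipeline handle here.
    return "pipeline(%s,%s,%s)" % (vs, gs, fs)


cache = PipelineCache(fake_create_pipeline)
cache.bind("vs0", "gs0", "fs0")
cache.bind("vs0", "gs0", "fs0")  # same combination: nothing created or rebound
cache.bind("vs0", "gs0", "fs1")  # new combination: one more pipeline
cache.bind("vs0", "gs0", "fs0")  # cached combination: rebind only
```

Whether this avoids the driver crash is a separate question; the sketch only shows that no stage is unbound or re-attached after a combination has been seen once.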

gregory38 commented 8 years ago

There is a way to dump blending setup

https://github.com/PCSX2/pcsx2/blob/master/plugins/GSdx/GSRendererOGL.cpp#L675

Replace this line with if 1

Normally tracing can be enabled with debug_opengl = 1 on a dev/debug build. As a bonus, it will also validate OpenGL function calls.

However, I'm not sure tracing works on Windows. @FlatOutPS2 @ssakash @turtleli did one of you manage to make OpenGL tracing usable on Windows?

turtleli commented 8 years ago

I fixed it last year (cbd2417833104d3de42f3ca69ce038a1ffc52fb6), I haven't used the tracing stuff since January/February though so I'm not aware of the current state (I assume it still works).

Nucleoprotein commented 8 years ago

@mirh You can use BlueScreenView to get basic info about the crash, like the bugcheck code etc.

lightningterror commented 8 years ago

There is a similar issue with GT4: HW OpenGL, Blending Unit Accuracy set to None.

I didn't get a bsod but the display driver stops working.

There are all kinds of artifacts on the screen. The screen flickers and the display driver is restarted. After the display driver restarts, PCSX2 doesn't respond and has to be killed from the process list. GPU load stays at 100% even after closing PCSX2. The only way to fix the GPU load is a system restart. An event log entry is written: "Display driver amdkmdap stopped responding and has successfully recovered."

Nucleoprotein commented 8 years ago

@lightningterror This seems to be exactly the same as my problem with Star Ocean First Departure and some other games in PPSSPP; maybe both trigger the same code in the AMD driver.

EDIT: Can you check this by unpacking this file to the PCSX2 directory and testing again? (This is the old AMD OpenGL driver from the 15.7 driver package.) http://www36.zippyshare.com/v/0NmGmAOF/file.html

gregory38 commented 8 years ago

So they fixed the SSO/dual blending issue, spent 6 months on tests, finally released it, and boom, the first test explodes the computer.

Did someone open a report with AMD, saying that using the dual blending unit crashes the whole system?

@mirh By the way, the commit that you found just disables accurate blending on some equations. Initially, Cs*As + Cd, Cs*F + Cd, Cd - Cs*As and Cd - Cs*F were always partially done in software, i.e. the multiplication was done in SW and the addition/subtraction in HW. Take Cs*As + Cd as an example. Before, the shader output Cs*As and the blending unit was set to Cfrag + Cd. Now, the shader outputs Cs and the blending unit is set to Cfrag * Afrag + Cd (note that Afrag comes from the 2nd source).
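To make the two paths for Cs*As + Cd concrete, here is a small numeric model in Python. It is a sketch under simplifying assumptions, not the GSdx implementation: colors are plain floats in [0, 1], fixed-point rounding and the PS2 alpha range are ignored, and the function names are made up for illustration. In real OpenGL the second path would correspond to a blend factor taken from the second fragment output (for example GL_SRC1_ALPHA), which is an assumption about the setup rather than something stated in the commit.

```python
def clamp01(x):
    """Clamp a color component to the [0, 1] range."""
    return max(0.0, min(1.0, x))


def single_source_path(cs, a_s, cd):
    """Old behavior: the shader multiplies, the blending unit only adds."""
    cfrag = cs * a_s             # multiplication done in the shader (SW part)
    return clamp01(cfrag + cd)   # blending unit computes Cfrag + Cd


def dual_source_path(cs, a_s, cd):
    """New behavior: the shader outputs Cs plus a second source with As."""
    cfrag, afrag = cs, a_s               # two fragment outputs
    return clamp01(cfrag * afrag + cd)   # blending unit: Cfrag * Afrag + Cd


# Illustrative samples only (Cs, As, Cd), all within [0, 1].
samples = [(0.5, 0.5, 0.25), (1.0, 0.25, 0.5), (0.2, 1.0, 0.9), (0.0, 0.7, 0.3)]
results = [(single_source_path(*s), dual_source_path(*s)) for s in samples]
```

Within this simplified model both paths give the same value for every sample, so the change only moves the multiplication from the shader into the blending unit. That is the part of the driver the new code exercises and the old code did not.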

mirh commented 8 years ago

Did someone open a report with AMD, saying that using the dual blending unit crashes the whole system?

I first wanted to have a trace they could use to reproduce the issue before opening a report, but I have no time atm.

gregory38 commented 8 years ago

Might not be easy to have a trace. Did you try to replay your gs dump ?

Anyway, they will need 6 months to release a fix (potentially it is already fixed......).

mirh commented 8 years ago

Did you try to replay your gs dump ?

I don't know how I can do it :p

Nucleoprotein commented 8 years ago

@gregory38 This crash seems not to be related to SSO at all; it's something else, because PPSSPP does not use SSO and crashes in the same way.

gregory38 commented 8 years ago

I agree with you that the bug is related to dual-source blending. However, they fixed their code to support dual-source blending with SSO. There is a high probability that they introduced another bug/regression in the meantime.

Actually, with the commit mirh bisected, you can be sure the issue is dual-source blending, because the commit replaces some blending operations that used single-source blending (old code) with dual-source blending (new code) when you disable accurate blending. The goal was to reduce the load on the GPU.

mirh commented 8 years ago

If only I could have a testcase.. Does the "gs dump player" (whatever it is and however it works) require a BIOS? EDIT: uh, MFW I find the tools/GSDumpGUI folder. Inb4 I'll be the first guy happy for a BSOD.

gregory38 commented 8 years ago

The player is only an exe that loads the GSdx.so file, so no BIOS. Technically the GS dump contains game textures and vertices, but I think a couple of frames can be seen as fair use. Honestly, I'm not even sure you need a testcase; you can report that several projects are broken. Maybe someone will be clever enough to detect that test quality on dual source is bad.

@FlatOutPS2 how do you replay on Windows? How do you update the ini option, and is it actually possible?

mirh commented 8 years ago

Yes, yes, I just tested it. I guess it will be quite fine.

Reported.

Honestly, I'm not even sure you need a testcase; you can report that several projects are broken

It's just I was thinking that if we needed 7 months for something with sources and all, a dumb "closed" test would have been even less useful.

lightningterror commented 8 years ago

I can't really open the thread to check the report. Says access is restricted. Guess I can't see the staff comments on this.

mirh commented 8 years ago

They still have to approve it prolly.

Dokman commented 8 years ago

Hey, I was testing it with Looney Tunes: Space Race, but in my case, with an R9 290X and driver version 16.7.3 beta, I don't have this bug.

mirh commented 8 years ago

Try my testcase, then report back. Of course not every game performs the same calls.

lightningterror commented 8 years ago

@mirh I tried it and the issues are the same as with GT4. Now we wait 6-7 months for a fix.

gregory38 commented 8 years ago

No, the 6-7 month delay is only to deliver the fix. They first need to find the bug and then a solution. At least you can use accurate blending to reduce the crash likelihood.

mirh commented 8 years ago

OT, but w/e: just for the record, as of a month ago CodeXL supports cross-platform frame analysis (i.e. seeing which functions spend the most CPU or GPU time).

Nucleoprotein commented 8 years ago

@mirh Tested the BSOD package you posted on the AMD site. The TDR looks the same as in PPSSPP, so this seems to be the same issue.

lightningterror commented 7 years ago

"I can confirm we determined this to be a driver issue. Our GL driver team is now working on a fix."

At least they are working on a fix.

lightningterror commented 7 years ago

Amd fixed this. It will be available in the newest drivers.

FlatOutPS2 commented 7 years ago

Amd fixed this. It will be available in the newest drivers.

Great, now all we have to do is wait 3 months. :p

Nucleoprotein commented 7 years ago

Great, now all we have to do is wait 3 months. :p

@dwitczak from AMD:

We have fixed this issue internally. The bug should no longer reproduce in the next driver release, or the one that follows.

Seems so, or even longer...

gregory38 commented 7 years ago

Technically, you are at least sure that it will be integrated in the last release of an AMD driver (because none will follow) :stuck_out_tongue: Whoever guesses the release version that includes the fix gets the privilege of reporting the next issue ;)

mirh commented 7 years ago

As I said in the Vulkan rpcs3 issue, they said _the_ next release. Not _a_ future release. Or perhaps i'm just overanalyzing I dunno.

FlatOutPS2 commented 7 years ago

As I said in the Vulkan rpcs3 issue, they said the next release. Not a future release. Or perhaps i'm just overanalyzing I dunno.

The quote is "The bug _should_ no longer reproduce in the next driver release, _or_ the one that follows.". Those two highlighted words give some room for it to be postponed.

mirh commented 7 years ago

Next release or that one that follows, kay. Which is like a week or two.

That _should_ on the other hand may just express "courtesy" then.

avih commented 7 years ago

So right now they have 16.9.2 as the official release and 16.11.3 as the less official one (non-whql). What would "next" or "the one which follows" be? 16.11.4? 17.x?

mirh commented 7 years ago

Any one should just count I believe. Also, I think there's no distinction between official and beta release, as long as build number increases.

trivia: non-whql isn't a thing anymore if you want your driver to work with W10 anniversary.

FlatOutPS2 commented 7 years ago

So right now they have 16.9.2 as the official release and 16.11.3 as the less official one (non-whql). What would "next" or "the one which follows" be? 16.11.4? 17.x?

They don't count the hotfixes as new driver releases. Unless they streamlined the process very recently, it'll take a couple of months before we see this fix. We just need to hope they don't introduce another bug in the meantime...

Nucleoprotein commented 7 years ago

DX12 POPCNT fix was added in 16.10.2 hotfix so they can add OpenGL fix in hotfix too. I think GL_ARB_separate_shader_objects fix also was added at first in hotfix.

FlatOutPS2 commented 7 years ago

They can add all kinds of fixes to hotfix releases, but when they say the next driver release, they don't mean it will be included in the next hotfix release.

avih commented 7 years ago

So instead of what not, can anyone say which version they mean by "next"?

mirh commented 7 years ago

It's a surprise for everybody 😃

avih commented 7 years ago

In that case, my guess is that "next" refers to 17.x.x and "the one which follows" would be 18.x.x. How many years did it take them to move from 15 to 16?

Nucleoprotein commented 7 years ago

Their driver names are dates, i.e. 16.11 means November 2016, so 18.xx will be in 2018.
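The naming scheme described here (year.month, with an optional revision) can be decoded mechanically. A minimal Python sketch, assuming two-digit years count from 2000; `driver_release_month` is a made-up helper, not part of any AMD tool:

```python
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def driver_release_month(version):
    """Map a version such as '16.11.3' to a (month name, year) pair."""
    parts = version.split(".")
    year = 2000 + int(parts[0])   # assumption: '16' means 2016
    month = int(parts[1])         # second field is the calendar month
    if not 1 <= month <= 12:
        raise ValueError("not a date-style driver version: %r" % version)
    return MONTHS[month - 1], year


official = driver_release_month("16.9.2")    # the official release mentioned above
hotfix = driver_release_month("16.11.3")     # the non-WHQL one
```

Under this reading, a 17.x driver is simply any release from 2017, so "next" does not imply a year-long wait.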

gregory38 commented 7 years ago

I think GL_ARB_separate_shader_objects fix also was added at first in hotfix.

Well, 6 months for a hot fix, it is more than hot ;)

But for a kernel crash they will likely release it faster. So let's wait a month.

Nucleoprotein commented 7 years ago

This year AMD has had many more driver releases, even 4 beta releases per month. Last year there was (mostly) one beta release per month plus a stable one every 3 months.

jcdenton2k commented 7 years ago

Better than damn Nvidia; I've been stuck on v372.70 cause they can't be bothered to do basic stability testing for their drivers now.

Nezarn commented 7 years ago

New driver came out, no fix. :(

lightningterror commented 7 years ago

I guess we must wait 6 months for it to be implemented :/ But if I were to guess, I'd say the next one should be the one.

avih commented 7 years ago

Maybe someone should ask them what they mean when they say "next one"?

gregory38 commented 7 years ago

Better than damn Nvidia; I've been stuck on v372.70 cause they can't be bothered to do basic stability testing for their drivers now.

Sure the BSOD of AMD is the definition of stability.

Maybe someone should ask them what they mean when they say "next one"?

IMHO, they don't even know when the next driver (or the following one) will be released. Various branches and QA make it hard to predict. Honestly, I hope they will release it at least this year.

avih commented 7 years ago

IMHO, they don't even know when the next driver (or the following one) will be released. Various branches and QA make it hard to predict. Honestly, I hope they will release it at least this year.

Maybe, and maybe not. We should at least ask.

Nucleoprotein commented 7 years ago

Sure the BSOD of AMD is the definition of stability.

NVIDIA drivers have issues too: https://www.techpowerup.com/227881/users-report-multiple-issues-with-geforce-375-86-whql-drivers And a slightly older and more serious one (bricked GPUs): http://wccftech.com/nvidia-users-beware-latest-drivers-damage-pc/ So yep, I prefer a BSOD to a bricked GPU...

FlatOutPS2 commented 7 years ago

IMHO, they don't even know when the next driver (or the following one) will be released. Various branches and QA make it hard to predict. Honestly, I hope they will release it at least this year.

Well, I'd guess by next driver or the one after they mean a driver version in the next month or the month after. But if AMD is still typical AMD it'll be the month after the month after or the month after that. :p

gregory38 commented 7 years ago

@Nucleoprotein Nvidia issues don't make AMD driver more stable ;)

@FlatOutPS2 yes, I agree with you; that's why I wrote a couple of days ago:

But for a kernel crash they will likely release it faster. So let's wait a month.