RPCS3 / rpcs3

PS3 emulator/debugger
https://rpcs3.net/
GNU General Public License v2.0
Vulkan: Nvidia specific crash @368.69 #1922

Closed raven02 closed 7 years ago

raven02 commented 7 years ago

It simply crashes at bootup for every title. Having bisected, I found the regression at https://github.com/RPCS3/rpcs3/pull/1800

@kd-11 Looks like that pull works okay on AMD but not on NV system.

raven02 commented 7 years ago

@SakataGintokiYT, does Vulkan in this build work for you on your NV card?

https://ci.appveyor.com/api/buildjobs/uc4rxauj3xquomc7/artifacts/rpcs3-ReleaseLLVM-1e627104.zip

SakataGintokiYT commented 7 years ago

@raven02
Yes it works for me :P

kd-11 commented 7 years ago

@raven02 Unless nvidia vram configuration has changed, that pr would have been crashing nvidia for quite some time now. Is this the case?

raven02 commented 7 years ago

So far I have hit two different types of crash on my NV system:

  1. Bootup crash (happens for every game because of PR #1800)
  2. Sudden in-game crash (still happening)

kd-11 commented 7 years ago

When I initially created that pull request, it was tested on nvidia and there was no bootup crash. When did the bootup crash actually start to happen? You confirmed no crash in the original PR for that changeset https://github.com/RPCS3/rpcs3/pull/1788

raven02 commented 7 years ago

Probably after I upgraded the Nvidia driver (or after a Windows 10 rollout patch).

kd-11 commented 7 years ago

In case the vram config has been changed on nvidia, I'll need to get more info about the heap layout. I'll create a debugging branch to dump heap tables to log so we can see what happens then. What is the crash message? Or is it a windows app crash?

raven02 commented 7 years ago

It is a windows app crash

kd-11 commented 7 years ago

Ok. Could be an issue with NV driver then. Anyway, i'll notify you when the debugging patch is ready.

raven02 commented 7 years ago

Thanks, @kd-11.

raven02 commented 7 years ago

It crashes at the code shown below on my NV system.

Besides, it looks like VSync on Vulkan is now working with the new NV driver.

        if (is_device_local)
        {
            if (device_local_vram_size < heap.size)
            {
                result.device_local = i;
                device_local_vram_size = heap.size;
            }
        }

kd-11 commented 7 years ago

That makes it even more likely that the nv driver is misbehaving. probably has a heap entry that is out of bounds.
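
For illustration, the out-of-bounds hypothesis could be checked with something like the minimal sketch below. `MemoryType`, `MemoryHeap`, `first_out_of_bounds_type` and `heap_size_of` are hypothetical names for this example, not RPCS3 or Vulkan identifiers; in a real renderer the two tables would come from `vkGetPhysicalDeviceMemoryProperties`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the tables a Vulkan driver reports;
// these are not the real VkMemoryType / VkMemoryHeap structs.
struct MemoryType { std::uint32_t flags; std::uint32_t heap_index; };
struct MemoryHeap { std::uint32_t flags; std::uint64_t size; };

// Returns the index of the first memory type whose heap index points
// past the end of the heap table, or -1 if every entry is in bounds.
int first_out_of_bounds_type(const std::vector<MemoryType>& types,
                             const std::vector<MemoryHeap>& heaps)
{
    for (std::size_t i = 0; i < types.size(); ++i)
    {
        if (types[i].heap_index >= heaps.size())
            return static_cast<int>(i);
    }
    return -1;
}

// Looks up the size of the heap behind a memory type, asserting that
// the heap index is valid before it is used to index the table.
std::uint64_t heap_size_of(const MemoryType& type,
                           const std::vector<MemoryHeap>& heaps)
{
    assert(type.heap_index < heaps.size());
    return heaps[type.heap_index].size;
}
```

Running a check like this before the selection loop would turn a bogus heap entry into a loggable error instead of an out-of-range read.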

raven02 commented 7 years ago

I see. The 2nd type of crash mentioned above can always be reproduced with 1942 Joint Strike: it crashes as soon as the first graphics appear.

kd-11 commented 7 years ago

Well, the driver really shouldn't crash to desktop; I'd have expected a failure returning an error code instead. Anyway, try this build and post the log: https://ci.appveyor.com/project/kd-11/rpcs3/build/1.0.206. You don't even have to run the game; just double-click it and it should dump to the log.

raven02 commented 7 years ago

@kd-11 Done. Here is the full log:

RPCS3.zip

·E RSX: BEGINNING HEAP DUMP: HEAP_COUNT = 2, TYPE_COUNT = 11
·E RSX: HEAP INDEX 0, flags = 1, size= 536870912ll
·E RSX: HEAP INDEX 1, flags = 0, size= 4229955584ll
·E RSX: MEMORY TYPE 0, flags = 0, heap= 1
·E RSX: MEMORY TYPE 1, flags = 0, heap= 1
·E RSX: MEMORY TYPE 2, flags = 0, heap= 1
·E RSX: MEMORY TYPE 3, flags = 0, heap= 1
·E RSX: MEMORY TYPE 4, flags = 0, heap= 1
·E RSX: MEMORY TYPE 5, flags = 0, heap= 1
·E RSX: MEMORY TYPE 6, flags = 0, heap= 1
·E RSX: MEMORY TYPE 7, flags = 1, heap= 0
·E RSX: MEMORY TYPE 8, flags = 1, heap= 0
·E RSX: MEMORY TYPE 9, flags = 6, heap= 1
·E RSX: MEMORY TYPE 10, flags = 14, heap= 1
·E RSX: SELECTED: Device Local = 7, Host_Visible = 10
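
To make the dump easier to read, here is a minimal sketch that decodes the `flags` column of the MEMORY TYPE lines. It assumes those values are the raw `VkMemoryPropertyFlags` bits as printed by the debugging build; the bit values come from the Vulkan specification, while the constant and function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Bit values of VkMemoryPropertyFlagBits as defined by the Vulkan specification.
constexpr std::uint32_t DEVICE_LOCAL  = 0x1;
constexpr std::uint32_t HOST_VISIBLE  = 0x2;
constexpr std::uint32_t HOST_COHERENT = 0x4;
constexpr std::uint32_t HOST_CACHED   = 0x8;

// Turns a raw flags value from the dump into a readable list of bit names.
// Bits other than the four above are ignored by this sketch.
std::string describe_type_flags(std::uint32_t flags)
{
    std::string out;
    auto append_if_set = [&](std::uint32_t bit, const char* name)
    {
        if (flags & bit)
        {
            if (!out.empty()) out += " | ";
            out += name;
        }
    };
    append_if_set(DEVICE_LOCAL, "DEVICE_LOCAL");
    append_if_set(HOST_VISIBLE, "HOST_VISIBLE");
    append_if_set(HOST_COHERENT, "HOST_COHERENT");
    append_if_set(HOST_CACHED, "HOST_CACHED");
    if (out.empty()) return "none";
    return out;
}
```

Under that assumption, types 7 and 8 (flags = 1) decode as DEVICE_LOCAL, type 9 (flags = 6) as HOST_VISIBLE | HOST_COHERENT, and type 10 (flags = 14) as HOST_VISIBLE | HOST_COHERENT | HOST_CACHED.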

kd-11 commented 7 years ago

Well, it clearly did not crash in that section since it reaches the end of the block. The algorithm has done its part already and the crash did not happen. To really find the location of the crash, you need to compile VKGSRender with disabled code optimization and attach the debugger. The crash is probably happening elsewhere and the debugger is confused by the optimized code.

raven02 commented 7 years ago

The attached debug build does indeed crash as soon as I run Arkedo 01.

kd-11 commented 7 years ago

is there anything in the log about that?

kd-11 commented 7 years ago

There are only 2 heaps on NV, so there's no way the selection code is faulty (there aren't any options to mess up since it's either heap 0 or 1).

raven02 commented 7 years ago

I ran the gcm sample cube.elf in the VS2015 debugger and it breaks at

    CHECK_RESULT(m_swap_chain->queuePresentKHR(m_swap_chain->get_present_queue(), &present));

kd-11 commented 7 years ago

And if you revert https://github.com/RPCS3/rpcs3/pull/1800 it works? Not just running the build at that point, i mean using a git revert to undo that single commit while keeping all the other code changes since then.

If so, modify the memory type selection code to just return a result with device_local index 7 and host_visible index 9 and see if it works.

raven02 commented 7 years ago

Yes, I reverted only that commit (#1800) on top of the latest code from master.

Let me modify the code as advised

kd-11 commented 7 years ago

What are the results of the old algorithm? I'm guessing 8 and 10?

raven02 commented 7 years ago

It didn't work.

I modified the code as follows:

    result.device_local = 7;
    result.host_visible_coherent = 10;
    return result;

raven02 commented 7 years ago

If I set it to 8 and 10, it works.

kd-11 commented 7 years ago

There is no difference between types 7 and 8. They refer to the same heap. This implies a bug in the driver.
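
The claim can be checked against the heap dump posted earlier in the thread. The following is only a sketch: the table is transcribed by hand from that log, and `MemoryType`, `dumped_types` and `equivalent` are names invented for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One row of the "MEMORY TYPE" section of the dump: flags and heap index.
struct MemoryType { std::uint32_t flags; std::uint32_t heap; };

// Memory type table transcribed from the heap dump in this thread
// (index in the vector == memory type index in the log).
const std::vector<MemoryType> dumped_types = {
    {0, 1}, {0, 1}, {0, 1}, {0, 1}, {0, 1}, {0, 1}, {0, 1},
    {1, 0}, {1, 0},   // types 7 and 8
    {6, 1}, {14, 1},  // types 9 and 10
};

// Two memory types are indistinguishable to the application when they
// expose the same property flags and live on the same heap.
bool equivalent(const MemoryType& a, const MemoryType& b)
{
    return a.flags == b.flags && a.heap == b.heap;
}
```

Here `equivalent(dumped_types[7], dumped_types[8])` is true, so nothing the driver reports explains why one index works and the other crashes, while types 9 and 10 do differ in their flags.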

raven02 commented 7 years ago

Yes, I think so. How can we work around it for now?

kd-11 commented 7 years ago

A workaround is easy, but we first need to confirm the issue to report to nv. If this is the recent doom driver you are using they may have tried to add hacks in there. I'll upload a sample that you can run to confirm the issue (based off of the lunarg cube demo). Also, first make sure you are running the latest nv driver before we go off making accusations. They update beta drivers often.

raven02 commented 7 years ago

Sure.

kd-11 commented 7 years ago

Can you confirm the attached cube demo works without crashing? cube.zip

raven02 commented 7 years ago

I ran it. A window popped up for a moment and then closed itself. Is that normal?

kd-11 commented 7 years ago

sorry. bad packaging. let me resend the correct zip.

kd-11 commented 7 years ago

cube.zip Proper executable and required files (shaders + texture). I've modified it to use the same memory mechanism we use.

kd-11 commented 7 years ago

overwrite the previous exe

raven02 commented 7 years ago

It crashed.

kd-11 commented 7 years ago

Good. That is a clear and reproducible example. That driver is broken. I'm going to open a ticket on nv forums. For now, I'd suggest downgrading your driver to the last working version since this is a rather serious bug. It seems the memory type list is bogus since it fails to run a vulkan demo from lunarg when using staging textures. Adding a fix for this will be really messy and is unnecessary. We should not work around someone else's broken code. If I may ask, what driver version are you running? I'll need it for the report.

raven02 commented 7 years ago

I see. It is their latest driver, 368.81.

I'll roll back to an older driver and test shortly.

kd-11 commented 7 years ago

So the one before that works, right? 368.69 I think.

raven02 commented 7 years ago

Probably older. I have to check and will update you shortly.

kd-11 commented 7 years ago

Graphics card model?

raven02 commented 7 years ago

GT 650M 512M (iMac) on Windows 10.0.14390

kd-11 commented 7 years ago

Any other models affected? Pascal or maxwell? A 900 series perhaps? CC @SakataGintokiYT

raven02 commented 7 years ago

OK. Here is the list; the regression starts at 368.69:

  368.22 OK
  368.39 OK
  368.69 crash
  368.81 crash

vlj commented 7 years ago

The memory types change with the driver revision. For instance, 368.81.0.0: http://vulkan.gpuinfo.org/displayreport.php?id=547#memory and 368.22.0.0: http://vulkan.gpuinfo.org/displayreport.php?id=430#memory

One has to parse the memory types and their flags to assign the device local and main memory type indices.
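
A minimal sketch of that kind of flag-based parsing follows. It is not the RPCS3 selection code: the struct and function names are hypothetical, the field names `device_local` and `host_visible_coherent` are borrowed from the snippet quoted earlier in the thread, and the tie-break (keep the first match on the largest heap) is an assumption made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Property bits from the Vulkan specification (VkMemoryPropertyFlagBits).
constexpr std::uint32_t DEVICE_LOCAL  = 0x1;
constexpr std::uint32_t HOST_VISIBLE  = 0x2;
constexpr std::uint32_t HOST_COHERENT = 0x4;

// Hypothetical stand-ins for the driver-reported tables.
struct MemoryType { std::uint32_t flags; std::uint32_t heap; };
struct MemoryHeap { std::uint32_t flags; std::uint64_t size; };

struct Selection
{
    int device_local = -1;          // -1 means "no suitable type found"
    int host_visible_coherent = -1;
};

// Picks, for each role, the first matching memory type on the largest heap.
Selection select_memory_types(const std::vector<MemoryType>& types,
                              const std::vector<MemoryHeap>& heaps)
{
    Selection result;
    std::uint64_t best_local = 0, best_host = 0;
    constexpr std::uint32_t host_bits = HOST_VISIBLE | HOST_COHERENT;

    for (std::size_t i = 0; i < types.size(); ++i)
    {
        const MemoryType& t = types[i];
        if (t.heap >= heaps.size()) continue; // skip bogus heap indices
        const std::uint64_t size = heaps[t.heap].size;

        if ((t.flags & DEVICE_LOCAL) && size > best_local)
        {
            result.device_local = static_cast<int>(i);
            best_local = size;
        }
        if ((t.flags & host_bits) == host_bits && size > best_host)
        {
            result.host_visible_coherent = static_cast<int>(i);
            best_host = size;
        }
    }
    return result;
}
```

Fed with the table from the heap dump earlier in the thread, this sketch picks 7 for device local and 9 for host visible. The log there reports 10 for host visible, so the actual code evidently resolves that tie differently.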

kd-11 commented 7 years ago

@vlj It's a problem with the parsed memory types. The test case I'm using is not hardcoded, otherwise it wouldn't work on AMD, which has only about 3 memory types. To see this error, get the lunarg cube sample and force the texture function to use a staging texture with the current driver. It will attempt to get a device local heap (which it does successfully as index 7 in the case presented here) and quickly crash.

raven02 commented 7 years ago

After falling back to an older version like 368.39, the Vulkan backend works again, but it seems more likely to crash in-game. Probably something was fixed between 368.39 and 368.81.

kd-11 commented 7 years ago

Can anyone else confirm that https://github.com/RPCS3/rpcs3/files/367400/cube.zip crashes on nvidia?

raven02 commented 7 years ago

@kd-11, I'll test it out shortly.

raven02 commented 7 years ago

It crashed on my GT 650M with driver 368.81.

kd-11 commented 7 years ago

Well, we know the 650M has a problem. Anyone on maxwell? On devtalk I was informed that it would work on maxwell and I need to confirm.