doitsujin / dxvk

Vulkan-based implementation of D3D8, 9, 10 and 11 for Linux / Wine
zlib License

[d3d11][regression] Genshin Impact broken lighting with defrag #4395

Open isbugged opened 1 month ago

isbugged commented 1 month ago

With current DXVK master, lighting breaks in certain places, especially indoors. [screenshot: defrag]

Disabling defrag with DXVK_CONFIG="dxvk.enableMemoryDefrag = false" fixes it. [screenshot: nodefrag]
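For anyone who wants to script the workaround above, here is a minimal Python sketch that builds an environment with `DXVK_CONFIG` set. The helper name `dxvk_env` is hypothetical and only illustrates the idea; `DXVK_CONFIG` takes the same `key = value` syntax as the config file, with `;` separating multiple options.

```python
import os


def dxvk_env(options, base_env=None):
    """Return a copy of the environment with DXVK_CONFIG set.

    `options` maps DXVK option names to values. This helper is
    hypothetical and not part of DXVK itself.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Multiple options are joined with ';' in "key = value" form.
    env["DXVK_CONFIG"] = "; ".join(f"{key} = {value}" for key, value in options.items())
    return env


# Start from an empty base environment so the result is easy to inspect.
env = dxvk_env({"dxvk.enableMemoryDefrag": "false"}, base_env={})
print(env["DXVK_CONFIG"])
```

The resulting dictionary can be passed as `env=` to `subprocess.run` together with whatever command launches the game under Wine on your setup.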

Software information

Genshin Impact 5.1, max quality settings

System information

doitsujin commented 1 month ago

Is there a way to actually like debug this? I can't really go off a screenshot.

Blisto91 commented 1 month ago

Try to make an apitrace of an affected area: https://github.com/doitsujin/dxvk/wiki/Using-Apitrace

doitsujin commented 1 month ago

Gonna need detailed instructions on how to reproduce this in game. Been running around in some early areas including some buildings in the first city and it looks fine with defrag on and off, but please keep in mind that I don't really play this game, I don't have access to everything.

isbugged commented 1 month ago

Reproducing this seems inconsistent: the moment I restart the game to do the trace, it goes away. I managed to make apitrace work with the game; if I find a consistent trigger I'll upload a trace.

doitsujin commented 1 month ago

This is why I'm trying to reproduce this locally. I can play around with the code to make defrag more aggressive, but for that I need to know an area where the problem reproduces somewhat consistently. So far I haven't been able to find one.

doitsujin commented 1 month ago

Here's a branch which essentially moves everything around all the time, should trigger the problem more reliably: https://github.com/doitsujin/dxvk/tree/frog

Still looks fine here though.

isbugged commented 1 month ago

Got a trace with the frog branch. I had to run around to trigger it; it happens in the last seconds of the trace, when I enter the tavern.

https://share.mailbox.org/ajax/share/058a246a007fa64b51ac93807fa64f8eb4af6c9b7c206432/1/8/MzQ/MzQvMg

That place should be accessible from the start of the game, but I only managed to trigger it after jumping around, so reproducing this will probably require some teleport points to be unlocked.

doitsujin commented 1 month ago

Does this look correct? I tried this a whole bunch now but just can't seem to get it to break with either the trace or in the game itself. I did some teleporting around in the starting area as you described. [screenshot: Bildschirmfoto-703]

isbugged commented 1 month ago

That seems right. When it triggers it looks like the first screenshot, with no lighting anywhere in the scene. It took me a lot of restarts to make it happen; I don't think it's very easy to trigger. Maybe the defrag rework has nothing to do with it and it's a coincidence that it triggered just now. I'll keep testing with master or that image invalidation branch for some days and report back.

isbugged commented 4 weeks ago

I tested a fair bit this weekend with master and couldn't reproduce it; the frog branch seemed more consistent. I rebased it locally to get the latest changes and retested, and it triggered instantly.

The game seems to have two lighting systems: the main time-of-day sun/moon cycle, and static lighting for cities, indoors and caves that can change at 06:00 or 18:00 in-game. In most scenes the static lighting is only on at night. When the bug triggers it only affects the static lighting, but it can show up in different ways: [screenshot: weirdcircles2]

For reference, these are the graphics options I had at the time: [screenshot: options]

The circles correspond more or less to the light sources/probes for global illumination. I tested changing the global illumination option in that state and it turned all the lighting off: [screenshot: globalilluminationoff]

Turning off volumetric fog for some reason produced this: [screenshot: wtf2]

These were the only options that seemed to have an effect.

Once the lighting has been turned off by changing options, nothing turns it back on short of restarting the game. Teleporting, waiting for the in-game time at which the lighting changes, or logging out and back in makes no difference.

I also had this happen in another run while trying to trace it: [screenshot: trace2]

I tried to reproduce it by replaying the traces, but I only managed to trigger it once, far less often than in the game, and not with the same effect (also, apitrace breaks if I pass the loop option, which makes it difficult to stress test). The best way to trigger it seems to be entering and exiting the tavern (I used the entrance on the top floor) and changing the global illumination options after some time.

doitsujin commented 4 weeks ago

Do you know if ANV exposes a transfer queue on your GPU? This should be apparent from the DXVK logs.

I still haven't managed to reproduce this at all and I'm starting to suspect driver memes.

isbugged commented 4 weeks ago

I've attached the logs plus the vulkaninfo output, but the relevant parts seem to be these:

info:  Queue families:
info:    Graphics : 0
info:    Transfer : 0
info:    Sparse   : 0
VkQueueFamilyProperties:
========================
    queueProperties[0]:
    -------------------
        minImageTransferGranularity = (1,1,1)
        queueCount                  = 1
        queueFlags                  = QUEUE_GRAPHICS_BIT | QUEUE_COMPUTE_BIT | QUEUE_TRANSFER_BIT | QUEUE_SPARSE_BINDING_BIT
        timestampValidBits          = 64
        present support             = true
        VkQueueFamilyGlobalPriorityPropertiesKHR:
        -----------------------------------------
            priorityCount  = 4
            priorities: count = 4
                QUEUE_GLOBAL_PRIORITY_LOW_KHR
                QUEUE_GLOBAL_PRIORITY_MEDIUM_KHR
                QUEUE_GLOBAL_PRIORITY_HIGH_KHR
                QUEUE_GLOBAL_PRIORITY_REALTIME_KHR

There is an option (ANV_QUEUE_OVERRIDE) to force queue support in the driver but enabling anything outside default settings breaks everything (more so since they enabled sparse binding).

Attachments: GenshinImpact_d3d11.log, GenshinImpact_dxgi.log, vulkaninfo.txt
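One way to read the output above: a dedicated transfer queue would show up as a queue family that has the transfer bit but neither graphics nor compute. The Python sketch below applies that rule to the single family reported by vulkaninfo. The selection rule and the function name are assumptions for illustration, not DXVK's actual code; the flag values follow `VkQueueFlagBits`.

```python
from enum import IntFlag


class QueueFlags(IntFlag):
    # Values match VkQueueFlagBits in the Vulkan headers.
    GRAPHICS = 0x1
    COMPUTE = 0x2
    TRANSFER = 0x4
    SPARSE_BINDING = 0x8


def find_dedicated_transfer_family(families):
    """Return the index of the first transfer-only family, or None.

    Assumed rule: "dedicated" means TRANSFER is set while GRAPHICS
    and COMPUTE are both clear.
    """
    for index, flags in enumerate(families):
        has_transfer = bool(flags & QueueFlags.TRANSFER)
        has_gfx_or_compute = bool(flags & (QueueFlags.GRAPHICS | QueueFlags.COMPUTE))
        if has_transfer and not has_gfx_or_compute:
            return index
    return None


# The single queue family from the vulkaninfo output above.
anv_families = [
    QueueFlags.GRAPHICS | QueueFlags.COMPUTE | QueueFlags.TRANSFER | QueueFlags.SPARSE_BINDING
]
print(find_dedicated_transfer_family(anv_families))
```

With only one family that does everything, there is nothing for a separate transfer queue to map to, which is consistent with the DXVK log reporting family 0 for graphics, transfer and sparse alike.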

doitsujin commented 4 weeks ago

That's a firm no and rules out one of the theories I had.

That said, I have another suspicion. The game copies an R32G32B32A32_UINT image to a BC6H_UFLOAT one, which is legal in both D3D11 and Vulkan. However, there is an edge case that is fun for the whole family: copying one RGBA32 texel to a 1x1 or 2x2 mip of the destination image. If ANV fails there, we'll read stale data, and artifacts like the ones in your last couple of screenshots happen.

I updated the frog branch with some code that replaces those image->image copies with an image->buffer->image round-trip, would appreciate if you could test with that.
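To illustrate the edge case described in this comment: BC6H stores 4x4 texel blocks of 128 bits each, the same size as one R32G32B32A32 texel, so each source texel maps to one destination block. The Python sketch below lists which mip levels of a destination image are smaller than a full block. The 16x16 base size is a hypothetical example, not taken from the game.

```python
import math

BLOCK_DIM = 4  # BC6H compresses 4x4 texel blocks


def mip_chain(width, height):
    """Yield (level, width, height) for every mip level down to 1x1."""
    level = 0
    while True:
        yield level, width, height
        if width == 1 and height == 1:
            break
        width, height = max(1, width // 2), max(1, height // 2)
        level += 1


def block_extent(width, height):
    """Number of 4x4 blocks needed to cover a mip of the given size."""
    return math.ceil(width / BLOCK_DIM), math.ceil(height / BLOCK_DIM)


def partial_block_mips(width, height):
    """Mip levels whose size is not a multiple of the block size."""
    return [
        level
        for level, w, h in mip_chain(width, height)
        if w % BLOCK_DIM or h % BLOCK_DIM
    ]


# Hypothetical 16x16 BC6H destination image.
for level, w, h in mip_chain(16, 16):
    blocks_w, blocks_h = block_extent(w, h)
    print(f"mip {level}: {w}x{h} texels, {blocks_w}x{blocks_h} blocks")

print(partial_block_mips(16, 16))
```

For the 2x2 and 1x1 mips a single block still has to be written even though it extends past the image extent, which is the case the suspicion above is about.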

isbugged commented 4 weeks ago

I'm afraid it still triggers. [screenshots: intelworkaround1, intelworkaround2, intelworkaround3]

I've also done a quick test with Mesa stable as of the current Arch Linux packages. [screenshots: intelworkaroundstable1, intelworkaroundstable2]

Same steps as before: triggering it, then disabling global illumination and volumetric fog.

Now, if this is clearly a bug in ANV, I'll report it and see where it goes. It's hard to trigger on master with the current defrag code, and only this stress test branch triggers it consistently, so being more cautious with ANV reports might be enough for the moment until it gets looked at.

doitsujin commented 4 weeks ago

Yeah I'm pretty much out of ideas then. I don't have solid proof for this being an ANV bug either, but so far no one seems to have been able to reproduce this on all sorts of other hardware (and drivers; I've thrown AMDVLK at it for good measure and even that works just fine here).

Blisto91 commented 2 weeks ago

@isbugged Could you perchance try to make an ANV Mesa issue about this? It would be great to get some driver dev input on the issue if it reproduces only on Intel.

Edit: never mind, I see you already did: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12084