kakra closed this issue 5 years ago.
Please apply the following patch to DXVK to get more descriptive error output: dxvk-error.patch.txt
I've never seen this problem or anything like it, and I test Witcher 3 a lot. With the current set of information I won't be able to do anything, though.
Thanks, I'll try during the next days. I never saw this behavior in the v0.80 series of DXVK.
I haven't noticed this with the "Beta 3.16" Proton version tho. Afaik that uses dxvk-0.90...
Atm 3.16-4 Beta, that I would guess is the release called proton-3.16beta-20181031
@SveSop I'm currently working with bleeding edge builds here... Proton rebased to 3.19, including some code to optimize the process scheduler priorities to reduce priority inversion effects, and bleeding edge dxvk from git built as winelib. This boosts SOTTR performance from 19 to 33 fps for me here (even 35 fps with latest wine-3.19). It also reduces stutter and fps dips in TW3 and PoE. Also, intermittent freezes in SOTTR are fixed. I'm also working on some avrt patches so that native xaudio can properly gain realtime priority (currently, only built-in xaudio does that, and only with the staging patchset). I'm going to push these updates to my repository soon, but I'm currently not satisfied with them, and I want to test quality a little more. Also, wine had some commits lately breaking compatibility with esync and d3d related patches from Proton which I need to iron out (I think I fixed most by now).
I don't think that the wine version has anything to do with it, or if it has, it's something that'll show up here as soon as Proton would be officially based off a newer wine version.
@doitsujin Is it possible that the patch you've attached just displays a bunch of newlines? I currently cannot reproduce it in Witcher 3 but it now occurs in SOTTR.
Ah yeah, sorry. This one should work: dxvk-error.patch.txt
Again, SOTTR works fine on my end.
SOTTR also radically dropped performance for me during one of my last rebases, from 30 fps to 10 fps (with vsync+triple buffer). But I don't know if this is due to code changes in wine-master or in DXVK. I'm currently trying to figure out if my wine-master rebase went wrong. There are currently many conflicting changes going on and I'm reintegrating patches from their updated sources now. I already reverted my own code changes as a first step but that didn't help. So there seems to be nothing wrong with those. Ah well... sigh
Can you just test things with a clean wine-tkg setup (if you're on arch) or something similar to rule out issues with your wine build?
@doitsujin Okay, something strange is going on. Out of desperation, I zapped the shader cache from $STEAMAPPS/shadercache/$GAMEID (both DXVK and Nvidia) and the crash in SOTTR is gone, plus it's back to normal performance (the perceived performance even looks smoother now). The first benchmark run was clearly full of stutters, as expected. Subsequent runs are fine now. Also, graphic distortions in SOTTR are gone (like Lara missing her clothes or hair).
Does this make sense to you? I wonder if TW3 benefits from a cache clear, too. Let me try...
PS: Don't try to reproduce Lara missing her clothes expecting some fun; the developers seem to have thought of this. :-)
That's weird and should probably not happen, but yeah, might be worth trying for TW3 as well.
Is the cache depending on the DXVK version somehow? And are there safeguards against broken shader caches?
Or: s/shader cache/state cache/
Okay, I already found that there's a safeguard using sha1 sums of each state cache entry, and a version header. So how did it break for me?
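To illustrate the kind of safeguard described above, here is a minimal Python sketch of a cache file with a version header and a SHA-1 digest per entry. The file layout, the magic bytes and the names `write_cache`/`read_cache` are hypothetical and do not reflect DXVK's actual state cache format. It also shows the limits of such a check: damaged, truncated or outdated data is rejected, but an entry that is intact byte-for-byte yet describes state that later misbehaves would still pass.

```python
import hashlib
import struct

# Hypothetical layout, NOT DXVK's real format:
#   header: 4-byte magic + uint32 version
#   entry:  20-byte SHA-1 digest + uint32 payload length + payload bytes
MAGIC = b"SCCH"
VERSION = 2
HEADER = struct.Struct("<4sI")
ENTRY_HEADER = struct.Struct("<20sI")


def write_cache(payloads):
    """Serialize payloads, storing a SHA-1 digest in front of each one."""
    out = bytearray(HEADER.pack(MAGIC, VERSION))
    for payload in payloads:
        out += ENTRY_HEADER.pack(hashlib.sha1(payload).digest(), len(payload))
        out += payload
    return bytes(out)


def read_cache(blob):
    """Return only the entries that pass the version and hash checks."""
    if len(blob) < HEADER.size:
        return []
    magic, version = HEADER.unpack_from(blob, 0)
    if magic != MAGIC or version != VERSION:
        return []  # unknown or outdated file: ignore it completely
    valid, offset = [], HEADER.size
    while offset + ENTRY_HEADER.size <= len(blob):
        digest, size = ENTRY_HEADER.unpack_from(blob, offset)
        offset += ENTRY_HEADER.size
        payload = blob[offset:offset + size]
        offset += size
        # Drop truncated entries and entries whose bytes no longer match the digest.
        if len(payload) == size and hashlib.sha1(payload).digest() == digest:
            valid.append(payload)
    return valid


blob = write_cache([b"pipeline-a", b"pipeline-b"])
damaged = blob[:-1] + b"X"   # replace the last byte of the second entry
print(read_cache(damaged))   # [b'pipeline-a']
```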
Not sure. Did you manage to confirm whether it was DXVK's state cache or the Nvidia driver cache that was causing issues?
I nuked both and only then discovered that this wasn't the best idea to find which one actually caused the problem. :-(
Okay, I got TW3 to crash again, this time logging worked (that logging patch should be in mainline, shouldn't it?):
0029:err:clipboard:convert_selection Timed out waiting for SelectionNotify event
0029:err:clipboard:convert_selection Timed out waiting for SelectionNotify event
DxvkMemoryAllocator: Memory allocation failed
terminate called after throwing an instance of 'dxvk::DxvkError'
004d:fixme:seh:dwarf_get_ptr unsupported encoding 9b
004d:fixme:seh:dwarf_get_ptr unsupported encoding c4
004d:fixme:seh:dwarf_get_ptr unsupported encoding 7d
004d:fixme:seh:dwarf_get_ptr unsupported encoding 9b
004d:fixme:seh:dwarf_get_ptr unsupported encoding c4
004d:fixme:seh:dwarf_get_ptr unsupported encoding 7d
004d:fixme:seh:dwarf_get_ptr unsupported encoding 9b
004d:fixme:seh:dwarf_get_ptr unsupported encoding 4a
004d:fixme:seh:dwarf_get_ptr unsupported encoding a9
004d:fixme:seh:dwarf_get_ptr unsupported encoding 9b
004d:fixme:seh:dwarf_get_ptr unsupported encoding 4a
004d:fixme:seh:dwarf_get_ptr unsupported encoding a9
004d:fixme:seh:dwarf_get_ptr unsupported encoding 9b
004d:fixme:seh:dwarf_get_ptr unsupported encoding 4a
004d:fixme:seh:dwarf_get_ptr unsupported encoding a9
004d:fixme:seh:dwarf_get_ptr unsupported encoding 9b
004d:fixme:seh:dwarf_get_ptr unsupported encoding 4a
004d:fixme:seh:dwarf_get_ptr unsupported encoding a9
004d:err:seh:call_stack_handlers invalid frame 3986f519 (0x39672000-0x39870000)
004d:err:seh:NtRaiseException Exception frame is not in stack limits => unable to dispatch exception.
Looking at the code, it seems like I should somehow manage to reproduce this error even with DXVK logging turned on...
The message "DxvkMemoryAllocator: Memory allocation failed" indicates that you're running out of memory (not necessarily VRAM).
Okay, I renamed the issue title to reflect the original problem. I think the "cache corruption" in SOTTR is really a different issue and should be reported separately by me if it occurs again.
It's strange that this can happen even very early after starting the game, read: When I just loaded a saved game the first time after starting The Witcher 3. I'll report back with new findings.
Actually, my system was loaded with some development applications which like to take a good amount of RAM while this issue occurred the last time. But it still had plenty of RAM left, around 8 GB. After all, TW3 is usually not THAT memory hungry (being an older game).
Here's an update:
err: DxvkMemoryAllocator: Memory allocation failed
Size: 134217728
Alignment: 256
Mem flags: 0x7
Mem types: 0x681
DxvkMemoryAllocator: Memory allocation failed
terminate called after throwing an instance of 'dxvk::DxvkError'
# free -m
              total        used        free      shared  buff/cache   available
Mem:          15931        8063        1415         184        6452        7092
Swap:         67583        1304       66279
# nvidia-smi after the crash
Fri Nov 9 21:03:35 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.54.09 Driver Version: 396.54.09 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 105... Off | 00000000:01:00.0 On | N/A |
| 54% 43C P5 N/A / 75W | 1697MiB / 4006MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1040 G /usr/libexec/Xorg 1120MiB |
| 0 2537 G /usr/bin/kwin_x11 47MiB |
| 0 2547 G /usr/bin/krunner 1MiB |
| 0 2549 G /usr/bin/plasmashell 241MiB |
| 0 4762 G ...quest-channel-token=6856907186180940681 256MiB |
| 0 6145 G ...ra/.local/share/Steam/ubuntu12_32/steam 20MiB |
| 0 6153 G ./steamwebhelper 1MiB |
| 0 6172 G ./steamwebhelper 4MiB |
+-----------------------------------------------------------------------------+
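To make the numbers in the log above easier to read, here is a small Python sketch that decodes them. It assumes that "Mem flags" holds the requested VkMemoryPropertyFlags and "Mem types" holds the memoryTypeBits mask from the resource's memory requirements; that interpretation is an assumption, not something the log states. The bit values are those defined by the Vulkan specification (only a subset is listed). Under that reading, the failing request was a 128 MiB allocation asking for device-local, host-visible, host-coherent memory that could be served from memory type indices 0, 7, 9 and 10; which heaps those indices map to is device-specific and would have to be checked with a tool such as vulkaninfo.

```python
# Subset of VkMemoryPropertyFlagBits, values as defined by the Vulkan spec.
MEMORY_PROPERTY_BITS = {
    0x01: "DEVICE_LOCAL",
    0x02: "HOST_VISIBLE",
    0x04: "HOST_COHERENT",
    0x08: "HOST_CACHED",
    0x10: "LAZILY_ALLOCATED",
}


def decode_property_flags(flags):
    """Names of the property bits set in a VkMemoryPropertyFlags value."""
    return [name for bit, name in MEMORY_PROPERTY_BITS.items() if flags & bit]


def decode_type_bits(type_bits):
    """memoryTypeBits: bit i set means memory type index i may be used."""
    return [i for i in range(32) if type_bits & (1 << i)]


# Values copied from the log above.
size, alignment, mem_flags, mem_types = 134217728, 256, 0x7, 0x681
print(size // 2**20, "MiB")              # 128 MiB
print(decode_property_flags(mem_flags))  # ['DEVICE_LOCAL', 'HOST_VISIBLE', 'HOST_COHERENT']
print(decode_type_bits(mem_types))       # [0, 7, 9, 10]
```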
I played this game for extended hours (sometimes 12 in a row, yes I'm an addict of this game) with previous versions of DXVK. So I wonder why I see this now.
Does it still work on older versions?
I don't see why allocating a 128MB buffer in system memory would suddenly fail when it previously didn't, especially since the memory allocator hasn't been touched in a long time.
After trying some games, I see that multiple games are affected... Skyrim SE freezes on loading screens or in the middle of the game, looking at the logs I also see DXVK complain that very moment about memory.
It looks like Chrome hogs a lot of GPU memory: Xorg was holding almost 3 GB of GPU memory. Restarting Chrome fixes that, and stopping Chrome gets rid of the issues in Skyrim SE. I don't think it has anything to do with the DXVK version; it's rather a coincidence that other processes were occupying GPU memory. Shouldn't such memory be swapped out to system memory? Maybe something changed in the NVIDIA driver?
System memory that needs to be made visible to the GPU cannot be swapped out as far as I'm aware, and the Nvidia driver might have further limitations (probably for a good reason). fwiw I've seen similar issues on amdgpu under low-memory conditions, although not directly related to DXVK.
In any case, if you consider this issue resolved by closing third-party applications, please close the issue.
There's definitely a bug somewhere leaking memory... If I play long enough, VRAM eventually fills up to 3.7 of 4 GB and then games either freeze, crash or behave strangely (like flickering or missing textures/models). Something changed, I'm just not sure what. I've been using the same NV driver version for some time now, so it's not very likely that the graphics driver changed something. I followed DXVK master closely. Maybe something new in DXVK triggers such a bug?
It's quite possible that the bug was there earlier but something now triggers it much earlier. I've seen similar problems on very rare occasions before, but only after very long gaming sessions.
You can monitor DXVK's memory consumption (both VRAM and mapped system RAM) with DXVK_HUD=memory. I haven't seen any behaviour that would indicate a leak.
I've also noticed (with DXVK_HUD=memory) that some games like Dark Souls 3 are unplayable on my old lady GTX 960 with 2 GB VRAM. Memory consumption is about 1.8-2.0 GB (just like on Windows), but just try to sit at a bonfire, die or teleport to another area and oh boy, it spikes to ~2.6-3.4 GB. And then I'm playing Dark Souls 3 with bullet time (slow motion). Fallout 4, Skyrim SE and ReCore DE also eat way above my VRAM limit. But games like Dark Souls Remastered, Deus Ex HR, Divinity OS 2 DE, GTA V, Hard Reset Redux, Shadow Warrior 2 and Witcher 3 work perfectly.
I have the same issue with 4GB of GPU memory and SOTTR. But I can work around it by using medium textures. It looks like the game does not free memory immediately but uses some garbage collector/reuse mechanism (or it just leaks, and that's not so noticeable on Windows). On Windows, some GPU defragmenter is working and also keeps the hottest memory resident on the GPU, while in DXVK, if a texture is allocated in host memory, it stays on the host forever.
I changed the title because it is not game specific; it's visible in different games. I'm playing SOTTR even with low textures and it happens after 3-4 hours (sometimes earlier). Another effect in SOTTR, probably resulting from this, is that the second benchmark run is always slower than the first; after going into a game level and then returning to the benchmark, it's even slower. So there's overhead accumulating somewhere.
SOTTR: Results in graphical glitches first, then freezes, and finally crashes to desktop after some time of thrashing the hard disk.
TW3: Just freezes, often right after the initial saved game load; on rare occasions it crashes to desktop.
SkyrimSE: Just freezes, either during a loading screen (endless loading screen) or in the middle of the game; sound continues to play.
Do any of you guys know what changed lately on your systems? This effect was much less visible some time ago (or didn't even exist, hard to say).
Would it help to use dxvk.allowMemoryOvercommit = True?
You could try, but that only has an effect when you actually run out of VRAM. With that option disabled, DXVK falls back to a system memory allocation.
But you won't ever run out of VRAM in Witcher 3, for example.
This needs further testing but it seems to help: SOTTR runs a lot slower if Xorg already occupies a lot of VRAM but it didn't crash in a quick test. Also the other games affected seem to no longer crash (and, in contrast, see no slowdown). But I need to load the system a little more.
@doitsujin Okay, after testing for a while, available VRAM makes a huge difference for performance. All games crash if I do not enable overcommitting. Crashing is a bad experience, so we probably need something more intelligent here. Meanwhile, I patched my version of DXVK to enable overcommitting by default which gets rid of all the crashes I experienced lately. I rather reach a savepoint with bad performance than have a crash which forces me to repeat parts of the game. There's always an opt-in to gracefully quit the game and clean up VRAM somehow.
Games are affected in different ways, I've tested two so far:
So first: Maybe this issue should be tagged "performance".
Second: Is there anything intelligent that DXVK could do about memory management? Regarding the issue reporting degraded performance with high quality textures in SOTTR, is it possible for DXVK to somehow prioritize what goes to VRAM and what goes to system RAM? Is it possible that DXVK could discard allocations from VRAM, or swap them between system memory and VRAM based on usage patterns?
I wonder how Windows manages this... It either has ways to manage and swap VRAM with system RAM, or the games behave differently there and can actually manage this on their own. Then the question is, why can't they do it when running under wine/DXVK?
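As a rough illustration of the "prioritize based on usage patterns" idea raised above, here is a hypothetical least-recently-used residency policy in Python. It is a toy model, not what DXVK does and not how the Windows video memory manager is actually implemented: resources are just names with sizes, and "moving" one only updates bookkeeping. A real implementation would also have to copy the data and rebind every resource it moves, which is where most of the difficulty lies.

```python
from collections import OrderedDict


class ResidencyManager:
    """Toy LRU residency policy (hypothetical, not DXVK's or Windows' algorithm)."""

    def __init__(self, vram_budget):
        self.vram_budget = vram_budget
        self.vram = OrderedDict()   # name -> size, least recently used first
        self.sysmem = {}            # name -> size

    def vram_used(self):
        return sum(self.vram.values())

    def use(self, name, size=None):
        """Mark a resource as used now and try to make it resident in VRAM."""
        if name in self.vram:
            self.vram.move_to_end(name)   # refresh its recency
            return
        if name in self.sysmem:
            size = self.sysmem.pop(name)
        if size is None:
            raise KeyError(f"unknown resource {name!r}: size required")
        if size > self.vram_budget:
            self.sysmem[name] = size      # can never fit, keep it in system memory
            return
        # Evict the least recently used resources until the new one fits.
        while self.vram_used() + size > self.vram_budget:
            victim, victim_size = self.vram.popitem(last=False)
            self.sysmem[victim] = victim_size
        self.vram[name] = size


mgr = ResidencyManager(vram_budget=300)
mgr.use("terrain", 128)
mgr.use("lara", 128)
mgr.use("terrain")          # terrain is now the most recently used
mgr.use("water", 128)       # no room: "lara" is evicted to system memory
print(list(mgr.vram), list(mgr.sysmem))   # ['terrain', 'water'] ['lara']
```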
I must be doing something wrong i guess. Replaced Proton 3.16 dxvk files with 1724d51 and played about 2 hours. Never went past 2.8GB allocated mem.
GTX970 4GB w/396.54.9 driver. 1080p all on "Ultra" in TW3. No huge "dips" i guess, but fps is not stellar (around 50'ish fps) Perhaps you use 4K? Should i opt to try to use that "HD Reworked Project"?
EDIT: Oh, i realized, you do perhaps load vram near 4GB to see if stuff starts lagging too much? So, just burn some vram to see?
@SveSop Yes, I burned some GPU VRAM by opening a lot of Chrome tabs, Spotify and some other Chrome-based apps. Even Steam itself is Chrome-based (at least the webview component). I don't know why Chrome-processes eat so much VRAM but it can become an issue.
You could try the HD Reworked Project to see if it reduces fps dips for you. It felt smoother here.
BTW: After unpacking the mod and extracting the data files and the config folder, go into the game settings and activate the new texture quality level to actually use the configuration.
When DXVK fails to allocate a resource in VRAM, it allocates it in RAM, which of course comes with a pretty big performance penalty. That shouldn't crash, though. If anything, it should only crash if you allow overcommitting.
Windows does the same but it is a bit smarter and also moves existing resources out of VRAM to make room for more important ones.
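The behaviour described in this thread (try VRAM first, fall back to system RAM, and only give up when both fail) can be sketched in Python as below. The `Heap` class, the capacities and the way `allow_overcommit` is modelled (simply skipping the VRAM capacity check, as if the driver accepted the allocation) are assumptions for illustration; this is not DXVK's actual code.

```python
from dataclasses import dataclass


class AllocationError(Exception):
    """Raised when neither heap can satisfy the request."""


@dataclass
class Heap:
    name: str
    capacity: int
    used: int = 0

    def fits(self, size):
        return self.used + size <= self.capacity


def allocate(size, vram, sysmem, allow_overcommit=False):
    """Return the name of the heap that received the allocation."""
    # 1. Prefer device-local memory. With overcommit enabled, the capacity
    #    check is skipped and the (modelled) driver accepts the request anyway.
    if allow_overcommit or vram.fits(size):
        vram.used += size
        return vram.name
    # 2. Otherwise fall back to host memory the GPU can access (slower).
    if sysmem.fits(size):
        sysmem.used += size
        return sysmem.name
    # 3. Both failed: there is no meaningful way to continue.
    raise AllocationError(f"Memory allocation failed (size={size})")


MiB = 2**20
vram = Heap("vram", capacity=4096 * MiB, used=4000 * MiB)
sysmem = Heap("sysmem", capacity=512 * MiB)
print(allocate(128 * MiB, vram, sysmem))                         # sysmem
print(allocate(128 * MiB, vram, sysmem, allow_overcommit=True))  # vram
```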
@K0bin I think it's quite the other way: Overcommitting allows memory to be allocated from system RAM even if the game didn't ask for it, otherwise it fails which crashes the game (because DXVK refuses to continue).
I wonder if it is possible to make DXVK similarly smart and let it move resources out of the way. But I guess this has to be fixed further down the graphics stack, i.e. the driver itself or Xorg must be willing to give up resources when requirements come in...
@K0bin I think it's quite the other way: Overcommitting allows memory to be allocated from system RAM even if the game didn't ask for it, otherwise it fails which crashes the game (because DXVK refuses to continue).
No, its not the other way, see #527.
I wonder if it is possible to make DXVK similarly smart and let it move resources out of the way. But I guess this has to be fixed further down the graphics stack, i.e. the driver itself or Xorg must be willing to give up resources when requirements come in...
It's possible but it's very hard and a lot of work. This has to be done inside DXVK though, Vulkan explicitly leaves memory management to the application. That's probably not going to happen any time soon, so your best option is to just lower your graphics settings.
@K0bin Interesting... DXVK definitely terminates here when the error occurs. Revisiting the allocation code, it shouldn't do that. It should just return DxvkDeviceMemory() instead of result. So I conclude something is going wrong just a little bit later? Like accessing null pointers? Ah no, it throws from here: https://github.com/doitsujin/dxvk/blob/4db5c21ec5b983334431e9e8f21b9cbaa2ac7d2a/src/dxvk/dxvk_memory.cpp#L197
It only throws the error when neither the VRAM nor the System RAM allocations succeed. Once that happens, it's too late to continue in any meaningful way anyway.
I think it's not a good test for me to use over 3.5GB of VRAM on a GTX970 for comparing performance, because of the 970's memory configuration thingy. https://hexus.net/tech/news/graphics/79925-nvidia-explains-geforce-gtx-970s-memory-problems/
So this "bug" is mostly crashes due to OTHER apps using up gpu memory, and dxvk not pushing this (useless) memory usage out of the way to up performance? :)
@doitsujin Yes this is what the code says but I'm sure there's still sysmem available, or the system could just swap stuff out to disk to make some small allocation of 128M available. Could this be a driver bug? After all you're not allocating through standard C/C++ functions but through vulkan functions.
I don't know what the issue is, but it seems that something eats unusual amounts of memory on your end. Have you tried running those games on a simple WM (like fluxbox) without any applications running in the background?
Stupid question from my end: Is the problem here that there is a memory leak eating more and more VRAM until the game crashes in e.g. TW3? If so, I don't really see the same problem on my end, as those 2.8GB of allocated VRAM shown happened in the first 2-3 minutes of playing TW3 yesterday, and did not increase over the course of 2 hours of playing, loading/saving games several times without change. In use (committed? don't remember the wording) hovered around 2.2GB - 2.5GB mostly.
I did not try to crash the game on purpose by overloading vram in some other manner tho.
Just trying to troubleshoot on a different system than yours to weed out any possible non-dxvk issues.
Did a wee bit of testing back and forth, and can't really say i am able to make something eat so much vram.. Opening 10 chrome windows did not chunk out a huge deal of vram either tbh, but for all i know you could be running 200+ windows while editing a 4K movie in the background :)
What i DID notice however (no difference between Proton 3.16 w/dxvk 0.90 vs. building my own dxvk from git) was the "Memory Allocated" from the DXVK hud only increased and never went down even if i loaded a saved game with less "Memory used".
That might be intended, and tbh SHOULD not be an issue as long as it is <4GB i guess? Eg.
nVidia-SMI: Witcher3: 1488
nVidia-SMI: (Total): 1880
DXVK Memory Allocated: 2030
DXVK Memory Used: 1837
Loading saved games from different spots, running around and so on would push "Memory Allocated" upwards, even though "Memory Used" goes up/down as needed. I'm not really sure what causes the discrepancy between nVidia-SMI (which I would deem to be "accurate", reporting usage directly from the driver) and "Memory Used". The nVidia-SMI "Total" memory was 1880, somewhat more in line with the DXVK memory numbers I guess, but the nVidia figure includes Xorg, gnome-shell and the like, so I would not think DXVK would be able to "read" that?
I did not test hours upon hours of gameplay, but from the 2 hours i played yesterday mentioned above, i had 2.8GB "Memory allocated", so i guess it MIGHT be something that just grows and grows until it gets a problem? Is the mem allocation something that SOMETIMES gets cleared out? (Or rather SHOULD).
DXVK uses a chunk allocator, thus it usually doesn't clean up, because some bit of information will always be left in a chunk. Chunks are probably allocated in 64 MB blocks. Within each chunk, you'll have a free list of blocks, and DXVK will allocate from the biggest block available (unless a free block matches the requested size exactly), provided the allocation request type matches the chunk type. It's similar to how btrfs manages its device space. If a chunk becomes completely free, it could be de-allocated, but that really doesn't make much sense because you would probably request a new chunk of memory just moments later. If no free block can be found, a new chunk will be allocated from the device. Thus, it's normal that memory usage only increases until it peaks at some value. A chunking allocator is pretty much the best thing you can do if you need to handle different and incompatible types of allocations. You just need to properly tune the chunk size so you can fit all types of allocations without too much overhead and without too much wasted space. The "allocated" counter is probably what's been allocated as chunks, and the "used" counter is what's actually used across all chunks. The difference is wasted space which wasn't used or couldn't be used due to incompatible memory type flags.
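The chunk scheme described above can be sketched in Python as follows. It follows the description in this post (64 MB chunks, an exact-size match preferred, otherwise carving from the largest free block, chunks kept after they become empty), which is itself a guess about DXVK's internals; alignment, coalescing of adjacent free blocks and the real chunk size are deliberately left out. The `allocated()` and `used()` methods correspond to the two HUD counters as interpreted here.

```python
MiB = 2**20
CHUNK_SIZE = 64 * MiB   # taken from the post above; the real value may differ


class Chunk:
    def __init__(self, mem_type, size):
        self.mem_type = mem_type
        self.size = size
        self.free = [(0, size)]   # free blocks as (offset, size) pairs

    def allocate(self, size):
        """Use an exact-size free block if there is one, else the largest one."""
        candidates = [blk for blk in self.free if blk[1] >= size]
        if not candidates:
            return None
        exact = [blk for blk in candidates if blk[1] == size]
        block = exact[0] if exact else max(candidates, key=lambda blk: blk[1])
        self.free.remove(block)
        offset, block_size = block
        if block_size > size:
            self.free.append((offset + size, block_size - size))
        return offset

    def release(self, offset, size):
        self.free.append((offset, size))   # no coalescing in this sketch

    def used(self):
        return self.size - sum(s for _, s in self.free)


class ChunkAllocator:
    def __init__(self):
        self.chunks = []

    def allocate(self, size, mem_type):
        # Only chunks of a matching memory type can serve the request.
        for chunk in self.chunks:
            if chunk.mem_type == mem_type:
                offset = chunk.allocate(size)
                if offset is not None:
                    return chunk, offset
        # No room anywhere: request a new chunk from the device.
        chunk = Chunk(mem_type, max(CHUNK_SIZE, size))
        self.chunks.append(chunk)
        return chunk, chunk.allocate(size)

    def free(self, chunk, offset, size):
        chunk.release(offset, size)   # the chunk itself is kept for reuse

    def allocated(self):
        return sum(chunk.size for chunk in self.chunks)

    def used(self):
        return sum(chunk.used() for chunk in self.chunks)


alloc = ChunkAllocator()
tex, tex_off = alloc.allocate(16 * MiB, mem_type=0)
buf, buf_off = alloc.allocate(16 * MiB, mem_type=1)   # different type: new chunk
alloc.free(tex, tex_off, 16 * MiB)
print(alloc.allocated() // MiB, alloc.used() // MiB)  # 128 16
```

Freeing the texture lowers "used" but not "allocated", which matches the observation that the allocated figure only ever grows.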
As far as I understood, chunks are allocated from the driver or the vulkan layer which in turn decides if it allocates from the device or from system memory (depending on the flags given). Within each chunk, memory is managed by DXVK itself by keeping lists of free blocks (pairs of offset/size).
What happens in my case seems to be: DXVK asks Vulkan for a new chunk of device-local memory, and Vulkan says "no". DXVK tries again without the "device-local" flag, thus allowing non-local memory, which is slower because it is accessed over the PCI bus. But Vulkan says "no" again, even though there's plenty of system RAM available to allocate such a chunk. I can only guess why that is. Maybe Vulkan cannot find system memory that would be mappable by the GPU. Not all of your physical address space may be available to the GPU because of chipset limitations, or because other devices (e.g. another GPU) already mapped it, or for some other reason I don't know.
Overcommitting "solves" this because it lets vulkan pretend that unused chunk memory isn't going to be used any time soon. Thus, such memory is still available to other allocations. Your Linux kernel does a similar thing: Allocated memory only becomes mapped to real memory if something writes to the memory blocks. Otherwise it stays idle. It accounts for the allocated RAM but not the used RAM. It's the "virt" counter you'd see in top: virt is allocated space. But things start crashing if one application now actually wants to use its allocated but yet unused memory: The GPU won't find any space to put that request, it fails, crash. Linux solves this by swapping to disk. The GPU could request the driver to swap to sysmem. But as I understood, vulkan leaves that completely to the application. So DXVK would be in charge of doing so. But DXVK doesn't implement this. It's complicated. It should be avoided as long as you can.
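The Linux behaviour this analogy relies on (reserving memory only raises the "virt" figure, and a physical page is consumed on the first write to it) can be modelled with a small Python toy. This is a hypothetical model of the analogy only: real Linux would try to reclaim or swap before failing, and the model says nothing about what DXVK's allowMemoryOvercommit option actually does, which is disputed elsewhere in this thread.

```python
PAGE = 4096


class LazyMemory:
    """Toy model: reservations always succeed, backing is assigned on first write."""

    def __init__(self, physical_pages):
        self.physical_pages = physical_pages
        self.regions = {}        # region name -> reserved size in bytes
        self.resident = set()    # (region name, page index) pairs with real backing

    def reserve(self, name, nbytes):
        self.regions[name] = nbytes   # nothing is backed yet

    def virt(self):
        return sum(self.regions.values())   # like VIRT in top

    def write(self, name, offset):
        if not 0 <= offset < self.regions[name]:
            raise IndexError("write outside the reserved region")
        page = (name, offset // PAGE)
        if page in self.resident:
            return                    # already backed, nothing to do
        if len(self.resident) >= self.physical_pages:
            # Real Linux would reclaim or swap first; the toy just fails.
            raise MemoryError("no physical page left to back this write")
        self.resident.add(page)


mem = LazyMemory(physical_pages=2)
mem.reserve("chrome", 10 * PAGE)
mem.reserve("game", 10 * PAGE)     # 20 pages reserved, only 2 exist
mem.write("chrome", 0)
mem.write("game", 0)               # both physical pages are now in use
```

Any further write to an untouched page of "game" now fails, even though the reservation itself succeeded long before.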
So in turn that means: Overcommitting does not crash for me, thus a lot of VRAM is only allocated but not used. So Chrome (or Xorg) seems to allocate a lot of VRAM just because it can but it never uses it.
To the experts: Does this make sense?
I'm running with two monitors, left one is a full-HD TV (which I actually use for gaming from the couch, with a wireless controller), and the right one is a 4k PC monitor. I do no video editing but some browser tabs may host paused or finished youtube videos (which tend to be streamed in 4k quality). I also have multiple gmail tabs open. At least back in 2014 there was a bug in Chrome where it would slowly eat away your VRAM if you have gmail opened over longer periods of time. But that was fixed since then.
So overall I'm probably running a virtual framebuffer of (1920+3840)x2160 pixels at 32 bit color depth (I think it doesn't use 24 bit buffer representation, but color space is 24 bit). With triple buffering, that's about 142 MB of screen buffer. Probably there's some padding and alignment but nothing to worry about...
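The estimate above can be checked with a few lines of Python. It assumes the virtual screen is the bounding box of both monitors side by side (so the 2160-pixel height of the 4K monitor applies to the whole width), 4 bytes per pixel and three full buffers; the "about 142 MB" figure matches if MB means MiB (2^20 bytes).

```python
width = 1920 + 3840      # both monitors side by side
height = 2160            # the 4K monitor determines the virtual screen height
bytes_per_pixel = 4      # 32-bit pixels (24-bit color plus 8 unused bits)
buffers = 3              # triple buffering

total = width * height * bytes_per_pixel * buffers
print(total)                         # 149299200 bytes
print(round(total / 2**20, 1))       # 142.4 (MiB)
print(round(total / 10**6, 1))       # 149.3 (decimal MB)
```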
Or a little less technical and abstract:
Think of your desktop (the real wooden one where you put your keyboard and mouse) as your VRAM. Every time you want to do something with the GPU, arrange a piece of coloured paper on your desktop. Put your information on the paper sheets. Different types of information use different coloured paper. At some point either your desktop fills up and you can only use the space left on the paper, or the space left on the paper is enough to work with. If your desktop fills up, you could start putting paper sheets elsewhere... on the floor... or into some folders. But accessing these is much slower then. Overcommitting is like using scissors to cut parts of the paper off and replace those parts with a different color. But if the other application now has to put information there and there's no space left to put the cut-off snippets, things will crash.
Thanks for a thorough explanation :)
I use 2x1080p monitors, but I have rarely seen VRAM used past 2GB in the cruddy old games I play... save for TW3 (probably old as well), which after a while of playing goes up toward 2.8GB of allocated memory. Now, I don't do 12+ hour gaming sessions without logging off, nor do I have many Chrome tabs open while I game. I DO however sometimes watch a video of some quest, or read some shit WHEN I play, but nowhere near running out of VRAM. This COULD ofc be worse if I play for a lot longer; as I said (and per your explanation), the game COULD be allocating chunks until VRAM is all spent? Dunno.
How long does it take you if you do a clean boot and just load up steam and start TW3 until you get errors? Cos troubleshooting stuff that is in the realm of "Oh.. yeah, you need to do a 12 hour playingsession before that happens" is kinda.. uhm.. Well :)
As i said, chrome seemed hard pressed to really use much vram for me, so i am looking for something different perhaps.. some example code that can be started over until vram is spent perhaps? Found some references to GLSLHacker (GeeXLab) and some 4GB vram test thingy, but was not able to find that anymore. Opening a 4K video on youtube seems to be using a whopping 70MB of vram for me, so i dunno...
@kakra
The GPU could request the driver to swap to sysmem. But as I understood, vulkan leaves that completely to the application. So DXVK would be in charge of doing so.
Actually no, it isn't. Once a memory chunk is allocated, Vulkan apps don't really have to bother with it; residency is managed by the driver. Even for device-local memory types, there is no guarantee that memory allocated from them is actually located in VRAM; it can be paged out if necessary.
@doitsujin So we are back to "that doesn't seem to happen here". It only strengthens the theory this is a driver / graphic stack issue here... Maybe related to configuration or hardware memory layout...
Okay, I managed to let SOTTR allocate more than 4GB of memory now without a crash, also TW3 allocated around 3GB without a crash now - with Chrome and some other windows opened. I have a theory of what was going wrong in my system but I need to test this a little more. The "slow down over time" issue in SOTTR also seems to be gone but since my system does a lot of background activity currently, I'd like to defer the performance testing a little more. Currently, stuttering is a lot more apparent now and I'm not sure if it comes from background activity or switched settings. But the games seem to cope well now with Xorg/Chrome allocating a lot of memory, overall memory footprint of those seems a little bit lower now. I'll report back.
Fun fact: Sometimes it helps to write elaborated texts explaining things to get the clue where a problem is. :-)
Apparently, during testing various kernel configurations I managed to crash my filesystem the hard way. I probably lost some important changes to the wine code, one of which is hard to recreate. Replacement drives are ordered because I want to keep around the broken file system for trying recovery. This throws me back about 1-2 weeks, so I'm going to pause working on this for a few days.
But to recap what I found out so far: Vulkan (or NVIDIA) seems to interact very badly with THP (at least in combination with wine). It wasn't able to allocate more RAM because there was just no mappable memory block left to allocate. This is probably a memory fragmentation issue. Usually, the kernel would defer huge page creation then. But I also noticed that my kernel didn't properly enable IOMMU (which seems to be important for NVIDIA). I was still testing that part when the crash occurred.
Since THP can be a pretty nice, performance enhancing feature, I wanted to work out a proper configuration and document that. First tests showed that it makes a difference in performance. Overall fps was mostly identical but I did notice audio-dropouts every now and then which I didn't notice before.
So I probably take the chance to rebase my work to wine 3.21 then. I was just finished with preparing and cleaning up the 3.20 release when everything went down the virtual drain. :-(
I think I'm back up running by next weekend. Thank you, Murphy, that I discovered my daily backup wasn't working that very same day.
Note to myself: Don't mix zswap with some workloads. Push often even if still WIP.
Conclusion: If someone is seeing this issue, too, it may be due to THP being enabled and not being fully and/or correctly configured. Could you check? grep ^ /sys/kernel/mm/transparent_hugepage/*
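For readers who prefer a labelled summary over the raw grep output, here is a small Python sketch that reads the same sysfs directory and reports the currently selected value of each setting (the one shown in square brackets, e.g. `always [madvise] never`). It only reports; which combination of settings behaves well with the NVIDIA driver is exactly what was still being tested here. Subdirectories such as khugepaged/ are skipped (grep would only report them as directories).

```python
from pathlib import Path

THP_DIR = Path("/sys/kernel/mm/transparent_hugepage")


def selected_option(text):
    """'always [madvise] never' -> 'madvise'; plain values are returned as-is."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return text.strip()


def thp_settings(base=THP_DIR):
    """Map each setting file in the THP sysfs directory to its current value."""
    settings = {}
    if not base.is_dir():
        return settings              # kernel built without THP, or not Linux
    for path in sorted(base.iterdir()):
        if not path.is_file():
            continue                 # skip subdirectories such as khugepaged/
        try:
            settings[path.name] = selected_option(path.read_text())
        except OSError:
            continue                 # unreadable entry, ignore it
    return settings


if __name__ == "__main__":
    for name, value in thp_settings().items():
        print(f"{name}: {value}")
```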
Software information
The Witcher 3, all settings maxed out, full HD, Nvidia Hairworks all characters + AA4
System information
Log files
After loading a saved game, the game freezes just milliseconds after starting to fade in the screen. Since the game still fades in, everything is dark but it looks like everything is rendered correctly - no models or textures missing, NV hairs are also working. This happens only sometimes.
Looking at the logs I see
Turning on full debug logging of DXVK eliminates the issue; it's no longer reproducible.
The frozen game can be successfully and instantly killed with SIGKILL.