HansKristian-Work / vkd3d-proton

Fork of VKD3D. Development branches for Proton's Direct3D 12 implementation.
GNU Lesser General Public License v2.1

Elden Ring (1245620) crashes on Intel Arc when opening menus since commit 423b939 #2016

Open Vinjul1704 opened 5 months ago

Vinjul1704 commented 5 months ago

Elden Ring crashes on my Intel Arc A380 when trying to open various menus since this commit: https://github.com/HansKristian-Work/vkd3d-proton/commit/423b9390ace4324392e56226f6541d2a0867bdb3

Short video demonstration showing the behaviour of the last good and the first bad commit: https://youtu.be/jTEFZqD5BL0

While the game is technically playable on the last good commit, performance is poor and very inconsistent, to the point where it feels like it constantly switches between 20 and 40 FPS. Recent versions perform much better and feel good in-game, but the game is obviously not playable when most of the menus cannot be opened.

This is on a system with rebar enabled, but limited to 4GB via the i915.lmem_bar_size=4096 kernel argument. This is to work around another non-game-specific rebar issue on Intel Arc. Removing the limit, or disabling rebar and Above 4G Decoding entirely, makes the game and the whole system freeze in the loading screen or shortly after loading in, even with the "last good" commit, but that's unrelated to the issue here.
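
To confirm which BAR limit a test run actually used, the kernel command line can be checked. The following is a minimal Python sketch, not part of the original report: the helper name `lmem_bar_size_mib` and the sample command line are hypothetical, and it assumes the value is given in MiB, as the 4096 for 4GB above suggests.

```python
import re
from typing import Optional


def lmem_bar_size_mib(cmdline: str) -> Optional[int]:
    """Return the i915.lmem_bar_size value found in a kernel command line.

    Returns None when the argument is absent, i.e. no BAR limit was requested.
    """
    match = re.search(r"(?:^|\s)i915\.lmem_bar_size=(\d+)(?=\s|$)", cmdline)
    return int(match.group(1)) if match else None


# Hypothetical command line for illustration. On a running Linux system the
# real string can be read from /proc/cmdline.
sample = "BOOT_IMAGE=/vmlinuz root=/dev/sda2 quiet i915.lmem_bar_size=4096"
print(lmem_bar_size_mib(sample))
```

Recording this value next to each log makes it easier to match a heap layout to the configuration that produced it.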

If needed, I can try to capture traces or re-test it on different systems and operating systems.

Software information

Elden Ring, Steam ID: 1245620, low/custom (textures + shadows 1 tick above high) settings, different resolutions, aspect ratios and window modes

System information

Log files

Attached are VKD3D logs of both the first bad and last good commit, as well as a Proton log of the first bad commit (last good is too large to be uploaded directly).

steam-1245620-firstbad.log vkd3d-proton-er-firstbad.log vkd3d-proton-er-lastgood.log

HansKristian-Work commented 5 months ago

Can you upload a log with VKD3D_CONFIG=log_memory_budget and vulkaninfo output?

Vinjul1704 commented 5 months ago

Here you go:

vkd3d-proton-er-firstbad-memorybudget.log vulkaninfo.txt

I saw there've been commits related to that config option recently. Do you need a log of a recent build too, or is that one of the first bad commit enough?

HansKristian-Work commented 5 months ago

That log doesn't look like I expect. Please try with master. It should have more useful logging.

Vinjul1704 commented 5 months ago

vkd3d-proton-er-master-memorybudget.log

HansKristian-Work commented 5 months ago

The problem seems to be

VkPhysicalDeviceMemoryProperties:
=================================
memoryHeaps: count = 3
        memoryHeaps[0]:
                size   = 2088763392 (0x7c800000) (1.95 GiB)
                budget = 614465536 (0x24a00000) (586.00 MiB)
                usage  = 0 (0x00000000) (0.00 B)
                flags: count = 1
                        MEMORY_HEAP_DEVICE_LOCAL_BIT
        memoryHeaps[1]:
                size   = 16814209024 (0x3ea347800) (15.66 GiB)
                budget = 15132000256 (0x385f00000) (14.09 GiB)
                usage  = 0 (0x00000000) (0.00 B)
                flags:
                        None
        memoryHeaps[2]:
                size   = 4294967296 (0x100000000) (4.00 GiB)
                budget = 2600468480 (0x9b000000) (2.42 GiB)
                usage  = 0 (0x00000000) (0.00 B)
                flags: count = 1
                        MEMORY_HEAP_DEVICE_LOCAL_BIT

what on earth is this ...

Vinjul1704 commented 5 months ago

Hmm, let me check without the BAR limit workaround, as well as with rebar and/or Above 4G Decoding disabled, to see if any of that makes a difference here. It's a 6GB GPU, so the first and last heaps look correct?

HansKristian-Work commented 5 months ago

It's possible the 4GB limit causes the heaps to be split, and this will confuse vkd3d-proton. We're trying to allocate from heap 0 here, which is exhausted quickly and fallback allocations don't seem to work well on this card. Try a 512M limit or 1G limit perhaps.
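
For readers who want to inspect such a dump themselves, here is a minimal Python sketch that parses the heap listing quoted in this thread and reports which heaps are device-local and how much budget each has. The `Heap` class and `parse_heaps` function are hypothetical helpers written for this illustration. They only read the text format shown above and do not reproduce vkd3d-proton's allocation logic.

```python
import re
from dataclasses import dataclass
from typing import List

# Heap listing copied from the log in this thread (rebar on, 4GB limit).
DUMP = """
memoryHeaps: count = 3
        memoryHeaps[0]:
                size   = 2088763392 (0x7c800000) (1.95 GiB)
                budget = 614465536 (0x24a00000) (586.00 MiB)
                flags: count = 1
                        MEMORY_HEAP_DEVICE_LOCAL_BIT
        memoryHeaps[1]:
                size   = 16814209024 (0x3ea347800) (15.66 GiB)
                budget = 15132000256 (0x385f00000) (14.09 GiB)
                flags:
                        None
        memoryHeaps[2]:
                size   = 4294967296 (0x100000000) (4.00 GiB)
                budget = 2600468480 (0x9b000000) (2.42 GiB)
                flags: count = 1
                        MEMORY_HEAP_DEVICE_LOCAL_BIT
"""


@dataclass
class Heap:
    index: int
    size: int
    budget: int
    device_local: bool


def parse_heaps(text: str) -> List[Heap]:
    """Parse 'memoryHeaps[N]:' sections from a vulkaninfo-style text dump."""
    # With one capture group, re.split yields [prefix, idx0, body0, idx1, body1, ...].
    parts = re.split(r"memoryHeaps\[(\d+)\]:", text)
    heaps = []
    for i in range(1, len(parts), 2):
        body = parts[i + 1]
        size = int(re.search(r"size\s*=\s*(\d+)", body).group(1))
        budget = int(re.search(r"budget\s*=\s*(\d+)", body).group(1))
        heaps.append(Heap(int(parts[i]), size, budget,
                          "MEMORY_HEAP_DEVICE_LOCAL_BIT" in body))
    return heaps


MIB = 1024 ** 2
for heap in parse_heaps(DUMP):
    kind = "device-local" if heap.device_local else "host"
    print(f"heap {heap.index}: {kind}, "
          f"size {heap.size / MIB:.0f} MiB, budget {heap.budget / MIB:.0f} MiB")
```

With the 4GB limit in place the dump contains two device-local heaps, and heap 0 is the smaller one with only a 586 MiB budget, which fits the description of it being exhausted quickly.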

djdeath commented 5 months ago

I think it's exactly that, the lmem_bar_size option creates 2 heaps in the i915 driver (mappable/non-mappable) and that results in this.

You might have better luck with a small value such as lmem_bar_size=512

I think the root of the issue with rebar enabled is that i915 can deadlock itself trying to place things in VRAM. Some reports of this: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11255

Vinjul1704 commented 5 months ago

> It's possible the 4GB limit causes the heaps to be split, and this will confuse vkd3d-proton. We're trying to allocate from heap 0 here, which is exhausted quickly and fallback allocations don't seem to work well on this card. Try a 512M limit or 1G limit perhaps.

Yes, falling back to system memory when rebar is enabled and unlimited is an issue with Arc in general. In that case, usually the whole system would freeze.

I don't recall this specific Elden Ring issue happening in other DX12 games with VKD3D though (examples being Icarus, Helldivers and The Finals). Those usually work well, even when filling all the VRAM, as long as the 4GB limit is in place or rebar is disabled in general.

Here are the heaps with the limit disabled:

VkPhysicalDeviceMemoryProperties:
=================================
memoryHeaps: count = 2
    memoryHeaps[0]:
        size   = 6383730688 (0x17c800000) (5.95 GiB)
        budget = 5745147904 (0x156700000) (5.35 GiB)
        usage  = 0 (0x00000000) (0.00 B)
        flags: count = 1
            MEMORY_HEAP_DEVICE_LOCAL_BIT
    memoryHeaps[1]:
        size   = 16814211072 (0x3ea348000) (15.66 GiB)
        budget = 15132000256 (0x385f00000) (14.09 GiB)
        usage  = 0 (0x00000000) (0.00 B)
        flags:
            None

And here are the heaps with both the limit and rebar as a whole disabled:

VkPhysicalDeviceMemoryProperties:
=================================
memoryHeaps: count = 3
    memoryHeaps[0]:
        size   = 6115295232 (0x16c800000) (5.70 GiB)
        budget = 5272240128 (0x13a400000) (4.91 GiB)
        usage  = 0 (0x00000000) (0.00 B)
        flags: count = 1
            MEMORY_HEAP_DEVICE_LOCAL_BIT
    memoryHeaps[1]:
        size   = 16814211072 (0x3ea348000) (15.66 GiB)
        budget = 15132000256 (0x385f00000) (14.09 GiB)
        usage  = 0 (0x00000000) (0.00 B)
        flags:
            None
    memoryHeaps[2]:
        size   = 268435456 (0x10000000) (256.00 MiB)
        budget = 9437184 (0x00900000) (9.00 MiB)
        usage  = 0 (0x00000000) (0.00 B)
        flags: count = 1
            MEMORY_HEAP_DEVICE_LOCAL_BIT
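
As a quick sanity check on the three listings, the device-local heap sizes can be summed per configuration. This is a small Python sketch using only the byte counts posted in this thread. The configuration labels are informal names chosen here, and reading the split heaps as mappable and non-mappable parts of the same VRAM is an interpretation based on the comments above, not something the logs state directly.

```python
GIB = 1024 ** 3

# Sizes in bytes of the heaps flagged MEMORY_HEAP_DEVICE_LOCAL_BIT,
# copied from the three dumps in this thread.
device_local_sizes = {
    "rebar on, 4GB limit": [2088763392, 4294967296],
    "rebar on, no limit": [6383730688],
    "rebar off": [6115295232, 268435456],
}

totals = {name: sum(sizes) for name, sizes in device_local_sizes.items()}

for name, total in totals.items():
    print(f"{name}: {total} bytes ({total / GIB:.2f} GiB)")

# If the split heaps are just two views of the same VRAM, all totals match.
print("totals identical:", len(set(totals.values())) == 1)
```

All three configurations add up to the same amount of device-local memory, so the limit appears to change only how the VRAM is divided into heaps, not how much of it is exposed.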

I did also just test different configurations, but without luck. In particular, I tried with the limit and rebar both disabled, as well as rebar enabled and the limit set to 256 and 512.

All 3 of those cause the whole system to freeze either in the loading screen, or after a minute or two in-game. Opening the menu actually works in this case, but that doesn't really help if the game still freezes the whole OS.

For comparison, I tried the original configuration again, as in rebar on and 4GB limit, and just let the game run for half an hour. No crash or freeze during that, until the end when I tried to open the menu again.

Vinjul1704 commented 5 months ago

> I think it's exactly that, the lmem_bar_size option creates 2 heaps in the i915 driver (mappable/non-mappable) and that results in this.
>
> You might have more chances putting a small lmem_bar_size=512
>
> I think the root of the issue with rebar enabled is that i915 can deadlock itself trying to place things in VRAM. Some reports of this : https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11255

Thanks, I wasn't sure if this issue here might be related to that rebar issue. The behaviour seems different in this case though, and I would have expected it to be fixed/worked around by having the 4GB limit in place. I guess not, so hopefully the kernel driver fixes help with this once they are done.