HansKristian-Work / vkd3d-proton

Fork of VKD3D. Development branches for Proton's Direct3D 12 implementation.
GNU Lesser General Public License v2.1

CPU optimization exploration tracker #1022

Open HansKristian-Work opened 2 years ago

HansKristian-Work commented 2 years ago

The idea of this issue is to explore CPU optimizations in vkd3d-proton. For a game to be considered here it should be CPU bound with significant API overhead, i.e., we can meaningfully improve game performance through perf tuning on our end.

Information needed:

Monster Hunter: Rise (1446780)

Details TBD

Oschowa commented 2 years ago

Monster Hunter: Rise saw large improvements recently with the descriptor copy optimizations (90 -> 100 fps), and there are further gains with the descriptor punchthrough path (100 -> 110 fps). However, it is still CPU-limited. The most CPU-limited area is right after you start the first hunt, when looking into the distance:

Screenshot ![Screenshot from 2022-03-04 14-50-52](https://user-images.githubusercontent.com/8129300/156775955-042e9694-c9ed-48b4-890b-1b7ec0b8d326.png)

To get to it, start a new game and mash through the tutorial in the village until the quest-giver allows you to take the first hunt.

Oschowa commented 2 years ago

DEATH STRANDING is also a good candidate to look into. It's usually CPU-bound, with descriptor copies high in perf top. This is most pronounced in later areas, but it shows up even right after starting the game, especially at lower resolutions:

Screenshot ![Screenshot from 2022-03-04 15-03-36](https://user-images.githubusercontent.com/8129300/156777654-3e78e597-e08d-4326-833a-39b549d1a52e.png)

(Savegames for this game are not shareable, and it takes forever to reach the later, more CPU-bound areas.)

kermeat commented 2 years ago

Control (870780)

Almost everywhere. I first see it in the first scene with the janitor, but it almost never goes below 50 fps.

Screenshots ![control3](https://user-images.githubusercontent.com/5646358/162254537-c338e0da-2967-4e31-9124-9be6c0b6594d.png) ![control1](https://user-images.githubusercontent.com/5646358/162253150-21134f91-bf19-4947-9c0f-5adc7ef43b9a.png) ![control2](https://user-images.githubusercontent.com/5646358/162256017-a1b7b2f4-5318-4fdf-b8b7-7f9074057c32.png)

doitsujin commented 2 years ago

What exactly is the issue with Control? CPU limit doesn't necessarily mean that our code runs into an obvious bottleneck and performance should generally be good in that game, assuming reasonable hardware and a non-borked wine configuration.

kermeat commented 2 years ago

> What exactly is the issue with Control? CPU limit doesn't necessarily mean that our code runs into an obvious bottleneck and performance should generally be good in that game, assuming reasonable hardware and a non-borked wine configuration.

I did some tests on Windows, and there is a huge difference compared to vkd3d-proton in some scenes. On Windows, even with low settings and a render resolution of 960x540, performance was always limited by the GPU. On Linux with the same config, fps is 2 times lower and GPU load is 45%.

Windows dx12 ![control_dx12 2022-04-10 11-40-22](https://user-images.githubusercontent.com/5646358/162621672-5caf7136-5840-4f51-a0a1-7a133150c82d.png)
Proton dx12 ![control4](https://user-images.githubusercontent.com/5646358/162621983-170c1d6b-0037-4611-a214-047180d4dc44.png)
Proton dx11 ![control5](https://user-images.githubusercontent.com/5646358/162622222-9ba4f438-3f1c-44c4-b6c9-f63c295aebb4.png)

doitsujin commented 2 years ago

Try VKD3D_CONFIG=no_upload_hvv maybe. Differences that huge are normally not caused by optimization issues.

Also, please mention your hardware when complaining about performance...

kermeat commented 2 years ago

Hardware info

VKD3D_CONFIG=no_upload_hvv has no visible effect on performance. My GPU (RX 590) has 8 GB of VRAM. I see a direct correlation between fps and CPU frequency in this scene:

- Maximum frequency (3.3 GHz): 79 fps (143 fps on Windows with the same maximum frequency but the default governor)
- Frequency fixed at 3.0 GHz: 71 fps
- Frequency fixed at 2.5 GHz: ~50 fps
- For comparison, with frequency fixed at 1.2 GHz: the dx11 version runs at 120 fps with 100% GPU load, dx12 at 25 fps

Tests were conducted with the performance governor:

cpupower frequency-set --governor performance

cpupower frequency-set -f <freq>

Setting the governor back to the default (schedutil) leads to low, unstable fps, ranging from 40 to 53.

It's definitely CPU-bound, and the problem doesn't appear on Windows or with the dx11 version under Proton. I can check performance on Windows with limited CPU frequency if it helps.

UPDATE: On Windows the minimum available render resolution is 720p.

- With balanced power settings (max frequency 3.3 GHz): 143 fps
- With CPU frequency fixed at 1.2 GHz: 65 fps
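
To make the scaling in these figures easier to read, here is a minimal sketch that computes fps per GHz for the Proton dx12 runs and the Windows/Proton ratio at the two frequencies where both were measured. The numbers are copied from this comment; the GHz unit, the dictionary names, and the helper functions are assumptions made for illustration, and the render resolution was not identical across the two systems, so the ratios are only a rough indication.

```python
# Figures copied from the comment above. Frequencies are assumed to be in GHz.
# "~50 fps" is taken as 50.
proton_dx12 = {3.3: 79, 3.0: 71, 2.5: 50, 1.2: 25}
windows_dx12 = {3.3: 143, 1.2: 65}


def fps_per_ghz(samples):
    """Return fps divided by CPU frequency for each sample.

    A roughly constant value across frequencies suggests the scene is
    CPU-bound: fps scales with clock speed instead of hitting a GPU ceiling.
    """
    return {freq: round(fps / freq, 1) for freq, fps in samples.items()}


def windows_speedup(windows, proton):
    """Return the Windows/Proton fps ratio at every frequency measured on both."""
    return {
        freq: round(windows[freq] / proton[freq], 2)
        for freq in windows
        if freq in proton
    }


if __name__ == "__main__":
    print("Proton dx12 fps per GHz:", fps_per_ghz(proton_dx12))
    print("Windows / Proton dx12:", windows_speedup(windows_dx12, proton_dx12))
```

If the fps-per-GHz values stay close to each other, frame rate is tracking CPU clock almost linearly, which is what a CPU-bound scene would be expected to look like.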

gmbeard commented 1 year ago

I have similar issues to @kermeat with Control; DXVK appears to give much better performance than VKD3D. Certain areas of the map seem to be CPU bound with VKD3D, dropping GPU usage down to 40 - 50% (FPS drops to 40-50 accordingly). I don't get this with DXVK.

Hardware Info

Both screenshots below were captured using GE-Proton7-37, with exactly the same graphical options set, at native 1440p.

Launch options for VKD3D: PROTON_ENABLE_NVAPI=1 VKD3D_CONFIG=dxr11 mangohud %command% -skipStartScreen -dx12

[Screenshot: VKD3D]

Launch options for DXVK: PROTON_ENABLE_NVAPI=1 mangohud %command% -skipStartScreen -dx11

[Screenshot: DXVK]

K0bin commented 1 year ago

Spider-Man: Remastered (1817070)

Spider-Man: Miles Morales (1817190)

Those two seem to be by far the most CPU heavy games with vkd3d-proton, at least on Nvidia GPUs.

Test case

[Screenshot: 20221128232821_1]

Sitting on a lantern in the middle of Times Square in Miles Morales. Settings: maxed out, including ray tracing, except that the RT distance is kept at the middle setting, which is the default. Resolution doesn't matter, as it's CPU-limited in all cases.

Results

Unfortunately Windows is 1.3x as fast as the fastest result I got on Linux.

VKD3D profiling result: milesprofiling.txt

As text, sorted by ticks: milesprofiling.txt