daniel-schuermann / mesa

Mesa 3D graphics library (mirror; no pull requests here please)
http://mesa3d.org

Benchmarks results #36

Open shmerl opened 5 years ago

shmerl commented 5 years ago

Just for visibility, this can be a meta issue for referencing various posted benchmarks.

grinceur commented 5 years ago

Strange Brigade, 1080p, max settings. Arch Linux, linux 5.2.9.arch1-1, mesa-aco-git 19.2.0_devel.114724.fbaabd839e8-1, tested together with amdgpu-pro-libgl 19.30_855429-1 and amdvlk-git r84.9b632ef-1. i7 6700K @ 4.6 GHz, Radeon RX 580 8 GB.

ACO: Screenshot from 2019-08-22 22-15-50

LLVM: Screenshot from 2019-08-22 22-18-00

AMDVLK: Screenshot from 2019-08-22 22-21-54

AMDGPU-PRO: Screenshot from 2019-08-22 22-25-14

pingubot commented 5 years ago

@daniel-schuermann I added AMDVLK to the Wolf 2 benchmarks.

daniel-schuermann commented 5 years ago

@pingubot hm, a bit underwhelming.. and doesn't quite match my own testing. A quick check in 1080p gave me

It might be the different scene or resolution. Or, if you installed both AMDGPU drivers, they sometimes get mixed up, and you were testing Pro instead of the open-source one...

pingubot commented 5 years ago

@daniel-schuermann It is really interesting that radv/aco outperforms amdvlk in your case.

I did not install the drivers. I just unpacked the amdvlk and amdgpu-pro packages, and I select the driver via the environment variable. I built radv/aco myself.

Edit: I confirmed again, my results are correct. As mentioned, I am testing at WQHD, fully GPU bound.

Which hardware are you using, btw?

daniel-schuermann commented 5 years ago

Vega64 + i7-8700K.

Did you check that the JSON files point to the correct directory? I'm pretty sure that your AMDVLK ICD just uses the Pro driver. Also run vulkaninfo and check for driverName = AMD open-source driver. I don't use the Pro driver anymore because of exactly this behavior.

pingubot commented 5 years ago

Mhh, really strange then. Are you even GPU limited with your Vega64 at Full HD?

. amdvlk.sh
vulkaninfo | grep -i drivername
driverName = AMD open-source driver

echo $VK_ICD_FILENAMES
/home/pingubot/user_driver/amdvlk/latest/data/etc/vulkan/icd.d/amd_icd64.json

cat /home/pingubot/user_driver/amdvlk/latest/data/etc/vulkan/icd.d/amd_icd64.json
{
    "file_format_version": "1.0.0",
    "ICD": {
        "library_path": "/home/pingubot/user_driver/amdvlk/latest/data/usr/lib/x86_64-linux-gnu/amdvlk64.so",
        "api_version": "1.1.116"
    }
}

I think everything is set up as it should be.
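For anyone who wants to repeat this check without reading the JSON by hand, here is a minimal sketch. It assumes VK_ICD_FILENAMES holds one or more manifest paths separated by the platform path separator, and that each manifest has the "ICD" / "library_path" layout shown above. The function name icd_library_paths is made up for this example. It only checks that the library file exists; it does not prove which driver the loader finally picks, so vulkaninfo remains the authoritative check.

```python
import json
import os


def icd_library_paths(icd_filenames):
    """Return (manifest, library_path, exists) for each listed ICD manifest.

    `exists` is None when library_path is a bare file name, because such a
    name is resolved through the system library search path, which this
    sketch does not try to emulate.
    """
    results = []
    for manifest in filter(None, icd_filenames.split(os.pathsep)):
        with open(manifest, encoding="utf-8") as fh:
            data = json.load(fh)
        library = data["ICD"]["library_path"]
        if os.sep not in library:
            exists = None  # bare name: left to the dynamic linker
        else:
            if not os.path.isabs(library):
                # Relative paths are taken relative to the manifest file.
                library = os.path.normpath(
                    os.path.join(os.path.dirname(manifest), library)
                )
            exists = os.path.exists(library)
        results.append((manifest, library, exists))
    return results


if __name__ == "__main__":
    listed = os.environ.get("VK_ICD_FILENAMES", "")
    for manifest, library, exists in icd_library_paths(listed):
        state = {True: "exists", False: "MISSING", None: "not checked"}[exists]
        print(f"{manifest} -> {library} ({state})")
```

If the printed library path ends up inside the amdgpu-pro tree instead of the amdvlk one, the two drivers are mixed up in the way suspected above.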

pingubot commented 5 years ago

Just tested an ACO build (a952333) in Doom.

GPU: AMD Vega 56
CPU: Intel Core i5 3570K @ 4.2 GHz

Settings: WQHD, Ultra, in-game test spot

Amdgpu-pro 1930: 179 fps
Amdvlk v-2019.Q3.4: 168 fps
ACO: 165 fps
LLVM: 141 fps

Fully GPU bound in all tests.

daniel-schuermann commented 5 years ago

Hmm... this is even less expected, since ACO has been on par with Pro in Doom for quite some time. Did you build ACO in debug mode? (The default is debugoptimized.) Maybe check out current master and rebuild in release mode? Also check for the GPU-Culling option in-game. I'll see tomorrow if I can confirm your findings; it would probably be a regression, because I don't expect AMDGPU to improve that drastically in this game.

pingubot commented 5 years ago

I always do my build this way:

meson --prefix $PWD/install.64 --buildtype release build.64
cd build.64
meson configure -Dc_args="-DNDEBUG" -Dcpp_args="-DNDEBUG" -D dri-drivers= -D gallium-drivers= -D vulkan-drivers=amd -D gles1=false -D gles2=false -D opengl=false
ninja install

Sadly, I didn't find a GPU-culling option in the Doom options menu. Is there a config file I need to check? (Wolfenstein 2 has such an option.)

daniel-schuermann commented 5 years ago

I think it's Wf2 only. Could you try benchmarking with ACO 2-3 times? I'm not completely sure why, but something really odd might be going on that slows RADV down on the first run.

pingubot commented 5 years ago

I redid the Doom 2016 benchmark 3 times with the latest master as of now, fc0fdb6 (with a light undervolt on my card). I cleaned the Steam shader caches before the first run. All runs showed the same performance.

aco: 167
amd-pro 1930: 181
amd-pro 1850: 181

So Pro 19.30 and 18.50 outperform ACO easily. What is the recommended setting for GPU culling in Wolf 2 for ACO?

pingubot commented 5 years ago

As additional info, I lost a few fps in Wf2 with the latest master compared to my last test. At the mentioned in-game test spot:

ACO a952333: 124 fps
ACO fc0fdb6: 117 fps

daniel-schuermann commented 5 years ago

There is something really odd going on. @BNieuwenhuizen can also reproduce ACO being slower than Pro in Doom, although a bit faster than AMDVLK, while for me ACO outperforms even Pro in Wf2 in Full HD (Doom being capped at 200 FPS). It seems I have to buy a 4K monitor, as scaling somehow doesn't work anymore. I cannot really believe that it's CPU-bound, because LLVM and ACO differ so heavily. With Mein Leben settings:

ACO (backend branch): 190 FPS
RADV/LLVM: 147 FPS
AMDVLK: 163 FPS
AMDGPU-Pro: 175 FPS

pingubot commented 5 years ago

@daniel-schuermann I still have absolutely no clue how ACO can outperform Pro so much on your rig in Wolf 2. It is the complete opposite for me in 1440p. What are @BNieuwenhuizen's results in Wolf 2 at 1440p or 1080p?

daniel-schuermann commented 5 years ago

@pingubot I'm confused as well! I actually swapped out my Vega for Polaris, and with this card my results are much more in line with yours.

Doom, 1080p, RX 480:
RADV/ACO: 122 FPS
RADV/LLVM: 101 FPS
AMDVLK: 126 FPS
AMDGPU-Pro: 129 FPS

This seems to change with higher resolutions, though. @pendingchaos got in 4K with Vega:
RADV/ACO: 52.05 FPS
RADV/LLVM: 42.39 FPS
AMDVLK: 51.70 FPS
AMDVLK (no app-profile): 49.38 FPS
AMDGPU-PRO: 54.44 FPS

Apparently, AMD made some improvements with this game over the last couple of releases, and we'll try to figure out if any of these can translate to RADV.

pingubot commented 5 years ago

@daniel-schuermann Thanks for the results. For Doom the difference is there, but it is by far not as huge as in Wolf 2. I will give Wolf 2 a try in 1080p and see how ACO vs. Pro looks there.

pingubot commented 5 years ago

OK, Wolf 2 in 1080p is also a clear win for amd-pro, and also GPU limited.

Amdpro 1930:

Menu: ~500 fps
In-game: ~205 fps

ACO (backend):

Menu: 400 fps
In-game: ~165 fps

pingubot commented 4 years ago

No improvements with yesterday's code drop 42225d73ef4ccda4a17de3f52a4ceae827e77de6

binarydepth commented 4 years ago

PoE was running with the 32-bit launcher, so I did the testing again. I also tested Diablo 3.

Path of Exile: no red shaders, and the quarry loads better, but there is still stuttering. This is an area I always test: I go to the quarry and enter the next zone from there.

ACO
Oriath: 47 53
Highgate: 58 66
mobs: 17 28
hideout: 45 47

LLVM
Highgate: 57 62
Oriath: 41 54
mobs: 13 25
hideout: 45 51

Diablo

ACO (no stuttering!)
login: 109 111
town: 58 75
mobs: 44 60

LLVM
login: 99 104
town: 69 81
mobs: 40 56 (jumps up to 60 when mobs are cleared)

In the LLVM test I didn't notice stuttering either.

Proton, without complete shader pre-caching

ACO (no stutter except in the Belly of the Beast)

Oriath: 90 110
mobs: 50 60

LLVM (heavy stuttering)

Oriath: 95 104, dips to 50 when loading new assets
mobs: 33 55, sometimes dips to 1 fps

Still not ready for leveling, or maybe incursion and end-game bosses. I got Proton running and downloaded the cache; I still have to update it, but maybe the one downloaded yesterday still works.

baryluk commented 4 years ago

Overwatch (Wine 4.12.1 + DXVK 1.4.1) at 2560x1440 and EPIC settings, using AMD Fury X. GPU load at 100%.

About 8-20% improvement over LLVM10 (svn371317 and svn372920 for the 3rd graph).

overwatch_2560x1440_epic_highlight_1

overwatch_2560x1440_epic_highlight_2

overwatch_2560x1440_epic_highlight_3_smooth_0 9

The last one has about 1400 active graphics and 9 active compute pipelines at peak.

Same data without smoothing (just raw data), to show any spikes:

overwatch_2560x1440_epic_highlight_3_raw

Here is a representative frame from the highlight 3:

Screenshot at 2019-09-29 10-11-34

At a smaller resolution (1920x1080) and lower settings (HIGH), which I usually use, the improvement is only 3-5%, sometimes none at all. But I sometimes see GPU load below 100%, so it might be CPU limited:

overwatch_1920x1080_high_highlight_1

No graphical issues or differences between LLVM and ACO versions detected.

PS. CPU and GPU load parallel graphs coming. Also will be investigating shader compilation latency performance.

baryluk commented 4 years ago

Crysis 3, static view just after one of the early checkpoints, no differences in performance.

Fury X, 2560x1440, almost all settings at VERY HIGH.

crysis3_2560x1440_highest_view_1

Screenshot at 2019-09-30 15-30-07

Screenshot at 2019-09-30 15-33-00

baryluk commented 4 years ago

Crysis 2 Maximum Edition (32-bit version).

Fury X, 2560x1440, vsync disabled, almost all settings at EXTREME, with Objects settings at ULTRA.

No significant differences, but LLVM appears to have a slight edge:

crysis2_2560x1440_extreme_view_1

When running with all settings on ULTRA, the difference in performance is quite visible, though. My guess is that the main difference is the quality (number of iterations) of the parallax occlusion mapping shading.

crysis2_2560x1440_ultra_view_1

Example frames from LLVM and ACO at ULTRA:

Screenshot at 2019-09-30 18-39-47

Screenshot at 2019-09-30 18-29-24

Again at ULTRA, in the "Nanosuit showroom" menu (models are rotating automatically, so the slightly different screenshots are not an issue):

crysis2_2560x1440_ultra_nanosuit Screenshot at 2019-09-30 18-52-05 Screenshot at 2019-09-30 18-55-42

baryluk commented 4 years ago

Crysis 2, non-Maximum Edition, DX9.

ACO is slightly faster than LLVM here, but not by much and not always, so it is hard to judge.

DX9, using D9VK 0.22.

2560x1440, HARDCORE system settings.

crysis2_2560x1440_hardcore_view_1

crysis2_2560x1440_hardcore_view_1_long

For some reason, performance drops off slightly after the first 60 seconds. Probably some D9VK issue.

Screenshot at 2019-09-30 20-53-55_llvm

Screenshot at 2019-09-30 21-05-42_aco

baryluk commented 4 years ago

EDIT: These results are inaccurate / invalid and should be ignored. Two different resolutions and sets of options were used, making the entire comparison invalid. The game likes to change settings automatically after a restart, especially when changing to ACO! Leaving this for historical reference.

Regression in Rise of The Tomb Raider. 64-bit native Linux port using Vulkan directly. Fury X, 2560x1440, vsync off, Settings: HIGH preset.

LLVM (average of 3 runs, with one pre-run ignored):
Mountain Peak: 107.00 fps
Syria: 84.00 fps
Geothermal Valley: 83.82 fps

ACO (average of 3 runs, with one pre-run ignored):
Mountain Peak: 86.17 fps
Syria: 65.00 fps
Geothermal Valley: 66.72 fps

Plus/minus 0.5 fps.

Here are some frame time graphs. The benchmark alignment isn't 100% reliable, as there is loading between the 3 scenes (even though I have everything in tmpfs and in the file cache), and for some reason the duration of each scene depends on the performance itself, which indicates a badly designed benchmark. Including them nevertheless:

riseofthetombraider_2560x1440_high_benchmark

As you can see, LLVM is significantly faster.

The same can be seen even in the main menu: riseofthetombraider_2560x1440_veryhigh_menu

And with custom settings, with everything at very high, textures at high only, and pure hair off:

riseofthetombraider_2560x1440_custom_max_purehair_off_menu

pendingchaos commented 4 years ago

@baryluk: are you running with the NIR MRs included (the master branch on this Github repo includes them)? they can affect performance (though I'm not sure by this much with RotTR)

I ran Rise of the Tomb Raider recently, and performance was normal with ACO

baryluk commented 4 years ago

@pendingchaos Could you be more specific about which MRs? I compiled it at commit 336b021d36 (Sep 28).

Are you talking about the subtraction lowering changes? I doubt they could have such a big effect.

I will recompile with current master head, and be back in 30 minutes.

pendingchaos commented 4 years ago

https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1664 and https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1240

Yeah, the subtraction changes probably don't have such a big effect. I think the subtraction changes have been replaced with a now-merged MR anyway

baryluk commented 4 years ago

@pendingchaos These MRs are not merged yet, so obviously I am running without them. They are not in the master branch. I am using Mesa repos.

I retested at current mesa master HEAD ( git b9994cb8d5 ), and the issue is still there, ACO is ~30% slower than LLVM backend.

baryluk commented 4 years ago

@pendingchaos I am using Mesa git repo. I tried patches from 1664 and 1240 MRs, and they don't apply cleanly to current Mesa master branch. I will see what I can do.

Edit: Building with 1240 patch applied.

BNieuwenhuizen commented 4 years ago

So can you supply the display settings, (not just the graphics, which have "Very High" preset, but especially what antialiasing method you use ). Can you please also verify that the antialiasing method is consistent between the two compilers? (why would it differ? No clue except that I had the game automatically change it for me when I switched ACO on ...)

baryluk commented 4 years ago

@BNieuwenhuizen Hi Bas! :) Yeah, I noticed that switching between ACO and LLVM sometimes made the game restore some settings to default, but I just retested it again, going into the options and manually checking that all of them were the same before doing the measurements.

I will triple check it in a moment, and run with patch from MR 1240 applied.

baryluk commented 4 years ago

It appears the game did reset the resolution to 1920x1080 and the anti-aliasing to FXAA when using ACO! The game also has a tendency to reset the AA options, as well as a few other options like Shadow Quality, Sun Soft Shadows, Specular Reflection Quality, Dynamic Foliage and PureHair.

I made screenshots this time before, in-between, and after each benchmark, and rechecked them when combining the report. So the stuff below is accurate.

No more issues, ACO is faster! :)

Mesa 19.3.0-devel , git-b9994cb8d5; LLVM 10 svn372920

Plus manually applied patch from https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1240

I do have a few minor local changes running too, but nothing touching compiler passes or code gen.

DISPLAY:

Fullscreen: ON
Resolution: 2560x1440
Refresh rate: 60Hz
Anti-Aliasing: SMAA
VSYNC: OFF

GRAPHICS:

Preset: CUSTOM
Texture quality: HIGH
Anisotropic Filter: 16x
Shadow Quality: VERY HIGH
Sun Soft Shadows: VERY HIGH
Ambient Occlusion: ON
Depth of Field: VERY HIGH
Level of Detail: VERY HIGH
Tessellation: ON
Screen Space Reflections: ON
Specular Reflection Quality: NORMAL
Dynamic Foliage: HIGH
Bloom: ON
Vignette blur: ON
PureHair: OFF
Lens Flares: ON
Screen Effects: ON
Film Grain: ON

In-Game benchmark results:

LLVM 1st run:
Mountain Peak: 76.09 FPS (min: 45.22, max: 118.81)
Syria: 56.13 FPS (min: 24.17, max: 73.23)
Geothermal Valley: 50.66 FPS (min: 41.08, max: 64.67)

LLVM 2nd run (without game restart):
Mountain Peak: 76.25 FPS (min: 38.84, max: 117.04)
Syria: 56.73 FPS (min: 24.77, max: 126.21)
Geothermal Valley: 50.82 FPS (min: 42.24, max: 63.81)

ACO 1st run (identical settings to LLVM runs):
Mountain Peak: 81.06 FPS (min: 47.48, max: 120.50)
Syria: 60.18 FPS (min: 28.87, max: 109.08)
Geothermal Valley: 52.58 FPS (min: 40.32, max: 68.78)

ACO 2nd run (without game restart):
Mountain Peak: 80.18 FPS (min: 46.36, max: 124.56)
Syria: 59.70 FPS (min: 26.02, max: 92.26)
Geothermal Valley: 52.48 FPS (min: 41.46, max: 69.04)
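To put a number on "ACO is faster", the four runs above can be averaged per scene and compared. This is only a rough sketch: the FPS values are copied from the results above, while the helper names scene_means and speedup_percent are made up for this example, and two runs per backend are too few for any real statistics.

```python
from statistics import mean

# Average FPS per scene, copied from the four in-game benchmark runs above.
RUNS = {
    "LLVM": [
        {"Mountain Peak": 76.09, "Syria": 56.13, "Geothermal Valley": 50.66},
        {"Mountain Peak": 76.25, "Syria": 56.73, "Geothermal Valley": 50.82},
    ],
    "ACO": [
        {"Mountain Peak": 81.06, "Syria": 60.18, "Geothermal Valley": 52.58},
        {"Mountain Peak": 80.18, "Syria": 59.70, "Geothermal Valley": 52.48},
    ],
}


def scene_means(runs):
    """Average each scene's FPS over all runs of one backend."""
    return {scene: mean(run[scene] for run in runs) for scene in runs[0]}


def speedup_percent(baseline, candidate):
    """How much faster candidate is than baseline, in percent."""
    return (candidate / baseline - 1.0) * 100.0


llvm = scene_means(RUNS["LLVM"])
aco = scene_means(RUNS["ACO"])
for scene in llvm:
    gain = speedup_percent(llvm[scene], aco[scene])
    print(f"{scene}: LLVM {llvm[scene]:.2f}, ACO {aco[scene]:.2f}, ACO {gain:+.1f}%")
```

With these numbers ACO comes out roughly 3.5 to 6 percent ahead, depending on the scene.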

riseofthetombraider_2560x1440_custom_max_purehair_off_menu_try3_good

baryluk commented 4 years ago

Re-ran the benchmarks on Rise of The Tomb Raider.

Fury X, 2560x1440, FXAA, all settings at the maximum possible values.

DISPLAY:

Fullscreen: ON
Resolution: 2560x1440
Refresh rate: 60Hz
Anti-Aliasing: FXAA
VSYNC: OFF

GRAPHICS:

Preset: CUSTOM
Texture quality: HIGH
Anisotropic Filter: 16x
Shadow Quality: VERY HIGH
Sun Soft Shadows: VERY HIGH
Ambient Occlusion: ON
Depth of Field: VERY HIGH
Level of Detail: VERY HIGH
Tessellation: ON
Screen Space Reflections: ON
Specular Reflection Quality: VERY HIGH
Dynamic Foliage: HIGH
Bloom: ON
Vignette blur: ON
PureHair: VERY HIGH
Lens Flares: ON
Screen Effects: ON
Film Grain: ON

riseofthetombraider_2560x1440_custom_all_max_benchmark1

ACO slightly faster. Interestingly, in one of the runs LLVM was faster during one segment of the benchmark.

baryluk commented 4 years ago

Shadow of The Tomb Raider Trial, 64-bit. Proton 4.11-6 (DXVK 1.4-5).

Fury X, 1920x1080, No AA, No AO.

Menu:

shadowofthetombraider_1920x1080_custom_menu

Benchmark:

shadowofthetombraider_1920x1080_custom_benchmark2

Fury X, 2560x1440, No AA, AO: BTAO.

Menu:

shadowofthetombraider_2560x1440_custom_menu

baryluk commented 4 years ago

More of Shadow of The Tomb Raider.

LLVM has a bit of an edge in some parts of the benchmarks.

Fury X, 2560x1440, No AA, Shadow quality: Ultra, AO: BTAO, Benchmark:

shadowofthetombraider_2560x1440_custom_benchmark1

Yeah, there are spikes like that every second, especially in the 2nd segment of the benchmark. Data on graph are already heavily smoothed, and they still show up. In reality, they are even bigger:

shadowofthetombraider_2560x1440_custom_benchmark1_raw

Why the next frame after each spike is actually faster than the average trend is beyond me at the moment. My guess is that it has something to do with triple buffering, or unreliable timing in the Mesa overlay.
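To illustrate how much smoothing can flatten spikes like these, here is a small sketch with made-up data. The frame times are synthetic (a steady 16 ms with a 48 ms spike every 60 frames), and the trailing moving average is only an assumption for illustration; it is not necessarily the smoothing used for the graphs above.

```python
def moving_average(values, window):
    """Trailing moving average.

    The first window-1 outputs average over the samples seen so far.
    """
    out = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the sample leaving the window
        out.append(running / min(i + 1, window))
    return out


# Synthetic frame times in milliseconds (hypothetical, not measured data).
frame_times = [48.0 if i % 60 == 59 else 16.0 for i in range(240)]
smoothed = moving_average(frame_times, window=10)

print(f"raw peak:      {max(frame_times):.1f} ms")
print(f"smoothed peak: {max(smoothed):.1f} ms")
```

In this toy case a 48 ms spike shrinks to a bump of about 19 ms after smoothing over 10 frames, which is why the raw graph is the one to look at when hunting stutter.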

Screenshot at 2019-10-01 16-34-46

baryluk commented 4 years ago

Hey, do you need shader dumps from any of these games for testing or CI, maybe (not sure about licensing, but I think it can be done)? I can dump them using vkpipeline-db or Fossilize. Some of them are big.

aufkrawall commented 4 years ago

radv-aco unfortunately performs worse vs. amdvlk-pro in Wolfenstein 2 than in Doom. In Roswell streets, I'm getting:
radv-aco: 66 fps
amdvlk-pro: 82 fps

1440p, RX 570 OC

pingubot commented 4 years ago

Sadly, Pro outperforms ACO in every Vulkan Windows game I have been able to test so far: Doom, Wolfenstein, Strange Brigade.

daniel-schuermann commented 4 years ago

Yeah, the issue with Doom and Wf2 is known, but Pro cheats on both games by doing some imprecise arithmetic optimizations.

aqxa1 commented 4 years ago

Some quick tests with Kingdom Come : Deliverance (DXVK) run with a 5700 XT card and latest aco-navi branch, compared to other AMD drivers:

RADV LLVM10 git, 73.8 FPS RADV LLVM10 git

AMDVLK 2019 Q3.6, 82.4 FPS AMDVLK

AMDGPU-PRO 19.30-855429, 83.2 FPS AMDGPU-PRO

aco-navi git, 86.7 FPS aco-navi

Summary

So, aco-navi in this scene is 17.5% faster than RADV, and 4% faster than AMDGPU-PRO

EDIT: Unrelated to ACO but there looks to be a sizeable drop in performance (looks to be CPU overhead) between kernel 5.3.7 (plus some bugfix patches) and amd-staging-drm-next (with stable 5.3.7 updates merged in). The drop is also there with 5.4-rc3. I'm not sure why it's occurring yet, though:

5.3.7, ACO, 79.4 fps: 5.3.7

amd-staging-drm-next + 5.3.7 updates, ACO, 61.8 fps amd-staging-drm-next

So 79.4 fps -> 61.8 fps is a drop of 17.6 fps, or roughly 22% in performance. You can also see that GPU load changes from 100% -> 79%.
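The percentages in this comment can be double-checked with a quick calculation. The sketch below uses only the FPS figures quoted above; percent_change is a made-up helper name. Note that the difference between 79.4 and 61.8 is 17.6 fps in absolute terms, which works out to about 22% relative to the 5.3.7 result.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent (negative = slower)."""
    return (after - before) / before * 100.0


# Kernel comparison: 5.3.7 vs. amd-staging-drm-next, both with ACO.
before_fps, after_fps = 79.4, 61.8
print(f"absolute: {after_fps - before_fps:+.1f} fps")
print(f"relative: {percent_change(before_fps, after_fps):+.1f}%")

# Driver comparison from the summary: aco-navi vs. RADV/LLVM and AMDGPU-PRO.
print(f"aco-navi vs RADV/LLVM:  {percent_change(73.8, 86.7):+.1f}%")
print(f"aco-navi vs AMDGPU-PRO: {percent_change(83.2, 86.7):+.1f}%")
```

The driver summary (17.5% and about 4%) matches the screenshots' FPS values.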

EDIT 2: Performance regression is due to 828d6fde7f574d74b0a6a591345d3c42b62d5e21: drm/amdgpu/psp: move TMR to cpu invisible vram region

Actually, it does appear to be more noticeable with ACO. With LLVM, performance is largely the same (albeit more variable with the commit enabled). I suspect there are some GPU utilisation issues going on, in that it might be clocking up and down too aggressively (or not aggressively enough). I seem to remember people reporting issues with this on Windows too with Navi cards, so maybe the above commit actually makes the card more efficient, but reveals another underlying issue.

pingubot commented 4 years ago

@daniel-schuermann Thanks for the info about the cheating. But is that really causing a 25% difference? I just tested Wolf 2 again with the latest master, and in my test spot I get 137 fps with ACO and 172 fps with Pro.

Venemo commented 4 years ago

@aqxa1 Have you reported that regression to the kernel people yet? If yes, can you give us a link to the bug report please?

aqxa1 commented 4 years ago

@Venemo Here you go: https://bugs.freedesktop.org/show_bug.cgi?id=112124

shmerl commented 4 years ago

TW3 with aco is slightly behind llvm still on Navi (Sapphire Pulse RX 5700XT). Resolution: 2560x1440, max settings, hairworks off. Mesa master, with llvm 10: driverInfo = Mesa 20.0.0-devel (git-a2689ebcd6) (LLVM 10.0.0)

aco: tw3_aco

llvm: tw3_llvm

Save in that location: tw3_save.zip

In a heavier GPU load location (like Velen during heavy rain), they are almost the same, with the framerate somewhere around 60+ fps. See here. Still, supposedly ACO should be a bit better?

cc @Venemo

Venemo commented 4 years ago

@shmerl Thanks for the benchmark. If I'm reading your screenshots right, ACO yields 80.5 fps, and LLVM gives you 81.1 fps, which is a 0.75% increase. I'd say this is not something to worry about; it looks like either compiler can give you a very decent game experience.

shmerl commented 4 years ago

I agree, there is nothing wrong with the experience, I was just comparing it to claims that ACO beats llvm in raw framerate posted by Phoronix. I suppose in this case it doesn't (yet?).

BNieuwenhuizen commented 4 years ago

@Venemo That is actually a 0.75% increase.

Venemo commented 4 years ago

@BNieuwenhuizen Yep, you have a good eye, I corrected it.

@shmerl I haven't looked at TW3 shaders in detail yet, so can't say what the problem is (if there is one). Currently we don't have Navi-specific optimizations in ACO yet, but it's on my TODO list (right after wave32). If the same result applies to other hw generations then it's also possible that TW3 has a pattern that we don't optimize yet.

aufkrawall commented 4 years ago

In my last test on Polaris a few weeks back, radv-aco was faster than all of radv-llvm, amdvlk-open and amdvlk-pro. Both amdvlk drivers also had their share of visual corruption, while radv had none.

Edit: Talking about The Witcher 3.

shmerl commented 4 years ago

@Venemo: In my previous tests on Vega 56, ACO was ahead of LLVM. But I also ran at a lower resolution than now, using a different monitor. Resolution can potentially play a role.

I can try running a test with lower one using Navi.