IGCIT / Intel-GPU-Community-Issue-Tracker-IGCIT

IGCIT is a Community-driven issue tracker for Intel GPUs.
GNU General Public License v3.0

Arc driver overhead problem is severely revealed in Horizon Zero Dawn #585

Closed Susie1818 closed 2 months ago

Susie1818 commented 11 months ago

Checklist [README]

Game / Application [Required]

Horizon Zero Dawn

Game Platform [Required]

Other game platform

No response

Processor / Processor Number [Required]

Intel Core i5-13500

Graphic Card [Required]

Intel Arc A770

GPU Driver Version [Required]

31.0.101.4826

Rendering API [Required]

Windows Build Number [Required]

Other Windows build number

No response

Intel System Support Utility report

SSU_20231105.txt

Description and steps to reproduce [Required]

The average "CPU" (not "GPU"!!) frame time increased drastically from 6.37ms to 11.49ms merely by changing the GPU from an RTX3080 to an Arc A770. In theory, the CPU frame time should remain unchanged when only the GPU in the system is changed. Obviously the Arc driver was dragging down the CPU severely, leaving the game engine able to issue far fewer frames per second.
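For reference, converting those frame times into the CPU-bound framerate ceiling is just a matter of taking the inverse; a minimal sketch of that arithmetic, using the two averages quoted above:

```python
# Minimal sketch: convert an average CPU frame time (ms) into the CPU-bound framerate ceiling.
# The 6.37 ms and 11.49 ms figures are the averages quoted in the description above.
def fps_from_frametime_ms(frametime_ms: float) -> float:
    return 1000.0 / frametime_ms

print(f"RTX 3080: {fps_from_frametime_ms(6.37):.0f} fps CPU-bound ceiling")   # ~157 fps
print(f"Arc A770: {fps_from_frametime_ms(11.49):.0f} fps CPU-bound ceiling")  # ~87 fps
```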

Game graphic quality [Required]

Game resolution [Required]

1920x1080

Game VSync [Required]

Off

Game display mode [Required]

Detailed game settings [Required]

The "Original" graphics settings preset

Device / Platform name

No response

Crash dumps [Required, if applicable]

No response

Save game

No response

freak2fast4u commented 11 months ago

To drive the point further, I'm sad to report that even though I'm happy with my Counter-Strike 2 (a DX11 game) experience on an Arc A770 and a Ryzen 7800X3D, I still get more FPS with my old 5700 XT at the same settings.

However, you mention using an old driver (4826). There have been several releases since:

Grab them here: https://www.intel.com/content/www/us/en/download/785597/intel-arc-iris-xe-graphics-windows.html

Admittedly, none of the associated release notes mention a performance uplift in Horizon Zero Dawn, but it's possible that optimizations made for one game trickle down into others (drivers usually have common/shared code paths). Did you give the latest drivers a try?

Susie1818 commented 11 months ago

@freak2fast4u Drivers 4885, 4887, 4900, and 4952 all have the same severe bug that causes video playback flickering and blackouts, so they are not usable.

freak2fast4u commented 11 months ago

@freak2fast4u Drivers 4885, 4887, 4900, and 4952 all have the same severe bug that causes video playback flickering and blackouts, so they are not usable.

So that's what I've been facing for a while! Thanks for the heads up, I will cross-check this asap :)

Edit: it turns out I also get those blackouts in Linux, so it could be a VBIOS issue. I say this because since installing the 4952 drivers I'm stuck on VBIOS 1068 no matter what driver version or OS I'm running (no way to downgrade that I know of). I'm also not ruling out an end-of-life situation for my screen, but seeing that it's a more general issue, I'll skip that test for now (end of off-topic).

Nidzhun commented 10 months ago

Can confirm this: with an i7-12700K and DDR5-6200, the Arc A770 couldn't reach more than ~80% utilization in this game. With the 4953 driver.

Vivek-Intel commented 10 months ago

@Susie1818 Can you share the game benchmark results on the latest driver, 101.5074, with both the A770 and the 3080, so we can see the CPU-bound FPS difference in particular?

Susie1818 commented 10 months ago

@Vivek-Intel

I have sold my RTX3080, but I can still provide you with some useful information:

i5-13500 + A770 (driver 101.5074) = CPU 85 FPS avg / GPU 145 FPS avg (1920x1080, "Original" graphics quality)
i5-12400 + RTX4070 (driver 546.29) = CPU 187 FPS avg / GPU 197 FPS avg (2560x1440, "Original" graphics quality)

Old data:
i5-13500 + A770 (driver 101.4123) = CPU 87 FPS avg / GPU 103 FPS avg (2560x1440, "Original" graphics quality)
i5-13500 + RTX3080 (driver 528.49) = CPU 186 FPS avg / GPU 214 FPS avg (2560x1440, "Original" graphics quality)
i5-12400 + A770 (driver 101.4123) = CPU 75 FPS avg / GPU 95 FPS avg (2560x1440, "Original" graphics quality)
i5-12400 + RTX3080 (driver 528.49) = CPU 187 FPS avg / GPU 212 FPS avg (2560x1440, "Original" graphics quality)

Vivek-Intel commented 9 months ago

@Susie1818 I ran the in-game benchmark in HZD and could see that the A770's average performance is on par with the RTX 3060; the current FPS numbers are the expected performance for the A770, in both the CPU and GPU FPS figures.

(Benchmark screenshots: Horizon Zero Dawn Complete Edition results on the A770 and on the RTX 3060)

Susie1818 commented 9 months ago

@Vivek-Intel

the A770's average performance is on par with the RTX 3060

I don't think so.

First of all, on my i5-13500 system, the performance of the A770 clearly falls behind your i9-14900K/RTX3060 system.

Secondly, the GPU utilization of my A770 is quite low throughout the benchmark sequence, staying around the 50% range most of the time with only a few spikes into the 60-70% range.

And then, if this is truly a CPU-bound situation, how come my i5-12400/RTX4070 system achieves 224% of the performance score (average FPS) of your i9-14900K/RTX3060 system, especially considering that the RTX4070's GPU performance is only 183% of the RTX3060's?

This is not a simple task. You have to do some deep-dive research to find out the real bottleneck(s) hindering the performance.

(Screenshots: benchmark result and GPU utilization)

Susie1818 commented 9 months ago

@Vivek-Intel

From the test I did back in February, you can clearly see that the Arc GPU's performance is CPU-dependent while the Nvidia GPU's performance is not. How do you explain this phenomenon?

freak2fast4u commented 9 months ago

@Susie1818: the only way I see to distinguish between a CPU-limited game engine and a driver overhead issue is to measure the GPU busy times and the frame times together; you can use the PresentMon overlay and/or CapFrameX (in capture mode) to demonstrate this. If the two curves are far apart from each other, there is driver overhead. If they are close together, there is little or no overhead. And I'm skeptical of that "CPU FPS" metric HZD is showing; it might only be meaningful in relation to input lag or physics accuracy, not framerate.
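For anyone who wants to quantify that gap from a capture rather than eyeballing the overlay, here is a minimal sketch over a PresentMon CSV export. The column names used (`msBetweenPresents`, `msGPUActive`) and the file name are assumptions; column names vary between PresentMon versions, so check your own CSV header and adjust.

```python
import csv

# Minimal sketch: estimate the per-frame "CPU/driver-bound" time as frametime minus GPU-busy
# time from a PresentMon capture. The two column names below are assumptions -- PresentMon
# releases differ, so check your CSV header and rename accordingly.
FRAMETIME_COL = "msBetweenPresents"   # assumed column: CPU-present-to-CPU-present interval (ms)
GPU_BUSY_COL = "msGPUActive"          # assumed column: time the GPU spent busy on the frame (ms)

def summarize(capture_path: str) -> None:
    frametimes, gpu_busy = [], []
    with open(capture_path, newline="") as f:
        for row in csv.DictReader(f):
            frametimes.append(float(row[FRAMETIME_COL]))
            gpu_busy.append(float(row[GPU_BUSY_COL]))
    avg_ft = sum(frametimes) / len(frametimes)
    avg_gb = sum(gpu_busy) / len(gpu_busy)
    print(f"avg frametime: {avg_ft:.2f} ms (~{1000 / avg_ft:.0f} fps)")
    print(f"avg GPU busy:  {avg_gb:.2f} ms")
    print(f"avg gap (time per frame the GPU sits waiting on CPU/driver): {avg_ft - avg_gb:.2f} ms")

# summarize("hzd_a770_capture.csv")  # hypothetical file name
```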

Also, I believe your i5-13500 is actually a good price/performance match for the A770; it's a perfectly valid use case.

I don't own that game and I have a 7800X3D, so I can't help any further, sorry for that.

However, I agree that testing driver overhead with a top-of-the-line CPU, which can by definition absorb that overhead transparently, is ludicrous ... oO

@Vivek-Intel: surely you have a mid-range CPU lying around for this specific scenario, right?

On the other hand those are perfectly playable framerates ...

Vivek-Intel commented 9 months ago

We try to use similar systems and settings to compare results and avoid differences causing any deltas, but I have run the benchmark on another 13th-gen CPU that I have (unfortunately a high-end one, an i9-13900K) and can still see that the results are on par with the RTX3060, even when that card is paired with an i9-14900K.

This game uses the CPU to read a lot of memory from GPU local memory during ordinary logic updates every frame, and logic updates are generally a CPU task, which might be contributing to the CPU FPS shown in the game. So we cannot be absolutely sure that the CPU FPS number shown in game should be completely independent of the GPU in the system, or that it should remain the same if we just change the GPU.

Please know that the performance seen with the latest driver on the A770 is the expected performance compared to the competition.

(Benchmark screenshots: A770 + i9-13900K at 1440p "Original", and RTX 3060 + i9-14900K at 1440p "Original")

Susie1818 commented 9 months ago

@Vivek-Intel

I can't believe that an Intel engineer here on GitHub doesn't show any passion for solving the problem but keeps shirking responsibility.

I don't know why on earth a professional like you needs an ordinary person like me to tell you where the problem is.

Go test your RTX3060 with an i9-14900K and then with an i5-14600K, with the same graphics settings, at both 1080p and 1440p. You will get four results. Then repeat the same tests with an Arc A770 and you will get another four results. Then tell us what insight you obtain from comparing these eight results.

If you loved your Arc GPU product, you would not have told me

the Perf number seen with latest driver on A770 is expected perf as compared to competition.

Your BS words only betray that you have no love for your product. You don't have the love that we first-gen adopters/unpaid beta testers/enthusiasts have. What I want is NOT the A770 being "on par with the competition." What I want is a flawless driver that unleashes its fullest potential without any discernible problem. Haven't you heard what Intel's Tom Petersen (TAP) said? He said that Arc's driver improvement is a "labour of love". I am telling you to improve your product, and you just replied "it is good enough and doesn't need to be improved." What the $&#*...... Are you an engineer or just a customer service representative?

IGCIT commented 9 months ago

@Susie1818 please be respectful.

If you don't agree with an answer you get, or it is not what you expected, you can express your disappointment, but do so while respecting other users, especially the Intel employees who are here to assist people with their issues.

IGCIT was made to help people communicate with Intel more easily, and we are grateful for the progress we have made so far, so this behaviour is not tolerated.

This is a warning.

freak2fast4u commented 9 months ago

@Susie1818: harassing a helpdesk employee out of frustration won't magically spawn a better driver into existence; it can only hurt the overall outcome. You both brought some data to the table; all you can do is bring more data in the hope of actually concluding whether there is or isn't excessive driver overhead in this game.

EstebanIntel commented 9 months ago

Hi @Susie1818,

We appreciate your passionate support of this product and all the effort you put into reporting the issues you found so we can improve it. We are committed to taking all feedback seriously, as all constructive criticism will help us deliver better products.

As a newcomer to the GPU market, we have the huge task of optimizing for the thousands and thousands of games out there. Each one of those games is optimized differently by its developer and puts different workloads on the CPU and GPU. So, please understand that we need to set a baseline to compare against competing products on each of these existing games, in order to properly allocate our limited resources across this huge optimization effort. With this in mind, our current target is for the A770 to be on par (performance-wise) with the RTX3060. Once we can comfortably say that the performance of the A770 is hitting the current target for the vast majority of games out there, we may re-evaluate that target. But please understand that, given the enormous number of existing games, this is a huge effort with our limited resources, so it will take some time.

Susie1818 commented 9 months ago

@EstebanIntel

First of all, in this game Arc is not on par with the RTX3060. If your definition of "on par" is merely the framerate, then you should use an A750 rather than an A770, because that is what Intel advertises to the public. To me, "on par" doesn't only mean framerate; it also means "less problematic, or at least equally problematic".

You guys are engineers and you should know better than I do. The title of this thread/ticket is about "driver overhead." There is obviously some problem with Arc here; its behavior is abnormal. I don't know why you, as engineers, are not as hyper-enthusiastic about digging out and solving problems as I would expect. @Vivek-Intel has spent his time and effort on some tests, but at the end of the day he didn't seem to get the idea that there is a problem with Arc here. In that case, I think his time and effort were not very meaningful and somewhat wasted. Why did he set out to prove that Arc's framerate is on par with an RTX3060 in the first place? I didn't start this ticket as a complaint about a comparison between the A770 and the RTX3060. I started this ticket because I wanted to point out an abnormal behavior of the Arc GPU.

From what I have tested, Nvidia GPUs show no CPU dependency in this game; performance is basically identical with different CPUs. In contrast, the Arc GPU shows very obvious CPU dependency here. In addition, the Arc GPU shows almost no scaling when the resolution is reduced from 1440p to 1080p. To me, these two abnormalities are not acceptable. If you argue that this is a CPU-bound situation, then there is contradicting evidence: Nvidia GPUs obviously scale with resolution and GPU performance (the 3080/4070 results), which should not happen if it were truly a CPU-bound situation.

Therefore, there must be something wrong in Arc's driver that takes up excessive CPU resources and competes with the game engine's rendering processes. If you admitted that Arc's driver does consume much more CPU than its competitors' and said you were more than willing to improve it, I would not have been so irritated.

As an end user, I think the behavior of an Nvidia GPU in this game is totally fine, normal, and not problematic, while the behavior of an Arc A7 is abnormal and problematic. From this perspective, Arc is not on par with its competitor.

Susie1818 commented 9 months ago

@IGCIT

please be respectful.

If you don't agree with an answer you get, or it is not what you expected, you can express your disappointment, but do so while respecting other users, especially the Intel employees who are here to assist people with their issues.

I didn't start this thread/ticket with disrespect. However, if the Intel employees who are here to assist people cannot communicate with people on point, I feel my time, effort, and passion are being disrespected. I don't know how you would feel if you were in my shoes. You yell for help, and then the helper comes and tells you, "there is no problem at all; stop yelling." To me it's like being slapped in the face and told to "shut up." That is exactly how I felt after reading his replies. I think it would actually be better if nobody had ever come to answer and I had just been left to drown in my disappointment alone, letting this ticket fade away and vanish with time. If you really want to assist someone, you are supposed to stand on their side rather than act as if you were defending yourself from "attacks".

IGCIT was made to help people communicate with Intel more easily, and we are grateful for the progress we have made so far, so this behaviour is not tolerated.

I appreciate IGCIT because, in my experience, you guys here have solved more problems, and faster, than the official Intel® ARC™ Graphics Community Forum that is linked directly from the Arc Control interface. GitHub is more of a niche place, I think. I don't know what public relations policies Intel holds, but I think the official support channel there is much more disappointing than here. It's almost flooded with shallow problems and complaints. If I were an Intel "Customer Support Technician" working there, I would probably have been driven crazy already. On the other hand, when I was submitting issues there during my early days with the Arc GPU, I was really driven mad by those customer service representatives as well. I don't know what the best approach is, but I do hope you can keep establishing better and better communication with end users.

Susie1818 commented 9 months ago

@freak2fast4u

Thanks for the advice! You know, I have recently spent about US$1000 gathering the necessary parts, and I am about to build another PC just for testing and verifying Arc's overhead issue. I am almost insane, obsessed with Arc and torturing myself with it.

Karen-Intel commented 9 months ago

Hey @Susie1818 just a few comments on your last replies:

First: Thank you for your appreciation of this forum/channel and of the community that has helped us build a great place, one that is your first choice for reporting Arc's strange behaviors. It means a lot to us, and be sure we're doing our best to provide the customer support our community deserves.

2nd: We are on the same side; we just need to find common ground on this and other reports. We never take things personally, and we'd appreciate it if you did the same, so we can keep collaborating as what we are: a community.

3rd: We are committed to reviewing each case with quality and transparency. Just as a clarification, we are the front-line engineers and the right channel for sharing valuable cases with the development team. Every report goes through internal stages that take time, and thanks to you guys we have a lot of them! Rest assured that we do many experiments, which must be justified as part of our internal analysis process.

That being said, allow us to dig into this and we will update this thread soon. We appreciate your patience in advance :)

Karen

EstebanIntel commented 9 months ago

Hi @Susie1818,

Can you please detail how you are measuring "CPU frametime"?

My guess is you have already seen this video: https://www.youtube.com/watch?v=tZHHhTt_fww&t=377s

Basically, most framerate meters are measuring "CPU frametime", since they are measuring the time from the CPU Present of frame 1 to the CPU Present of frame 2. Then, framerate is calculated as the inverse of frametime. For example, a frametime of 16.666 ms (between CPU presents) is displayed as a framerate of 60fps.

Now, "GPU frametime" can be defined as the time the GPU takes to render a single frame. This is not usually measured by framerate meters. But we have added it to presentmon as "GPU Busy".

So, using PresentMon to measure both (CPU) Frametime and GPU Busy on an 11900K + A750 running Horizon Zero Dawn, I get this: (screenshot attached)

From this test there are some things I want to highlight:

  1. (CPU) Frametime is 13.7ms
  2. This matches the framerate shown by PresentMon: 1/0.0137 = 73 fps
  3. GPU utilization is approx. 77%
  4. GPU Busy time is 10.6ms
  5. GPU Busy is lower than Frametime, which means the CPU is the bottleneck in this test scenario; and this is why the GPU is not at full utilization.

Now, this scenario where GPU Busy is lower than Frametime can be caused by 2 possible reasons:

  1. The game is highly dependent on CPU and my 11900K cannot keep up
  2. The Intel driver is not fully optimized, causing extra load on the CPU

In this case it's probably a combination of both these reasons. However, I think it's more because of reason 1 than reason 2. This is because our comparison target (RTX3060) shows even lower framerates; and also, there are multiple reports online of this game being CPU-heavy:

https://steamcommunity.com/app/1151640/discussions/0/2914346777805981906
https://www.reddit.com/r/horizon/comments/lv0rkd/horizon_zero_dawn_very_high_cpu_usageis_this/
https://steamcommunity.com/app/1151640/discussions/0/3821820927599544042/
https://www.reddit.com/r/horizon/comments/s64q96/high_cpu_usage_on_pc_is_there_a_fix_for_this/
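As a quick cross-check, the framerate and utilization figures quoted in this comment follow directly from the two times; a minimal sketch of the arithmetic, using the 13.7 ms and 10.6 ms values from the list above:

```python
# Sanity-check of the figures above: framerate is the inverse of (CPU) frametime, and GPU
# utilization is roughly the GPU Busy time divided by the frametime.
frametime_ms = 13.7   # (CPU) Frametime reported above
gpu_busy_ms = 10.6    # GPU Busy reported above

print(f"framerate       ~ {1000 / frametime_ms:.0f} fps")      # ~73 fps
print(f"GPU utilization ~ {gpu_busy_ms / frametime_ms:.0%}")    # ~77%
```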

Arturo-Intel commented 9 months ago

@Susie1818 FYI, our driver engineers are aware of this behavior; the internal report numbers are 16022778623 and 14017816271.

If I have any news, I will share them through this thread.

--r2

Susie1818 commented 9 months ago

Hi @EstebanIntel ,

Can you please detail how you are measuring "CPU frametime"?

The inverse of CPU framerate: 1000 divided by CPU FPS

Now, this scenario where GPU Busy is lower than Frametime can be caused by 2 possible reasons:

  1. The game is highly dependent on CPU and my 11900K cannot keep up
  2. The Intel driver is not fully optimized, causing extra load on the CPU

In this case it's probably a combination of both these reasons. However, I think it's more because of reason 1 than reason 2. This is because our comparison target (RTX3060) shows even lower framerates; and also, there are multiple reports online of this game being CPU-heavy:

The Arc A7 is intrinsically more performant than the RTX3060, so it is meaningless to compare merely their framerates.

If you think the 11900K is the bottleneck, try pairing it with a higher-end GPU such as an RTX3080 or 4070, and you'll find that it is not, because the result will scale well with GPU capability. On the other hand, you can also try using a 14900K with an RTX3060, and you'll find there is little improvement compared to your 11900K (+RTX3060) result.

Although this game is very CPU-intensive, the CPU bottleneck is only impactful on Arc A7 GPUs, not on Nvidia Ampere or Ada Lovelace GPUs (nor on AMD RDNA2/3 GPUs). This is the most important point.

Susie1818 commented 9 months ago

@EstebanIntel

The Arc A7 GPU relies too much on the CPU to "drive" it. This disadvantage is magnified and revealed by CPU-intensive games. With less CPU-intensive games, which are more common, this shortcoming probably stays well hidden. So, thanks to CPU-intensive games.

Susie1818 commented 9 months ago

@EstebanIntel

This video is very informative for understanding the driver overhead issue of Arc GPU.

Nidzhun commented 9 months ago

@EstebanIntel

This video is very informative for understanding the driver overhead issue of Arc GPU.

You are right about Arc's driver overhead. The Arc driver developers must optimize it overall, not for every single application separately. Maybe they are doing that now in addition to all these minor enhancements.

Susie1818 commented 9 months ago

@EstebanIntel @Arturo-Intel

Hey guys, I recently found simple and clear evidence that the Arc driver has a severe overhead issue. Just a few days ago I discovered that there is actually an "API Overhead feature test" in the 3DMark benchmark suite.

Here are my results: (two screenshots attached)

This particular test does not measure GPU compute capability; it tests driver efficiency for the various graphics rendering APIs. Theoretically, if driver efficiency were the same on both sides, an i7-13700K, being the faster CPU, should issue a larger number of draw calls than an i5-12400. However, as you can see, the Intel Arc Alchemist driver driven by an i7-13700K turns out to be much less efficient than the Nvidia Ada Lovelace driver driven by merely an i5-12400.

I hope the Arc driver team can research this issue from an overall standpoint, not just do optimizations on a per-game basis. What is the root cause, the biggest encumbrance, behind the driver inefficiency? Why does the driver take up so much CPU time to make a draw call? Is the CPU really computing something necessary, or is it actually just sitting idle, waiting on something that holds up the process?

Karen-Intel commented 9 months ago

Hey @Susie1818, thanks for sharing your latest experiments! Our dev team is already working on this case and we're making sure to share this information with them :) We will let you know about any update! Let's hope we get one soon.

Karen

Susie1818 commented 9 months ago

@EstebanIntel @Arturo-Intel @Karen-Intel

In addition to the overall inefficiency, another aspect worth noting is the gain of DX11 MT over ST. According to the 3DMark API overhead tests shown above, the Ada Lovelace driver gains 66% from MT, but the Arc Alchemist driver gains only 4% (almost negligible) from MT. Is there something wrong? Especially considering that the i7-13700K has a better MT ratio than the i5-12400.
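For clarity, the "gain" percentages quoted here are just the ratio between the DX11 multi-threaded and single-threaded draw-call scores; a minimal sketch with hypothetical placeholder scores (the real values are in the 3DMark screenshots referenced above):

```python
# How the MT-vs-ST "gain" above is derived: gain = (MT draw calls per second / ST draw calls per second) - 1.
# The scores below are hypothetical placeholders; substitute the actual DX11 ST and MT numbers
# from the 3DMark API Overhead results.
def mt_gain(st_score: float, mt_score: float) -> float:
    return mt_score / st_score - 1.0

print(f"{mt_gain(1_000_000, 1_660_000):.0%}")  # 66% gain (hypothetical Ada Lovelace-like ratio)
print(f"{mt_gain(1_000_000, 1_040_000):.0%}")  # 4% gain (hypothetical Arc Alchemist-like ratio)
```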

Susie1818 commented 6 months ago

@EstebanIntel @Arturo-Intel @Karen-Intel

For your reference

According to the many positive reviews now appearing on the internet, Horizon Forbidden West seems to be well optimized on most platforms. However, performance is quite sub-par on Intel Arc GPUs. According to TechPowerUp's report, even though they used the "Game On" driver (v5379) for this game, even an A770 loses to an RTX3060, let alone an A750.

The biggest problem is driver efficiency. The fact that the framerate doesn't scale up when lowering resolution and graphics settings indicates that the Arc driver struggles once the game engine starts to make heavier use of the CPU. The driver starts to compete with the application's code and chokes the whole system's performance.

In fact, TechPowerUp already used an i9-14900K for that test. That means the performance would be a total disaster if it were benchmarked more realistically on a mid-range system.

Dawidusergit commented 5 months ago

@EstebanIntel @Arturo-Intel @Karen-Intel @Susie1818 Here is my result in the "API Overhead feature test". Tested on a Ryzen 5 5600 (+200MHz PBO), 16GB RAM at 3800MHz, an A770 16GB without OC, driver 5382, Windows 11: a mid-budget config with Arc graphics. (screenshot attached)

Dawidusergit commented 5 months ago

@Susie1818 Score on the 5444 driver: a +10.5% improvement in DX11 single-thread. (screenshot attached)

Nidzhun commented 5 months ago

Just FYI, here is what a 12700K with an RTX 3070 can do in this test. The RTX 3070 is not that much superior in performance compared to the A770. (screenshot attached)

Dawidusergit commented 5 months ago

My config, but now with an Nvidia RTX 3070 Ti GPU: roughly a 6x bigger DX11 MT score compared to the A770 16GB. (screenshot attached)

Susie1818 commented 5 months ago

Just FYI, here is what a 12700K with an RTX 3070 can do in this test. The RTX 3070 is not that much superior in performance compared to the A770.

Nevertheless, the driver efficiency of the Nvidia Ampere architecture is far superior to Arc Alchemist's. The A770's scores in this API overhead test look pathetic compared with the scores you posted for the RTX 3070.

Susie1818 commented 3 months ago

Compared with driver v5081, the driver efficiency of v5590 has noticeably dropped, except in DX12:
DX11 MT: -9%
DX11 ST: -12%
DX12: +21%
Vulkan: -5%
(screenshot attached)

FreekHub commented 3 months ago

Glad I found this thread; it is EXACTLY the problem I'm facing when moving from an RTX 2060 Super to an Intel Arc A770 in this game. I'm using a B550M motherboard with a Ryzen 5 5600 at stock settings, 16GB of 3200MHz RAM, and a PCIe Gen4 NVMe SSD. I get CPU usage spikes, and when they go above 85-90% the framerate drops to around ~45 fps, but with the 2060 Super I'm mostly GPU-bound at a steady 60 fps with CPU usage around 40-50%. I still have the Nvidia card and am able to provide additional data specific to this game.

Susie1818 commented 2 months ago

I recently retested HZD with driver version 5762, and the performance is consistently 5% lower than what I got with driver v5081. Sort of disappointing.

freak2fast4u commented 2 months ago

How are the 1% lows though?


Susie1818 commented 2 months ago

Sorry, I didn't note down the 1% low figures when I did the tests with v5081 in February, so I don't really have the baseline for comparison.

Vivek-Intel commented 2 months ago

Thank you all for contributing to this issue by sharing your observations and test results. We are making continuous efforts to fix driver issues and improve the performance of the most popular games and apps, focusing on providing a high-quality, stable experience for the broadest set of users. While we cannot accommodate the request to fix this issue as of now, please watch this article on our website for any possible changes in the status of this issue.

Vivek-Intel commented 2 months ago

We will notify you if there are any updates regarding this issue. Closing this thread for now.