GpuZelenograd / memtest_vulkan

Vulkan compute tool for testing video memory stability
https://github.com/GpuZelenograd/memtest_vulkan/blob/main/Readme.md
zlib License

Bandwidth drastically lower than expected with default memory usage (pair of 4K HDR displays) #38

Open Vasllo opened 7 months ago

Vasllo commented 7 months ago

I noticed you used an RTX 2070 to take those screenshots. Its memory subsystem has identical specs to my RTX 3060 Ti, yet I'm getting 32 to 43GB/s while the screenshots show 317 to 327GB/s. HWInfo also showed 100% core usage during the test, even though my GPU is more powerful than the 2070.

galkinvv commented 7 months ago

The speeds for a 3060 Ti should be a bit higher than for a 2070.

I have a TUF 3060 Ti in my lab; it achieves 380-390GB/sec (image: memtest-vulkan-3060ti).

There may be some variation between vendors, around ±10-20%. A GPU giving ~10x lower speed is really strange. Maybe some background app is active? What are the GPU usage and temperatures without memtest_vulkan running?

If memtest_vulkan is the only app running, what's your driver version? A somewhat dated driver, 1-2 years old, is OK, but really old drivers like "the first release supporting the 3060 Ti from 3.5 years ago" can have problems with the Vulkan API.
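A minimal sketch of the comparison described above, assuming the 380-390GB/sec lab result is collapsed to a 385GB/sec midpoint and the upper 20% figure is used as the tolerance. The function name and the midpoint are illustrative and are not part of memtest_vulkan.

```python
def within_vendor_spread(measured_gb_s, reference_gb_s, tolerance=0.20):
    """Return True if the measured speed is within +-tolerance of the reference."""
    return abs(measured_gb_s - reference_gb_s) <= tolerance * reference_gb_s


# Assumption: midpoint of the 380-390GB/sec TUF 3060 Ti result quoted above.
reference = 385.0

# 32 and 43 are the reported 3060 Ti speeds; 327 is the RTX 2070 screenshot figure.
for measured in (32.0, 43.0, 327.0):
    ratio = reference / measured
    ok = within_vendor_spread(measured, reference)
    print(f"{measured:.0f} GB/s: {ratio:.1f}x below reference, within spread: {ok}")
```

The reported 32-43GB/s values fall far outside any vendor-to-vendor spread, which is why the result points to a configuration problem rather than a slower card.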

Vasllo commented 7 months ago

I'm a bit lost now, because I'm getting different and weird results on each run. I also noticed our GPUs use different dies and have different launch dates. Mine is a Palit Dual model. The first run today, with everything closed, got about double the performance I had yesterday, and another run just now got even higher peak results, but still a lot lower than yours, especially the checking speed, which I guess is read-intensive. For some reason, mine keeps ramping up the write speed to a decent level, but not the read speed, and it also dropped after some time. All I have open is Firefox, plus some background services; I closed everything else that was user-loaded.

Stock and idle after the first run of the day: image

Stock and running memtest_vulkan: image

galkinvv commented 7 months ago

Looks strange and interesting. The GPU is indeed a GA103-based 3060 Ti, whose architecture is very similar to GA104. It is quite rare, so I never had a chance to run memtest_vulkan on it, but lots of other NVIDIA and AMD GPUs gave much more predictable results.

The check is read-intensive but also a bit more computationally intensive:

* calculation related to comparison

* possible error reporting: due to the nature of shaders, the error reporting is built into the compute pipeline so that it can be executed in case of an error.

So it seems that the testing is somehow capped by GPU utilization. This is also confirmed by the GPU utilization being at 100% while testing and memory utilization at only 27%.

What caused it, I really don't know. The GPU clock speed seems to be fine. It seems that either the NVIDIA driver performs some strange shader scheduling on GA103 or some other app interferes.

I have several ideas to check:

  1. Run memtest_vulkan with reduced memory allocation. For this, create a 4gb.bat in the same folder containing

    memtest_vulkan-v0.5.0.exe 1 4000000000
    pause

     Maybe there is a memory usage conflict with other apps. memtest_vulkan reserves ~1GB for other apps, but maybe that is not enough on your system.

  2. Check whether the performance of other tools like Furmark is near what is expected for a 3060 Ti.

  3. Use GpuShark to list the apps using the GPU during the test and try to reduce their count even further. Maybe Firefox or some other app somehow affects it. Or check the behaviour right after rebooting.

  4. If you happen to have a dual-boot installation of another OS, you can try there. Or try another 8GB GPU in this PC, or this GPU in another PC. I'm really out of ideas.

Vasllo commented 7 months ago

You hit the nail on the head: it was trying to allocate more memory than was available. After running the bat you suggested, I got this: image

I tried closing everything not system-related, but there wasn't much left. After running the bat, I tried the normal .exe again and had the same terrible results. Then I noticed, as you said, that it reserves 1GB of VRAM, but my VRAM usage never went under 1.5GB, even with everything closed. I have dual 4K 144Hz HDR monitors, so I lowered the resolution to 720p and disabled HDR (which freed about 400MB by itself), and then I could run the default memtest_vulkan again.
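The arithmetic behind this can be sketched as follows, assuming the "~1GB" reserve is treated as 1000MB and the idle desktop usage as 1500MB. The function and parameter names are hypothetical and are not taken from memtest_vulkan.

```python
def default_run_shortfall_mb(baseline_usage_mb, reserved_for_others_mb=1000):
    """MB by which other apps' VRAM use exceeds what the test leaves free.

    A result above zero means the default allocation would overcommit VRAM.
    """
    return max(0, baseline_usage_mb - reserved_for_others_mb)


# Assumption: ~1.5GB idle usage with dual 4K HDR displays, ~1GB default reserve.
print(default_run_shortfall_mb(1500))  # shortfall with the reported desktop usage
print(default_run_shortfall_mb(800))   # a lighter desktop fits within the reserve
```

Any positive shortfall has to be recovered either by freeing VRAM, as done here by lowering the resolution and disabling HDR, or by passing a smaller test size on the command line.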

It was very useful for testing my overclock. I had ~1890MHz core and 8400MHz (or 2100MHz) memory stable in 3DMark Time Spy Extreme and the Cyberpunk 2077 benchmark, but that caused a ton of errors in this test. +1300 on the memory seemed fine but produced errors after ~30min, which I guess would be fine for most games. I decided to play it safe with +1200 and had zero errors after >30min, with write at ~421GB/s and read at ~448GB/s.

Weirdly, the write is still ~19% lower than yours at stock, and read about ~7% lower.

But thanks a lot for the help :)

galkinvv commented 7 months ago

Great news, the culprit is found. Regarding the 7-19% lower speed: it may be related to the monitor refresh rate too. I noticed that just outputting a picture consumes quite a lot of memory bandwidth. My test was with a non-HDR 2560x1440@60 display.

About the initial problem: it seems that I have to reserve more memory by default to be compatible with such monitor setups, at least on 8+GB GPUs. For this I need to know how much to reserve.

Can you test a couple of similar bats in your semi-default environment (all your typical background tools active, HDR monitors active, but all apps except the browser closed, like your first run)? This is equivalent to reserving a bit more memory (the test actually rounds the size down to a multiple of 256MB).

memtest_vulkan-v0.5.0.exe 1 6600000000
pause

and another one with 6900000000.
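A rough sketch of what these sizes mean after rounding, under two assumptions that the thread does not confirm: that "256MB" means 256 MiB, and that an 8GB card exposes exactly 8 GiB. The helper name is illustrative.

```python
MIB = 1024 * 1024
GRANULE = 256 * MIB          # assumption: "256MB" means 256 MiB
TOTAL_VRAM = 8 * 1024 * MIB  # assumption: an 8GB card exposes exactly 8 GiB


def effective_test_size(requested_bytes, granule=GRANULE):
    """Round the requested size down to a whole number of granules."""
    return (requested_bytes // granule) * granule


for requested in (4_000_000_000, 6_600_000_000, 6_900_000_000):
    tested = effective_test_size(requested)
    left = TOTAL_VRAM - tested
    print(f"{requested}: tests {tested // MIB} MiB, leaves {left // MIB} MiB")
```

Under these assumptions, 6600000000 tests 6144 MiB and leaves 2048 MiB for everything else, while 6900000000 tests 6400 MiB and leaves 1792 MiB, so the two runs bracket a reserve of roughly 1.75 to 2 GiB.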

dakkidaze commented 7 months ago

I got a similar problem on my Gigabyte 4080 Super Windforce too. I have a 3440x1440 monitor, no HDR, 75Hz.

Default without any parameters: image

memtest 1 4000000000: image

memtest 1 6600000000: image

And finally I tried plugging my monitor into the integrated graphics, and the problem went away immediately. image

galkinvv commented 7 months ago

Thanks for reporting.

Currently memtest_vulkan gets the free memory size via the Vulkan API, but it seems that it allows some VRAM overallocation with swapping to main RAM. @dakkidaze, can you try a few more memory sizes with the monitor plugged into your 4080S? I'm trying to estimate how much memory needs to be reserved to get normal speed, but the 66.... test you performed was applicable to 8GB GPUs; 16GB GPUs need other numbers.

  1. At first try memtest-vulkan-v0.5.0.exe 1 14600000000
  2.1 If it shows a speed over 600GB/sec, try a greater value: memtest-vulkan-v0.5.0.exe 1 14900000000
  2.2 Or if the previous test shows a slow speed, try a smaller value: memtest-vulkan-v0.5.0.exe 1 14300000000

Thanks in advance!
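The procedure above amounts to one step up or down from the starting size depending on the measured speed. A minimal sketch, assuming the 600GB/sec threshold and the 300000000-byte step implied by the three sizes; the function name is hypothetical and the speed has to be read manually from the memtest_vulkan output.

```python
def next_size_to_try(size_bytes, speed_gb_s,
                     normal_speed_gb_s=600.0, step_bytes=300_000_000):
    """Pick the next allocation size from the speed seen at the current one.

    A speed above the threshold suggests the allocation fits in VRAM, so a
    larger size is worth trying; a slow run suggests overallocation.
    """
    if speed_gb_s > normal_speed_gb_s:
        return size_bytes + step_bytes
    return size_bytes - step_bytes


start = 14_600_000_000
print(next_size_to_try(start, 650.0))  # fast run: go up to 14900000000
print(next_size_to_try(start, 90.0))   # slow run: go down to 14300000000
```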

dakkidaze commented 7 months ago

image

Both 14.6GB and 14.9GB yield normal memory speeds (~10% lower than theoretical figures; I have since applied a VRAM OC, so it's over 700GB/s, just ignore the specific numbers).