GpuZelenograd / memtest_vulkan

Vulkan compute tool for testing video memory stability
https://github.com/GpuZelenograd/memtest_vulkan/blob/main/Readme.md
zlib License

Question about results #18

Closed markusew closed 8 months ago

markusew commented 8 months ago

Hi, I'm sorry if this is the wrong place to ask, but I am desperate for solutions. I have recently started having issues with games crashing and driver timeouts. I came across your tool and got the following results:

Tester console logging started at 2023-10-16T23:54:13.091596Z

1: Bus=0x28:00 DevId=0x73DF   12GB AMD Radeon RX 6700 XT
Tester worker logging started at 2023-10-16T23:54:13.133822Z
Standard 5-minute test of 1: Bus=0x28:00 DevId=0x73DF   12GB AMD Radeon RX 6700 XT
      1 iteration. Passed  0.0689 seconds  written:    7.0GB 279.9GB/sec        checked:   10.5GB 239.5GB/sec
     17 iteration. Passed  1.0451 seconds  written:  112.0GB 297.6GB/sec        checked:  168.0GB 251.2GB/sec
     94 iteration. Passed  5.0549 seconds  written:  539.0GB 296.4GB/sec        checked:  808.5GB 249.8GB/sec
    548 iteration. Passed 30.0475 seconds  written: 3178.0GB 293.3GB/sec        checked: 4767.0GB 248.1GB/sec
   1001 iteration. Passed 30.0407 seconds  written: 3171.0GB 292.9GB/sec        checked: 4756.5GB 247.6GB/sec
   1454 iteration. Passed 30.0239 seconds  written: 3171.0GB 293.1GB/sec        checked: 4756.5GB 247.7GB/sec
   1906 iteration. Passed 30.0015 seconds  written: 3164.0GB 292.3GB/sec        checked: 4746.0GB 247.5GB/sec
   2357 iteration. Passed 30.0115 seconds  written: 3157.0GB 291.4GB/sec        checked: 4735.5GB 246.9GB/sec
   2812 iteration. Passed 30.0526 seconds  written: 3185.0GB 294.1GB/sec        checked: 4777.5GB 248.5GB/sec
   3264 iteration. Passed 30.0280 seconds  written: 3164.0GB 291.9GB/sec        checked: 4746.0GB 247.3GB/sec
   3719 iteration. Passed 30.0082 seconds  written: 3185.0GB 294.4GB/sec        checked: 4777.5GB 249.0GB/sec
   4172 iteration. Passed 30.0513 seconds  written: 3171.0GB 292.4GB/sec        checked: 4756.5GB 247.6GB/sec
Error found. Mode INITIAL_READ, total errors 0x14E out of 0x38000000 (0.00003555%)
Errors address range: 0xE042FECC..=0x18FF2C2CF  iteration:4173
values range: 0xFEA05162..=0x0055D773   FFFFFFFF-like count:0    bit-level stats table:
         0x0 0x1  0x2 0x3| 0x4 0x5  0x6 0x7| 0x8 0x9  0xA 0xB| 0xC 0xD  0xE 0xF
SinglIdx                 |              330|                 |                 
   0x1?    2             |                 |                 |                 
TogglCnt     332    2    |                 |                 |                 
1sInValu                 |            1   3|   3  23   35  58|  81  49   45  22
   0x1?   10   3    1    |                 |                 |                 

Is it likely that these VRAM errors cause games crashing? And if so, is there a way to fix this or do I have to replace the VRAM module?

galkinvv commented 8 months ago

> Is it likely that these VRAM errors cause games crashing?

In general, yes: if some memtest_vulkan iterations find more than 1-2 errors (0x14E errors in your case), the same problematic GPU behaviour can lead to crashes in games. However, while these errors look like "unexpected data returned from memory" to memtest_vulkan, that is NOT 100% proof that the cause is the VRAM itself rather than the GPU core.

The Readme contains some theory about finding the cause, but even for an experienced person it is hard to be sure of the exact cause.
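To put the numbers from the "Error found" line in the log into perspective, here is a small sketch that converts the two hexadecimal values to decimal and recomputes the percentage. The variable names are mine, and what exactly the second value counts (bytes, 32-bit words, or something else) is an assumption I am not making here; the snippet only redoes the arithmetic.

```python
# Values copied from the "Error found" line of the log.
total_errors = 0x14E         # hexadecimal error count reported by the tool
checked_units = 0x38000000   # the "out of" value; its unit is not stated in the log

# Share of checked units that returned unexpected data, in percent.
error_percent = total_errors / checked_units * 100

print(f"errors (decimal): {total_errors}")
print(f"checked units (decimal): {checked_units}")
print(f"error share: {error_percent:.8f}%")
```

The share is tiny, but as said above, anything beyond 1-2 errors in an iteration is already a sign of problematic GPU behaviour.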

> And if so, is there a way to fix this or do I have to replace the VRAM module?

You can try an unreliable but easy and safe approach: lower the Memory Clock, GPU Clock and Power limit via the MSI Afterburner utility. Start with the lowest clocks and check whether they are stable for your use cases. If they are, start looking for a faster configuration that is still stable.

Sometimes lowering the clocks helps; sometimes the problem reappears after several months.
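The "start low, then step up" procedure can be sketched as a simple loop. This is only an illustration of the search order: `highest_stable_clock` and `is_stable` are hypothetical names, nothing here talks to MSI Afterburner or to memtest_vulkan, and in practice "is stable" means you set the clock by hand, run the test and your games for a while, and decide yourself.

```python
from typing import Callable, Iterable, Optional


def highest_stable_clock(
    candidates_mhz: Iterable[int],
    is_stable: Callable[[int], bool],
) -> Optional[int]:
    """Walk the candidate clocks from lowest to highest and return the last
    one that was stable, or None if even the lowest one failed."""
    best = None
    for clock in sorted(candidates_mhz):
        if not is_stable(clock):
            break  # first unstable clock: stop stepping up
        best = clock
    return best


# Simulated example: pretend everything up to 1800 MHz is stable.
# The numbers are made up and are not recommendations for any card.
simulated = highest_stable_clock(
    [1600, 1700, 1800, 1900, 2000],
    is_stable=lambda mhz: mhz <= 1800,
)
print(simulated)
```

Stepping up one setting at a time is slower than a binary search, but it never jumps straight to a high clock, which keeps the number of crashes you have to sit through low.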

If this doesn't help, only a physical repair such as soldering a new VRAM IC can help. Repair questions related to VRAM IC soldering can be discussed on the GPURepair subreddit. Please read the two pinned topics before posting there. The "List of GPU repair resources" gives a brief overview of available repair notices, and "How to request advice" helps keep information more organized.

galkinvv commented 8 months ago

@markusew I'm moving this from issues to discussions; it's better suited there.