GpuZelenograd / memtest_vulkan

Vulkan compute tool for testing video memory stability
https://github.com/GpuZelenograd/memtest_vulkan/blob/main/Readme.md
zlib License

How do you change memory block size? #7

Open BA8F0D39 opened 1 year ago

BA8F0D39 commented 1 year ago

I want to change the block size from 1.8GB to 4GB, 8GB, 16GB, etc. to see whether it corrupts memory.

galkinvv commented 1 year ago

What GPU and OS do you have? Please show how memtest_vulkan detects your GPU (the first several lines of output).

Note that memtest_vulkan tests video memory, which is separate from system memory on all GPUs except integrated ones. It already tries to allocate as much GPU memory as possible, but for several reasons it allocates one contiguous region rather than several isolated regions.

galkinvv commented 1 year ago

Also, the maximum available memory may be driver-dependent. For example, for the AMD RX 580 8GB GPU on Linux it is possible to install two Vulkan drivers simultaneously on one system: RADV from the Mesa project and AMDVLK. The driver can then be selected explicitly with the VK_DRIVER_FILES environment variable (with Khronos Vulkan loader libvulkan.so versions below v1.3.207, use VK_ICD_FILENAMES instead of VK_DRIVER_FILES):

[user@host ~]$ VK_DRIVER_FILES=/usr/share/vulkan/icd.d/radeon_icd.x86_64.json ./memtest_vulkan
https://github.com/GpuZelenograd/memtest_vulkan v0.5.0 by GpuZelenograd
To finish testing use Ctrl+C

1: Bus=0x01:00 DevId=0x67DF   8GB AMD Radeon RX 580 Series (RADV POLARIS10)
Standard 5-minute test of 1: Bus=0x01:00 DevId=0x67DF   8GB AMD Radeon RX 580 Series (RADV POLARIS10)
      1 iteration. Passed  0.0324 seconds  written:    1.8GB 179.3GB/sec        checked:    3.5GB 154.5GB/sec
     32 iteration. Passed  1.0063 seconds  written:   54.2GB 178.5GB/sec        checked:  108.5GB 154.4GB/sec
^C
memtest_vulkan: no any errors, testing PASSed.
  press any key to continue...
[user@host ~]$ VK_DRIVER_FILES=/usr/share/vulkan/icd.d/amd_icd64.json ./memtest_vulkan
https://github.com/GpuZelenograd/memtest_vulkan v0.5.0 by GpuZelenograd
To finish testing use Ctrl+C

1: Bus=0x01:00 DevId=0x67DF   8GB AMD Radeon RX 580 Series
Standard 5-minute test of 1: Bus=0x01:00 DevId=0x67DF   8GB AMD Radeon RX 580 Series
      1 iteration. Passed  0.0637 seconds  written:    3.5GB 181.7GB/sec        checked:    7.0GB 157.6GB/sec
     17 iteration. Passed  1.0189 seconds  written:   56.0GB 181.6GB/sec        checked:  112.0GB 157.6GB/sec
^C
memtest_vulkan: no any errors, testing PASSed.
  press any key to continue...

So with the Mesa RADV driver only 3.5GB is allocated, while with the AMDVLK driver 7.0GB is allocated. It seems that RADV has a 32-bit address space limit. This example shows that the behavior is very driver-dependent. (memtest_vulkan automatically leaves 0.5-1.0GB of memory untested, since allocating all video memory can freeze the system.)
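
The driver selection described above can be scripted. Below is a minimal sketch, assuming the loader version is already known to the caller; `icd_env` and `DRIVER_FILES_MIN_LOADER` are hypothetical names introduced here for illustration, not part of memtest_vulkan or the Vulkan loader.

```python
import os

# Loader version from which VK_DRIVER_FILES is used, per the comment above.
DRIVER_FILES_MIN_LOADER = (1, 3, 207)


def icd_env(icd_json, loader_version, base_env=None):
    """Return a copy of the environment that points the Vulkan loader at one ICD manifest.

    loader_version is a (major, minor, patch) tuple supplied by the caller.
    """
    env = dict(os.environ if base_env is None else base_env)
    if loader_version >= DRIVER_FILES_MIN_LOADER:
        var = "VK_DRIVER_FILES"
    else:
        var = "VK_ICD_FILENAMES"  # older loaders only know this name
    env[var] = icd_json
    return env


# The two manifests used in the runs above; base_env={} keeps the demo output short.
radv = icd_env("/usr/share/vulkan/icd.d/radeon_icd.x86_64.json", (1, 3, 239), base_env={})
legacy = icd_env("/usr/share/vulkan/icd.d/amd_icd64.json", (1, 3, 204), base_env={})
print(radv)
print(legacy)
```

The resulting dictionary could then be passed as `env=` to `subprocess.run(["./memtest_vulkan"], env=...)`; in that case omit `base_env` so the rest of the environment is inherited.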

BA8F0D39 commented 1 year ago

@galkinvv I have an Intel Arc A770 on Linux 6.3 with Mesa 22. It only allocates blocks of 0.9 GB. I want to allocate more than 4 GB to see whether they get corrupted. Is there any argument I can pass to the program to change the block size?

https://github.com/GpuZelenograd/memtest_vulkan v0.5.0 by GpuZelenograd
To finish testing use Ctrl+C

1: Bus=0x03:00 DevId=0x56A0   16GB Intel(R) Arc(tm) A770 Graphics (DG2)
2: Bus=0x00:00 DevId=0x0000   2GB llvmpipe (LLVM 13.0.1, 256 bits)
(first device will be autoselected in 0 seconds)   Override index to test:
    ...first device autoselected
Standard 5-minute test of 1: Bus=0x03:00 DevId=0x56A0   16GB Intel(R) Arc(tm) A770 Graphics (DG2)
      1 iteration. Passed  0.0082 seconds  written:    0.9GB 381.1GB/sec        checked:    1.8GB 295.5GB/sec
    130 iteration. Passed  1.0073 seconds  written:  112.9GB 402.5GB/sec        checked:  225.8GB 310.6GB/sec
    743 iteration. Passed  5.0001 seconds  written:  536.4GB 381.4GB/sec        checked: 1072.8GB 298.5GB/sec
   4467 iteration. Passed 30.0023 seconds  written: 3258.5GB 387.1GB/sec        checked: 6517.0GB 301.9GB/sec
   8250 iteration. Passed 30.0055 seconds  written: 3310.1GB 394.2GB/sec        checked: 6620.2GB 306.4GB/sec
  12004 iteration. Passed 30.0045 seconds  written: 3284.8GB 390.7GB/sec        checked: 6569.5GB 304.2GB/sec
  15792 iteration. Passed 30.0063 seconds  written: 3314.5GB 395.3GB/sec        checked: 6629.0GB 306.6GB/sec
  19567 iteration. Passed 30.0048 seconds  written: 3303.1GB 393.8GB/sec        checked: 6606.2GB 305.6GB/sec
  23310 iteration. Passed 30.0040 seconds  written: 3275.1GB 389.9GB/sec        checked: 6550.2GB 303.2GB/sec
  27100 iteration. Passed 30.0074 seconds  written: 3316.2GB 396.1GB/sec        checked: 6632.5GB 306.6GB/sec
  30895 iteration. Passed 30.0069 seconds  written: 3320.6GB 396.4GB/sec        checked: 6641.2GB 307.0GB/sec
  34673 iteration. Passed 30.0073 seconds  written: 3305.8GB 393.9GB/sec        checked: 6611.5GB 305.9GB/sec

galkinvv commented 1 year ago

EDITED: just found that Mesa 23.1.0~rc1 removes this limitation, see https://github.com/GpuZelenograd/memtest_vulkan/issues/7#issuecomment-1512901647

Older explanation below.

The memory size to test can be specified directly on the command line: the first argument is the GPU index to test, the next is the VRAM size to test in bytes.

However, as I mentioned, I suppose this is a limitation of the Intel Vulkan driver: it can't allocate a contiguous block greater than 2GB (example from an integrated GPU):

user@host ~/m % ./memtest_vulkan 1 2140000000 
Standard 5-minute test of 1: Bus=0x00:02 DevId=0x9A49   12GB Intel(R) Xe Graphics (TGL GT2)
      1 iteration. Passed  0.0943 seconds  written:    0.9GB  22.8GB/sec        checked:    1.8GB  31.3GB/sec
     12 iteration. Passed  1.0762 seconds  written:    9.6GB  21.4GB/sec        checked:   19.2GB  30.7GB/sec
^C%
user@host ~/m % ./memtest_vulkan 1 2150000000
VK_ERROR_OUT_OF_DEVICE_MEMORY
user@host ~/m % 

This seems to be a limit on a single contiguous memory region, since two instances of memtest_vulkan testing 1.8GB each can be started just fine. Unfortunately, the use of a single contiguous buffer is part of the current design of memtest_vulkan, so it can't be easily changed.
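
As a rough illustration of the multiple-instance workaround, the sketch below splits a total size into chunks that each stay under a per-allocation limit and builds one command line per chunk, using the `<GPU index> <size in bytes>` argument order described above. `instance_commands` is a hypothetical helper, and the limit of 2140000000 bytes is simply the largest value that worked in the transcript above, not a documented constant.

```python
def instance_commands(gpu_index, total_bytes, per_allocation_limit, binary="./memtest_vulkan"):
    """Build one command per chunk so that every chunk fits under the limit."""
    if total_bytes <= 0 or per_allocation_limit <= 0:
        raise ValueError("sizes must be positive")
    count = -(-total_bytes // per_allocation_limit)  # ceiling division
    base, extra = divmod(total_bytes, count)
    # Spread the remainder over the first `extra` chunks so the sizes sum to total_bytes.
    sizes = [base + (1 if i < extra else 0) for i in range(count)]
    return [[binary, str(gpu_index), str(size)] for size in sizes]


# Two instances of 1.8GB each, as in the observation above.
commands = instance_commands(1, 3_600_000_000, 2_140_000_000)
for cmd in commands:
    print(" ".join(cmd))
```

Each command would have to be started while the others are still running for the instances to hold separate allocations at the same time; how the driver places those allocations is outside the tool's control.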

More detailed out-of-memory error output can be obtained if the binary is named memtest_vulkan_verbose:
user@host ~/m % cp ./memtest_vulkan ./memtest_vulkan_verbose 
user@host ~/m % ./memtest_vulkan_verbose 1 2150000000       
Verbose feature enabled (or 'verbose' found in name). Vulkan instance 1.3.239
WARNING:          vkEnumerateInstanceLayerProperties: Unable to resolve symbol "" in implicit layer library "libVkLayer_MESA_device_select.so"
WARNING:          vkEnumerateInstanceLayerProperties: Unable to resolve symbol "" in implicit layer library "libVkLayer_MESA_device_select.so"
Available: 
VK_LAYER_MESA_device_select, VK_LAYER_KHRONOS_validation, VK_LAYER_MESA_overlay, VK_LAYER_INTEL_nullhw
Extensions: VK_KHR_device_group_creation, VK_KHR_display, VK_KHR_external_fence_capabilities, VK_KHR_external_memory_capabilities, VK_KHR_external_semaphore_capabilities, VK_KHR_get_display_properties2, VK_KHR_get_physical_device_properties2, VK_KHR_get_surface_capabilities2, VK_KHR_surface, VK_KHR_surface_protected_capabilities, VK_KHR_wayland_surface, VK_KHR_xcb_surface, VK_KHR_xlib_surface, VK_EXT_acquire_drm_display, VK_EXT_acquire_xlib_display, VK_EXT_debug_report, VK_EXT_debug_utils, VK_EXT_direct_mode_display, VK_EXT_display_surface_counter, VK_EXT_swapchain_colorspace, VK_KHR_portability_enumeration

linux_read_sorted_physical_devices:
     Original order:
           [0] llvmpipe (LLVM 15.0.7, 256 bits)
           [1] Intel(R) Xe Graphics (TGL GT2)
     Sorted order:
           [0] Intel(R) Xe Graphics (TGL GT2)  
           [1] llvmpipe (LLVM 15.0.7, 256 bits)  
linux_read_sorted_physical_devices:
     Original order:
           [0] llvmpipe (LLVM 15.0.7, 256 bits)
           [1] Intel(R) Xe Graphics (TGL GT2)
     Sorted order:
           [0] Intel(R) Xe Graphics (TGL GT2)  
           [1] llvmpipe (LLVM 15.0.7, 256 bits)  
linux_read_sorted_physical_devices:
     Original order:
           [0] llvmpipe (LLVM 15.0.7, 256 bits)
           [1] Intel(R) Xe Graphics (TGL GT2)
     Sorted order:
           [0] Intel(R) Xe Graphics (TGL GT2)  
           [1] llvmpipe (LLVM 15.0.7, 256 bits)  
linux_read_sorted_physical_devices:
     Original order:
           [0] llvmpipe (LLVM 15.0.7, 256 bits)
           [1] Intel(R) Xe Graphics (TGL GT2)
     Sorted order:
           [0] Intel(R) Xe Graphics (TGL GT2)
           [1] llvmpipe (LLVM 15.0.7, 256 bits)  
Loading memory info for selected device index 0...
Inserted device layer "VK_LAYER_KHRONOS_validation" (libVkLayer_khronos_validation.so)
Failed to find vkGetDeviceProcAddr in layer "libVkLayer_MESA_device_select.so"
       Using "Intel(R) Xe Graphics (TGL GT2)" with driver: "/usr/lib/x86_64-linux-gnu/libvulkan_intel.so"

Validation Information: [ UNASSIGNED-cache-file-error ] Object 0: handle = 0x563a42a8d090, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0xf0bb3995 | Cannot open shader validation cache at /home/user/.cache/shader_validation_cache-1000.bin for reading (it may not exist yet)
 0 MemoryType { property_flags: DEVICE_LOCAL | HOST_VISIBLE | HOST_COHERENT | HOST_CACHED, heap_index: 0 } 
CoherentIO memory          type 0 inside heap MemoryHeap { size: 12388952064, flags: DEVICE_LOCAL }
Trying   2.002GB buffer...
VK_ERROR_OUT_OF_DEVICE_MEMORY
Retrying with lower memory due to ERROR_OUT_OF_DEVICE_MEMORY while getting erupt::generated::vk1_0::DeviceMemory in context allocate_memory

I suppose that for a discrete Intel GPU the message is the same:

VK_ERROR_OUT_OF_DEVICE_MEMORY
Retrying with lower memory due to ERROR_OUT_OF_DEVICE_MEMORY while getting erupt::generated::vk1_0::DeviceMemory in context allocate_memory

However, if you get a different error, please post it here.

galkinvv commented 1 year ago

This limit is advertised as maxMemoryAllocationSize in the vulkaninfo output:

 % vulkaninfo | grep -C 4 VkPhysicalDeviceVulkan12Properties
    protectedNoFault                  = false
    maxPerSetDescriptors              = 1024
    maxMemoryAllocationSize           = 0x80000000

VkPhysicalDeviceVulkan12Properties:
-----------------------------------
    driverID                                             = DRIVER_ID_INTEL_OPEN_SOURCE_MESA
    driverName                                           = Intel open-source Mesa driver
    driverInfo                                           = Mesa 23.0.2
--
    protectedNoFault                  = false
    maxPerSetDescriptors              = 1024
    maxMemoryAllocationSize           = 0x80000000

VkPhysicalDeviceVulkan12Properties:
-----------------------------------
    driverID                                             = DRIVER_ID_MESA_LLVMPIPE
    driverName                                           = llvmpipe
    driverInfo                                           = Mesa 23.0.2 (LLVM 15.0.7)
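
The reported value lines up with the two command-line attempts earlier in this thread. A small arithmetic sketch, assuming the requested size maps more or less directly onto a single allocation (the tool's exact allocation size is not shown in the thread):

```python
MAX_MEMORY_ALLOCATION_SIZE = 0x80000000  # value from the vulkaninfo output above


def fits_single_allocation(requested_bytes, limit=MAX_MEMORY_ALLOCATION_SIZE):
    """True if a request of this size does not exceed the advertised per-allocation limit."""
    return requested_bytes <= limit


print(MAX_MEMORY_ALLOCATION_SIZE)  # the limit in decimal bytes
for requested in (2_140_000_000, 2_150_000_000):
    print(requested, fits_single_allocation(requested))
```

0x80000000 is 2147483648 bytes, which falls between the request that passed (2140000000) and the one that returned VK_ERROR_OUT_OF_DEVICE_MEMORY (2150000000).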

galkinvv commented 1 year ago

UPDATE

Mesa 23.1.0~rc1 removes this limitation! https://cgit.freedesktop.org/mesa/mesa/commit/?h=mesa-23.1.0-rc1&id=71fe9dfe07b53115a9e8f8b031a04f6f387937d6

I installed it from this PPA https://launchpad.net/~ernstp/+archive/ubuntu/mesarc and after updating, memtest_vulkan became able to allocate 10.5GB on the integrated GPU.

Example run (in verbose mode):
 % ./memtest_vulkan_verbose
https://github.com/GpuZelenograd/memtest_vulkan v0.5.0 by GpuZelenograd
To finish testing use Ctrl+C
Verbose feature enabled (or 'verbose' found in name). Vulkan instance 1.3.239
WARNING:          vkEnumerateInstanceLayerProperties: Unable to resolve symbol "" in implicit layer library "libVkLayer_MESA_device_select.so"
WARNING:          vkEnumerateInstanceLayerProperties: Unable to resolve symbol "" in implicit layer library "libVkLayer_MESA_device_select.so"
Available: 
VK_LAYER_MESA_device_select, VK_LAYER_KHRONOS_validation, VK_LAYER_MESA_overlay, VK_LAYER_INTEL_nullhw
Extensions: VK_KHR_device_group_creation, VK_KHR_display, VK_KHR_external_fence_capabilities, VK_KHR_external_memory_capabilities, VK_KHR_external_semaphore_capabilities, VK_KHR_get_display_properties2, VK_KHR_get_physical_device_properties2, VK_KHR_get_surface_capabilities2, VK_KHR_surface, VK_KHR_surface_protected_capabilities, VK_KHR_wayland_surface, VK_KHR_xcb_surface, VK_KHR_xlib_surface, VK_EXT_acquire_drm_display, VK_EXT_acquire_xlib_display, VK_EXT_debug_report, VK_EXT_debug_utils, VK_EXT_direct_mode_display, VK_EXT_display_surface_counter, VK_EXT_swapchain_colorspace, VK_EXT_surface_maintenance1, VK_KHR_portability_enumeration

linux_read_sorted_physical_devices:
     Original order:
           [0] llvmpipe (LLVM 15.0.7, 256 bits)
           [1] Intel(R) Xe Graphics (TGL GT2)
     Sorted order:
           [0] Intel(R) Xe Graphics (TGL GT2)  
           [1] llvmpipe (LLVM 15.0.7, 256 bits)  
linux_read_sorted_physical_devices:
     Original order:
           [0] llvmpipe (LLVM 15.0.7, 256 bits)
           [1] Intel(R) Xe Graphics (TGL GT2)
     Sorted order:
           [0] Intel(R) Xe Graphics (TGL GT2)  
           [1] llvmpipe (LLVM 15.0.7, 256 bits)  
linux_read_sorted_physical_devices:
     Original order:
           [0] llvmpipe (LLVM 15.0.7, 256 bits)
           [1] Intel(R) Xe Graphics (TGL GT2)
     Sorted order:
           [0] Intel(R) Xe Graphics (TGL GT2)  
           [1] llvmpipe (LLVM 15.0.7, 256 bits)  
linux_read_sorted_physical_devices:
     Original order:
           [0] llvmpipe (LLVM 15.0.7, 256 bits)
           [1] Intel(R) Xe Graphics (TGL GT2)
     Sorted order:
           [0] Intel(R) Xe Graphics (TGL GT2)  
           [1] llvmpipe (LLVM 15.0.7, 256 bits)  

1: Bus=0x00:02 DevId=0x9A49 API 1.3.246  v23(0x5C01000)  12GB Intel(R) Xe Graphics (TGL GT2)
2: Bus=0x00:00 DevId=0x0000 API 1.3.246  ver1  2GB llvmpipe (LLVM 15.0.7, 256 bits)
(first device will be autoselected in 8 seconds)   Override index to test:
    ...testing default device confirmed
Loading memory info for selected device index 0...
heap size 11.5GB budget 11.2GB usage  0.0GB flags=DEVICE_LOCAL
Spawned child Child { stdin: None, stdout: None, stderr: None, .. } with PID 76521
Verbose feature enabled (or 'verbose' found in name). Vulkan instance 1.3.239
WARNING:          vkEnumerateInstanceLayerProperties: Unable to resolve symbol "" in implicit layer library "libVkLayer_MESA_device_select.so"
WARNING:          vkEnumerateInstanceLayerProperties: Unable to resolve symbol "" in implicit layer library "libVkLayer_MESA_device_select.so"
Available: 
VK_LAYER_MESA_device_select, VK_LAYER_KHRONOS_validation, VK_LAYER_MESA_overlay, VK_LAYER_INTEL_nullhw
Extensions: VK_KHR_device_group_creation, VK_KHR_display, VK_KHR_external_fence_capabilities, VK_KHR_external_memory_capabilities, VK_KHR_external_semaphore_capabilities, VK_KHR_get_display_properties2, VK_KHR_get_physical_device_properties2, VK_KHR_get_surface_capabilities2, VK_KHR_surface, VK_KHR_surface_protected_capabilities, VK_KHR_wayland_surface, VK_KHR_xcb_surface, VK_KHR_xlib_surface, VK_EXT_acquire_drm_display, VK_EXT_acquire_xlib_display, VK_EXT_debug_report, VK_EXT_debug_utils, VK_EXT_direct_mode_display, VK_EXT_display_surface_counter, VK_EXT_swapchain_colorspace, VK_EXT_surface_maintenance1, VK_KHR_portability_enumeration

linux_read_sorted_physical_devices:
     Original order:
           [0] llvmpipe (LLVM 15.0.7, 256 bits)
           [1] Intel(R) Xe Graphics (TGL GT2)
     Sorted order:
           [0] Intel(R) Xe Graphics (TGL GT2)  
           [1] llvmpipe (LLVM 15.0.7, 256 bits)  
linux_read_sorted_physical_devices:
     Original order:
           [0] llvmpipe (LLVM 15.0.7, 256 bits)
           [1] Intel(R) Xe Graphics (TGL GT2)
     Sorted order:
           [0] Intel(R) Xe Graphics (TGL GT2)  
           [1] llvmpipe (LLVM 15.0.7, 256 bits)  
linux_read_sorted_physical_devices:
     Original order:
           [0] llvmpipe (LLVM 15.0.7, 256 bits)
           [1] Intel(R) Xe Graphics (TGL GT2)
     Sorted order:
           [0] Intel(R) Xe Graphics (TGL GT2)  
           [1] llvmpipe (LLVM 15.0.7, 256 bits)  
linux_read_sorted_physical_devices:
     Original order:
           [0] llvmpipe (LLVM 15.0.7, 256 bits)
           [1] Intel(R) Xe Graphics (TGL GT2)
     Sorted order:
           [0] Intel(R) Xe Graphics (TGL GT2)  
           [1] llvmpipe (LLVM 15.0.7, 256 bits)  
Loading memory info for selected device index 0...
Inserted device layer "VK_LAYER_KHRONOS_validation" (libVkLayer_khronos_validation.so)
Failed to find vkGetDeviceProcAddr in layer "libVkLayer_MESA_device_select.so"
       Using "Intel(R) Xe Graphics (TGL GT2)" with driver: "/usr/lib/x86_64-linux-gnu/libvulkan_intel.so"

Validation Information: [ UNASSIGNED-cache-file-error ] Object 0: handle = 0x564adee5b970, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0xf0bb3995 | Cannot open shader validation cache at /home/vasiliy.galkin/.cache/shader_validation_cache-1000.bin for reading (it may not exist yet)
 0 MemoryType { property_flags: DEVICE_LOCAL, heap_index: 0 } 
 1 MemoryType { property_flags: DEVICE_LOCAL | HOST_VISIBLE | HOST_COHERENT, heap_index: 0 } 
 2 MemoryType { property_flags: DEVICE_LOCAL | HOST_VISIBLE | HOST_COHERENT | HOST_CACHED, heap_index: 0 } 
CoherentIO memory          type 1 inside heap MemoryHeap { size: 12388952064, flags: DEVICE_LOCAL }
Trying  10.813GB buffer...
Test memory size  10.8GB   type  0: MemoryType { property_flags: DEVICE_LOCAL, heap_index: 0 } MemoryHeap { size: 12388952064, flags: DEVICE_LOCAL }
Standard 5-minute test of 1: Bus=0x00:02 DevId=0x9A49 API 1.3.246  v23(0x5C01000)  12GB Intel(R) Xe Graphics (TGL GT2)
      1 iteration. Passed  0.6808 seconds  written:    7.0GB  20.0GB/sec        checked:   10.5GB  31.7GB/sec
      3 iteration. Passed  1.3355 seconds  written:   14.0GB  21.0GB/sec        checked:   21.0GB  31.4GB/sec
     11 iteration. Passed  5.3045 seconds  written:   56.0GB  21.2GB/sec        checked:   84.0GB  31.5GB/sec
^CSubprocess status exit status: 65 parent_close_requested true

memtest_vulkan: no any errors, testing PASSed.
  press any key to continue...

BA8F0D39 commented 1 year ago

Upgrading from Mesa 23.0 to Mesa 23.1 using the PPA broke Vulkan on my system. I will wait for Mesa 23.1 in the official Ubuntu repos.

ERROR: loader_validate_layers: Layer 0 does not exist in the list of available layers
Not using validation layers due to ERROR_LAYER_NOT_PRESENT while getting erupt::generated::InstanceLoader in context instance with validation
Loading memory info for selected device index 0...
ERROR | DRIVER: terminator_CreateDevice: Failed in ICD /usr/lib/x86_64-linux-gnu/libvulkan_intel.so vkCreateDevice call
ERROR: vkCreateDevice:  Failed to create device chain.
Runtime error: a Vulkan function returned a negative `Result` value
Subprocess status exit status: 68 parent_close_requested false
retrying subprocess with smaller memory limit 665845760
Spawned child Child { stdin: None, stdout: None, stderr: None, .. } with PID 5461
Verbose feature enabled (or 'verbose' found in name). Vulkan instance 1.3.204
Available: 
VK_LAYER_MESA_device_select, VK_LAYER_MESA_overlay, VK_LAYER_INTEL_nullhw
Extensions: VK_KHR_device_group_creation, VK_KHR_external_fence_capabilities, VK_KHR_external_memory_capabilities, VK_KHR_external_semaphore_capabilities, VK_KHR_get_physical_device_properties2, VK_KHR_get_surface_capabilities2, VK_KHR_surface, VK_KHR_surface_protected_capabilities, VK_KHR_wayland_surface, VK_KHR_xcb_surface, VK_KHR_xlib_surface, VK_EXT_debug_report, VK_EXT_debug_utils, VK_KHR_display, VK_KHR_get_display_properties2, VK_EXT_acquire_drm_display, VK_EXT_acquire_xlib_display, VK_EXT_direct_mode_display, VK_EXT_display_surface_counter, VK_EXT_swapchain_colorspace

ERROR: loader_validate_layers: Layer 0 does not exist in the list of available layers
Not using validation layers due to ERROR_LAYER_NOT_PRESENT while getting erupt::generated::InstanceLoader in context instance with validation
Loading memory info for selected device index 0...
ERROR | DRIVER: terminator_CreateDevice: Failed in ICD /usr/lib/x86_64-linux-gnu/libvulkan_intel.so vkCreateDevice call
ERROR: vkCreateDevice:  Failed to create device chain.
Runtime error: a Vulkan function returned a negative `Result` value
Subprocess status exit status: 68 parent_close_requested false
Using in-process testing method with small memory limit 665845760
Using in-process testing method
ERROR | DRIVER: terminator_CreateDevice: Failed in ICD /usr/lib/x86_64-linux-gnu/libvulkan_intel.so vkCreateDevice call
ERROR: vkCreateDevice:  Failed to create device chain.
Runtime error: a Vulkan function returned a negative `Result` value

memtest_vulkan: INIT OR FIRST testing failed due to runtime error
  press any key to continue...

BA8F0D39 commented 1 year ago

@galkinvv Using Mesa 23.2.0, I can allocate more than 4GB, but the allocations are all corrupt.

./memtest_vulkan 1 9140000000 
Error found. Mode NEXT_RE_READ, total errors 0x20000000 out of 0x2C000000 (72.72727273%)
Errors address range: 0x30000000..=0xAFFFFFFF  iteration:1
values range: 0x00000000..=0x00000000   FFFFFFFF-like count:0    bit-level stats table:
         0x0 0x1  0x2 0x3| 0x4 0x5  0x6 0x7| 0x8 0x9  0xA 0xB| 0xC 0xD  0xE 0xF
SinglIdx                 |   1             |                 |       1         
TogglCnt       2   56 761|6673 42k 205k793k|  2m  6m  14m 27m| 45m 63m  76m 81m
   0x1?  74m 58m  40m 24m| 12m  5m   1m589k|145k 28k 4277 457|  31   1         
1sInValu536m             |                 |                 |                 

Error found. Mode INITIAL_READ, total errors 0x20000000 out of 0x2C000000 (72.72727273%)
Errors address range: 0xE0000000..=0x15FFFFFFF  iteration:1
values range: 0x00000000..=0x00000000   FFFFFFFF-like count:0    bit-level stats table:
         0x0 0x1  0x2 0x3| 0x4 0x5  0x6 0x7| 0x8 0x9  0xA 0xB| 0xC 0xD  0xE 0xF
SinglIdx                 |   1             |                 |       1         
TogglCnt       2   56 761|6673 42k 205k793k|  2m  6m  14m 27m| 45m 63m  76m 81m
   0x1?  74m 58m  40m 24m| 12m  5m   1m589k|145k 28k 4277 457|  31   1         
1sInValu536m             |                 |                 |                 

Error found. Mode INITIAL_READ, total errors 0x20000000 out of 0x2C000000 (72.72727273%)
Errors address range: 0x190000000..=0x20FFFFFFF  iteration:1
values range: 0x00000000..=0x00000000   FFFFFFFF-like count:0    bit-level stats table:
         0x0 0x1  0x2 0x3| 0x4 0x5  0x6 0x7| 0x8 0x9  0xA 0xB| 0xC 0xD  0xE 0xF
SinglIdx                 |   1             |                 |       1         
TogglCnt       2   56 761|6672 42k 205k793k|  2m  6m  14m 27m| 45m 63m  76m 81m
   0x1?  74m 58m  40m 24m| 12m  5m   1m589k|145k 28k 4277 457|  31   1         
1sInValu536m             |                 |                 |                 

Standard 5-minute test of 1: Bus=0x03:00 DevId=0x56A0   16GB Intel(R) Arc(tm) A770 Graphics (DG2)
      1 iteration. Passed  5.6310 seconds  written:    5.5GB 956.2GB/sec        checked:    8.2GB   1.5GB/sec
Error found. Mode NEXT_RE_READ, total errors 0x20000000 out of 0x2C000000 (72.72727273%)
Errors address range: 0x30000000..=0xAFFFFFFF  iteration:1
values range: 0x00000000..=0x00000000   FFFFFFFF-like count:0    bit-level stats table:
         0x0 0x1  0x2 0x3| 0x4 0x5  0x6 0x7| 0x8 0x9  0xA 0xB| 0xC 0xD  0xE 0xF
SinglIdx                 |   1             |                 |       1         
TogglCnt       2   56 761|6673 42k 205k793k|  2m  6m  14m 27m| 45m 63m  76m 81m
   0x1?  74m 58m  40m 24m| 12m  5m   1m589k|145k 28k 4277 457|  31   1         
1sInValu536m             |                 |                 |                 

Error found. Mode INITIAL_READ, total errors 0x20000000 out of 0x2C000000 (72.72727273%)
Errors address range: 0xE0000000..=0x15FFFFFFF  iteration:2
values range: 0x00000000..=0x00000000   FFFFFFFF-like count:0    bit-level stats table:
         0x0 0x1  0x2 0x3| 0x4 0x5  0x6 0x7| 0x8 0x9  0xA 0xB| 0xC 0xD  0xE 0xF
SinglIdx                 |   1             |                 |       1         
TogglCnt       2   56 760|6653 42k 204k789k|  2m  6m  14m 27m| 45m 63m  76m 81m
   0x1?  74m 58m  40m 24m| 12m  5m   1m589k|145k 28k 4277 457|  31   1         
1sInValu536m             |                 |                 |                 
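
The error reports above show a regular pattern that is easier to see with a little arithmetic. The sketch below parses only the "Errors address range" lines copied from the output; `range_sizes` is a hypothetical helper, and reading the error counts as 4-byte words is an assumption (it happens to make 0x20000000 errors match the size of each range).

```python
import re

RANGE_RE = re.compile(r"Errors address range: 0x([0-9A-Fa-f]+)\.\.=0x([0-9A-Fa-f]+)")


def range_sizes(log_text):
    """Return (start, size_in_bytes) for every inclusive error address range in the log."""
    result = []
    for match in RANGE_RE.finditer(log_text):
        start, end = int(match.group(1), 16), int(match.group(2), 16)
        result.append((start, end - start + 1))  # the ..= range is inclusive
    return result


log = """\
Errors address range: 0x30000000..=0xAFFFFFFF  iteration:1
Errors address range: 0xE0000000..=0x15FFFFFFF  iteration:1
Errors address range: 0x190000000..=0x20FFFFFFF  iteration:1
"""

ranges = range_sizes(log)
for start, size in ranges:
    print(hex(start), hex(size))

# Distance between consecutive range starts.
strides = [b[0] - a[0] for a, b in zip(ranges, ranges[1:])]
print([hex(s) for s in strides])

# Error ratio from the report header: 0x20000000 out of 0x2C000000.
print(100 * 0x20000000 / 0x2C000000)
```

Every range spans 0x80000000 bytes (2 GiB) and the ranges repeat every 0xB0000000 bytes, so in each repetition the first 0x30000000 bytes pass and the rest fail. The thread itself does not say what causes this particular split.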

galkinvv commented 1 year ago

In short: Thanks! Please install the vulkaninfo tool and post the output of vulkaninfo | grep -B 20 -i StorageBufferRange

Detailed: Thanks for the update. It seems that the problem is caused by memtest_vulkan ignoring the maxStorageBufferRange limit reported by the Vulkan driver (a hardcoded limit of TEST_WINDOW_MAX_SIZE=4GB is currently used instead). This limit is used internally by memtest_vulkan to select the size of the memory region tested by a single dispatched command.

A laptop with integrated Intel graphics has maxStorageBufferRange=4GB, which matches the internally hardcoded value, so it works fine. But if the discrete Intel GPU has a smaller limit, that may cause this behaviour. I'm working on a fix; however, to double-check the cause it would be useful to see the output of the command mentioned above.

Example output for integrated Intel GPU:

 % vulkaninfo | grep -B 20 -i StorageBufferRange
GPU0:
VkPhysicalDeviceProperties:
---------------------------
        apiVersion        = 1.3.251 (4206843)
        driverVersion     = 23.1.99 (96473187)
        vendorID          = 0x8086
        deviceID          = 0x9a49
        deviceType        = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
        deviceName        = Intel(R) Xe Graphics (TGL GT2)
        pipelineCacheUUID = 303aa8b0-afa9-b0a7-3336-7a249170c686

VkPhysicalDeviceLimits:
-----------------------
        maxImageDimension1D                             = 16384
        maxImageDimension2D                             = 16384
        maxImageDimension3D                             = 2048
        maxImageDimensionCube                           = 16384
        maxImageArrayLayers                             = 2048
        maxTexelBufferElements                          = 134217728
        maxUniformBufferRange                           = 1073741824
        maxStorageBufferRange                           = 4294967295
--
GPU1:
VkPhysicalDeviceProperties:
---------------------------
        apiVersion        = 1.3.251 (4206843)
        driverVersion     = 0.0.1 (1)
        vendorID          = 0x10005
        deviceID          = 0x0000
        deviceType        = PHYSICAL_DEVICE_TYPE_CPU
        deviceName        = llvmpipe (LLVM 15.0.7, 256 bits)
        pipelineCacheUUID = 76616c2d-742d-3038-3566-366563203200

VkPhysicalDeviceLimits:
-----------------------
        maxImageDimension1D                             = 16384
        maxImageDimension2D                             = 16384
        maxImageDimension3D                             = 4096
        maxImageDimensionCube                           = 32768
        maxImageArrayLayers                             = 2048
        maxTexelBufferElements                          = 134217728
        maxUniformBufferRange                           = 65536
        maxStorageBufferRange                           = 134217728
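
For comparing devices, the limit can be pulled out of this text output with a short script. This is a sketch that assumes the `name = value` layout shown above; `storage_buffer_ranges` is a hypothetical helper, and vulkaninfo's text format can differ between versions.

```python
import re

NAME_RE = re.compile(r"\s*deviceName\s*=\s*(.+?)\s*$")
LIMIT_RE = re.compile(r"\s*maxStorageBufferRange\s*=\s*(\d+)")


def storage_buffer_ranges(vulkaninfo_text):
    """Map each deviceName to the maxStorageBufferRange value that follows it."""
    limits = {}
    name = None
    for line in vulkaninfo_text.splitlines():
        match = NAME_RE.match(line)
        if match:
            name = match.group(1)
            continue
        match = LIMIT_RE.match(line)
        if match and name is not None:
            limits[name] = int(match.group(1))
    return limits


# A few lines copied from the output above.
sample = """\
        deviceName        = Intel(R) Xe Graphics (TGL GT2)
        maxUniformBufferRange                           = 1073741824
        maxStorageBufferRange                           = 4294967295
--
        deviceName        = llvmpipe (LLVM 15.0.7, 256 bits)
        maxUniformBufferRange                           = 65536
        maxStorageBufferRange                           = 134217728
"""

limits = storage_buffer_ranges(sample)
for device, limit in limits.items():
    print(device, limit)
```

On a live system the same function could be fed the captured stdout of `vulkaninfo` instead of the pasted sample.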

BA8F0D39 commented 12 months ago

@galkinvv vulkaninfo | grep -B 20 -i StorageBufferRange

GPU0:
VkPhysicalDeviceProperties:
---------------------------
    apiVersion        = 4206838 (1.3.246)
    driverVersion     = 96473187 (0x5c01063)
    vendorID          = 0x8086
    deviceID          = 0x56a0
    deviceType        = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
    deviceName        = Intel(R) Arc(tm) A770 Graphics (DG2)
    pipelineCacheUUID = fcc500d8-8a65-8853-deef-d91eec1217d5

VkPhysicalDeviceLimits:
-----------------------
    maxImageDimension1D                             = 16384
    maxImageDimension2D                             = 16384
    maxImageDimension3D                             = 2048
    maxImageDimensionCube                           = 16384
    maxImageArrayLayers                             = 2048
    maxTexelBufferElements                          = 134217728
    maxUniformBufferRange                           = 1073741824
    maxStorageBufferRange                           = 1073741824
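
The values reported in this thread can be compared against the hardcoded window size mentioned earlier. The sketch below shows one plausible way to respect the driver limit, by clamping the window to the smaller of the two values. It is not the actual fix in memtest_vulkan, `clamped_window` is a hypothetical name, and treating "4GB" as 4 * 1024**3 bytes is an assumption.

```python
TEST_WINDOW_MAX_SIZE = 4 * 1024**3  # assumed byte value of the hardcoded 4GB window


def clamped_window(max_storage_buffer_range, hardcoded=TEST_WINDOW_MAX_SIZE):
    """Window size that stays within both the hardcoded cap and the driver limit."""
    return min(hardcoded, max_storage_buffer_range)


# maxStorageBufferRange values posted in this thread.
reported = {
    "Intel(R) Xe Graphics (TGL GT2)": 4294967295,
    "Intel(R) Arc(tm) A770 Graphics (DG2)": 1073741824,
    "llvmpipe (LLVM 15.0.7, 256 bits)": 134217728,
}

for device, limit in reported.items():
    window = clamped_window(limit)
    print(device, window, "limit below hardcoded window:", limit < TEST_WINDOW_MAX_SIZE)
```

For the A770 the reported limit is 1073741824 bytes (1 GiB), a quarter of the hardcoded window, which is the kind of mismatch the maintainer suspected.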