GPUOpen-Tools / radeon_gpu_analyzer

The Radeon GPU Analyzer (RGA) is an offline compiler and code analysis tool for Vulkan, DirectX, OpenGL, and OpenCL.
MIT License

failed to convert Vulkan driver statistics to RGA format on Linux #71

Open farnoy opened 3 years ago

farnoy commented 3 years ago

Hi,

I'm trying to get a very simple pipeline analyzed, but I can't get RGA to work in online mode. The output just says Error: failed to convert Vulkan driver statistics to RGA format.

I believe it's failing right here https://github.com/GPUOpen-Tools/radeon_gpu_analyzer/blob/f2cb7cf71ed620c427e2625aba3e85b70e9537bb/RadeonGPUAnalyzerCLI/Src/kcCLICommanderVulkan.cpp#L522

I'm on Linux x64 5.8.7, AMDVLK 2020.Q3.4 and:

$ vulkaninfo | rg PhysicalDeviceProp -A10
VkPhysicalDeviceProperties:
---------------------------
    apiVersion     = 4202646 (1.2.150)
    driverVersion  = 8388763 (0x80009b)
    vendorID       = 0x1002
    deviceID       = 0x66af
    deviceType     = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
    deviceName     = AMD Radeon VII
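
For reference, the raw apiVersion value above is Vulkan's packed version number: the major version sits in the top bits, followed by 10 bits of minor and 12 bits of patch. A minimal Python sketch of the decoding (the helper name `decode_vk_version` is illustrative, and the driverVersion encoding is vendor-defined, so only apiVersion is decoded here):

```python
def decode_vk_version(value):
    """Unpack a Vulkan packed version integer into (major, minor, patch)."""
    major = value >> 22            # bits 22 and up
    minor = (value >> 12) & 0x3FF  # 10 bits
    patch = value & 0xFFF          # 12 bits
    return major, minor, patch


# apiVersion reported by vulkaninfo in this report
print(decode_vk_version(4202646))  # (1, 2, 150)
```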

Is this an incompatibility with the latest AMDVLK release?

Also a few minor questions if I may:

AmitBM commented 3 years ago

Hi farnoy,

  1. What Linux variant are you using? Note that RGA officially supports only Ubuntu.
  2. Is there a Vulkan ICD manifest file present under /opt/amdgpu-pro/etc/vulkan/icd.d/amd_icd64.json on your system? If you set the VK_ICD_FILENAMES environment variable to /opt/amdgpu-pro/etc/vulkan/icd.d/amd_icd64.json - are you still seeing the same error? This should force the amdgpu-pro driver to be used (which is the driver RGA relies on).
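
The check in item 2 can be sketched in Python, assuming you want to confirm the manifest exists before forcing the loader to use it. The helper name `icd_environment` is hypothetical and not part of RGA; only the VK_ICD_FILENAMES variable and the manifest path come from the text above.

```python
import os

# Manifest path mentioned above; adjust it for your installation.
AMDGPU_PRO_ICD = "/opt/amdgpu-pro/etc/vulkan/icd.d/amd_icd64.json"


def icd_environment(manifest_path, base_env=None):
    """Return a copy of the environment that points the Vulkan loader at one ICD.

    Hypothetical helper for illustration. Raises FileNotFoundError when the
    manifest is missing, so a wrong path is reported before anything runs.
    """
    if not os.path.isfile(manifest_path):
        raise FileNotFoundError(f"Vulkan ICD manifest not found: {manifest_path}")
    env = dict(os.environ if base_env is None else base_env)
    env["VK_ICD_FILENAMES"] = manifest_path
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` when launching a Vulkan program, so that only that process sees the forced ICD.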

-=-=-=-

To your questions:

farnoy commented 3 years ago

I'm on Archlinux and my AMDVLK installation is being built from source with this script

I used VK_ICD_FILENAMES in the original report because I have a separate Mesa radv stack that I didn't want RGA to use.

Thanks for taking my questions. I was indeed referring to the OpLine instructions that both glslang and dxc can output. It should be a useful feature for livereg and/or assembly when it's ready.

I did a bit more digging and found something interesting. I've modified the rga bash script wrapper to execute rga-bin --verbose "$@". This showed me intermediate commands in the GUI window. The full output was:


Building Vulkan project "asd" for gfx906

./rga -s vulkan --isa "/home/kuba/RadeonGPUAnalyzer/projects/asd/Output/Clone0/disassem.txt" --parse-isa --line-numbers --analysis "/home/kuba/RadeonGPUAnalyzer/projects/asd/Output/Clone0/resourceUsage.csv" -b "/home/kuba/RadeonGPUAnalyzer/projects/asd/Output/Clone0/codeobj.bin" --log "/home/kuba/.local/share/RadeonGPUAnalyzer/rga-cli-20200909-214907.log" --icd "/usr/share/vulkan/icd.d/amd_icd64.json" --glslang-opt "@--target-env vulkan1.1@" --compiler-bin "/home/kuba/1.2.148.1/x86_64/bin" --session-metadata "/home/kuba/RadeonGPUAnalyzer/projects/asd/Output/Clone0/gfx906_cliInvocation.xml" --asic gfx906 --pso "/home/kuba/RadeonGPUAnalyzer/projects/asd/Clone0/Pipeline0.gpso" --vert "/data/renderer/src/shaders/gui.vert" --frag "/data/renderer/src/shaders/gui.frag"

Info: forcing the Vulkan runtime to load a custom ICD: /usr/share/vulkan/icd.d/amd_icd64.json

Launching external process: /home/kuba/rga/Vulkan//VulkanBackend --list-targets --icd /usr/share/vulkan/icd.d/amd_icd64.json

Target GPU detected:

gfx906 (Vega) AMD Radeon VII

Pre-compiling vertex shader file (/data/renderer/src/shaders/gui.vert) to SPIR-V binary (/home/kuba/.rga/GPUOpen/rga/all-devices_rga-temp-out2393283_vert.spv)...

Launching external process: /home/kuba/1.2.148.1/x86_64/bin/glslangValidator --target-env vulkan1.1 -V -o /home/kuba/.rga/GPUOpen/rga/all-devices_rga-temp-out2393283_vert.spv /data/renderer/src/shaders/gui.vert

succeeded.

Pre-compiling fragment shader file (/data/renderer/src/shaders/gui.frag) to SPIR-V binary (/home/kuba/.rga/GPUOpen/rga/all-devices_rga-temp-out2393283_frag.spv)...

Launching external process: /home/kuba/1.2.148.1/x86_64/bin/glslangValidator --target-env vulkan1.1 -V -o /home/kuba/.rga/GPUOpen/rga/all-devices_rga-temp-out2393283_frag.spv /data/renderer/src/shaders/gui.frag

succeeded.

Building for gfx906...

Launching external process: /home/kuba/rga/Vulkan//VulkanBackend --target gfx906 --vert /home/kuba/.rga/GPUOpen/rga/all-devices_rga-temp-out2393283_vert.spv --vert-isa /home/kuba/RadeonGPUAnalyzer/projects/asd/Output/Clone0/gfx906_disassem_vert.txt --vert-stats /home/kuba/RadeonGPUAnalyzer/projects/asd/Output/Clone0/gfx906_resourceUsage_vert.csv --frag /home/kuba/.rga/GPUOpen/rga/all-devices_rga-temp-out2393283_frag.spv --frag-isa /home/kuba/RadeonGPUAnalyzer/projects/asd/Output/Clone0/gfx906_disassem_frag.txt --frag-stats /home/kuba/RadeonGPUAnalyzer/projects/asd/Output/Clone0/gfx906_resourceUsage_frag.csv --bin /home/kuba/RadeonGPUAnalyzer/projects/asd/Output/Clone0/gfx906_codeobj.bin --pso /home/kuba/RadeonGPUAnalyzer/projects/asd/Clone0/Pipeline0.gpso --icd /usr/share/vulkan/icd.d/amd_icd64.json

Using Vulkan ICD from custom location: /usr/share/vulkan/icd.d/amd_icd64.json

failed.

Error: failed to convert Vulkan driver statistics to RGA format.

However, when I ran the three leaf commands manually in a shell (the two glslang invocations and VulkanBackend), they all worked fine. For example:

$ /home/kuba/rga/Vulkan//VulkanBackend --target gfx906 \
  --vert /home/kuba/.rga/GPUOpen/rga/all-devices_rga-temp-out2393283_vert.spv \
  --frag /home/kuba/.rga/GPUOpen/rga/all-devices_rga-temp-out2393283_frag.spv \
  --frag-stats /dev/stdout \
  --pso /home/kuba/RadeonGPUAnalyzer/projects/asd/Clone0/Pipeline0.gpso \
  --icd /usr/share/vulkan/icd.d/amd_icd64.json

Using Vulkan ICD from custom location: /usr/share/vulkan/icd.d/amd_icd64.json
Statistics:
    - shaderStageMask                           = 16
    - resourceUsage.numUsedVgprs                = 24
    - resourceUsage.numUsedSgprs                = 14
    - resourceUsage.ldsSizePerLocalWorkGroup    = 65536
    - resourceUsage.ldsUsageSizeInBytes         = 0
    - resourceUsage.scratchMemUsageInBytes      = 0
    - numPhysicalVgprs                          = 256
    - numPhysicalSgprs                          = 800
    - numAvailableVgprs                         = 256
    - numAvailableSgprs                         = 104
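
The statistics text above is a flat list of `- key = value` lines. A minimal Python sketch that reads such output into a dictionary is shown below; it is purely illustrative and is not how RGA performs its conversion (`parse_stats` is a hypothetical helper, and the sample is copied from the output above).

```python
import re

SAMPLE = """\
Statistics:
    - shaderStageMask                           = 16
    - resourceUsage.numUsedVgprs                = 24
    - resourceUsage.numUsedSgprs                = 14
    - resourceUsage.ldsSizePerLocalWorkGroup    = 65536
    - resourceUsage.ldsUsageSizeInBytes         = 0
    - resourceUsage.scratchMemUsageInBytes      = 0
    - numPhysicalVgprs                          = 256
    - numPhysicalSgprs                          = 800
    - numAvailableVgprs                         = 256
    - numAvailableSgprs                         = 104
"""

# One statistic per line: "- dotted.key = integer"
_STAT_LINE = re.compile(r"^\s*-\s*(?P<key>[\w.]+)\s*=\s*(?P<value>\d+)\s*$")


def parse_stats(text):
    """Collect every '- key = value' line into a dict of integers."""
    stats = {}
    for line in text.splitlines():
        match = _STAT_LINE.match(line)
        if match:  # header and unrelated lines are skipped
            stats[match.group("key")] = int(match.group("value"))
    return stats


stats = parse_stats(SAMPLE)
print(stats["resourceUsage.numUsedVgprs"], stats["numAvailableSgprs"])  # 24 104
```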

~So when the GUI invokes it, it fails, but when I do the same thing from the shell it works.~ EDIT: never mind, it's the GUI that throws an error

I tried redirecting the VulkanBackend binary with a script like this to enable api_dump:

#!/usr/bin/fish

ls ~/.rga/GPUOpen/rga/*.spv

set -x VK_INSTANCE_LAYERS VK_LAYER_LUNARG_api_dump
set -x VK_APIDUMP_LOG_FILENAME /tmp/vulkanbackend-api-dump

eval (dirname (status -f))/VulkanBackend-bin $argv

The script also lists the .spv files to verify that they exist (they do). Everything exits cleanly, with the last API calls being:

Thread 0, Frame 0:
vkGetShaderInfoAMD(device, pipeline, shaderStage, infoType, pInfoSize, pInfo) returns VkResult VK_SUCCESS (0):
    device:                         VkDevice = 0x37aa990
    pipeline:                       VkPipeline = 0x326f760
    shaderStage:                    VkShaderStageFlagBits = 16 (VK_SHADER_STAGE_FRAGMENT_BIT)
    infoType:                       VkShaderInfoTypeAMD = VK_SHADER_INFO_TYPE_DISASSEMBLY_AMD (2)
    pInfoSize:                      size_t* = 2156
    pInfo:                          void* = 0x345c860

Thread 0, Frame 0:
vkGetShaderInfoAMD(device, pipeline, shaderStage, infoType, pInfoSize, pInfo) returns VkResult VK_SUCCESS (0):
    device:                         VkDevice = 0x37aa990
    pipeline:                       VkPipeline = 0x326f760
    shaderStage:                    VkShaderStageFlagBits = 16 (VK_SHADER_STAGE_FRAGMENT_BIT)
    infoType:                       VkShaderInfoTypeAMD = VK_SHADER_INFO_TYPE_STATISTICS_AMD (0)
    pInfoSize:                      size_t* = 72
    pInfo:                          void* = 0x7ffcf9966140

Thread 0, Frame 0:
vkDestroyPipeline(device, pipeline, pAllocator) returns void:
    device:                         VkDevice = 0x37aa990
    pipeline:                       VkPipeline = 0x326f760
    pAllocator:                     const VkAllocationCallbacks* = NULL
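
As a sanity check on the statistics call above, the pInfoSize of 72 is consistent with the size of `VkShaderStatisticsInfoAMD` on a 64-bit platform. A minimal Python `ctypes` sketch that mirrors the struct layout from the VK_AMD_shader_info extension is shown below; the Python classes are illustrative stand-ins for the C structs, and the result assumes `size_t` is 8 bytes.

```python
import ctypes


class VkShaderResourceUsageAMD(ctypes.Structure):
    _fields_ = [
        ("numUsedVgprs", ctypes.c_uint32),
        ("numUsedSgprs", ctypes.c_uint32),
        ("ldsSizePerLocalWorkGroup", ctypes.c_uint32),
        ("ldsUsageSizeInBytes", ctypes.c_size_t),
        ("scratchMemUsageInBytes", ctypes.c_size_t),
    ]


class VkShaderStatisticsInfoAMD(ctypes.Structure):
    _fields_ = [
        ("shaderStageMask", ctypes.c_uint32),  # VkShaderStageFlags
        ("resourceUsage", VkShaderResourceUsageAMD),
        ("numPhysicalVgprs", ctypes.c_uint32),
        ("numPhysicalSgprs", ctypes.c_uint32),
        ("numAvailableVgprs", ctypes.c_uint32),
        ("numAvailableSgprs", ctypes.c_uint32),
        ("computeWorkGroupSize", ctypes.c_uint32 * 3),
    ]


# With an 8-byte size_t this matches the pInfoSize seen in the API dump.
print(ctypes.sizeof(VkShaderStatisticsInfoAMD))
```

So the driver appears to be filling in a complete statistics structure for the fragment stage, which matches the values printed by VulkanBackend earlier.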

So for each stage, it's collecting the binary, disassembly and statistics, all as expected. The only weird thing is that it returns fictional devices. I guess that's configured out of band because I only see the effects:

Thread 0, Frame 0:
vkGetPhysicalDeviceProperties(physicalDevice, pProperties) returns void:
    physicalDevice:                 VkPhysicalDevice = 0x384ab10
    pProperties:                    VkPhysicalDeviceProperties* = 0x7ffcf86bb060:
        apiVersion:                     uint32_t = 0
        driverVersion:                  uint32_t = 0
        vendorID:                       uint32_t = 0
        deviceID:                       uint32_t = 31
        deviceType:                     VkPhysicalDeviceType = VK_PHYSICAL_DEVICE_TYPE_OTHER (0)
        deviceName:                     char[VK_MAX_PHYSICAL_DEVICE_NAME_SIZE] = "NAVI14:gfx1012"

On the other hand, the GUI only has a problem with my vertex shader. If I remove it from the pipeline, offline mode is used, but I don't see the error about driver statistics. If I remove the fragment shader and leave only the vertex shader, it fails again. I'm not sure what makes it so special; it's a very simple shader:

#version 450

#extension GL_EXT_scalar_block_layout: require

layout(push_constant, scalar) uniform PushConstants {
    vec2 scale;
    vec2 translate;
} pushConstants;

layout (location = 0) in vec2 pos;
layout (location = 1) in vec2 uv;
layout (location = 2) in vec4 col;

layout (location = 0) out vec4 out_color;
layout (location = 1) out vec2 out_uv;

void main() {
  out_color = col;
  out_uv = uv;
  gl_Position = vec4(pos * pushConstants.scale + pushConstants.translate, 0, 1);
  gl_Position.y *= -1.0;
}

I hope this helps. I understand that only Ubuntu is officially supported, but seeing as multiple other components are working fine, this seems to be a legitimate issue.