godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License

Constructing a varying vector exceeding a declaration limit within a shader freezes the editor #76667

Open ardlak opened 1 year ago

ardlak commented 1 year ago

Godot version

4.0.2.stable.official

System information

Windows 10, NVIDIA GeForce RTX 2080 SUPER, Driver 516.94, Vulkan backend

Issue description

If a ShaderMaterial is being rendered while its shader's vertex function assigns to a varying vector that exceeds a declaration limit, the editor freezes.

Example code:

shader_type spatial;

varying vec3 vA01;
varying vec3 vA02;

varying vec3 Va01;
varying vec3 Va02;
...
varying vec3 Va20;

void vertex() {
    vA01 = vec3(1); // safe
    vA02 = vec3(1); // will freeze
}

Steps to reproduce

  1. Set up a ShaderMaterial and give it a shader.
  2. In the shader's code, include only the "shader_type spatial;" declaration and an empty vertex function.
  3. Declare 22 varying vectors of any type, using any combination of letter cases at the beginning of the names.
  4. In the vertex function, assign to the last lowercase vector that was declared (or the last uppercase one if there are none).
  5. Render the ShaderMaterial in the inspector preview or in some other way.
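
The steps above can be sketched as a small script that writes out the shader text, which saves typing 22 declarations by hand. This is only an illustration: the function name `build_repro_shader` and the `v01`..`v22` naming scheme are hypothetical and not taken from the attached project.

```python
def build_repro_shader(count: int = 22, prefix: str = "v") -> str:
    """Return Godot shader source that declares `count` vec3 varyings
    and assigns only to the last one inside vertex()."""
    # Hypothetical naming scheme: v01, v02, ..., v22 (all lowercase).
    names = [f"{prefix}{i:02d}" for i in range(1, count + 1)]

    lines = ["shader_type spatial;", ""]
    lines += [f"varying vec3 {name};" for name in names]
    lines += [
        "",
        "void vertex() {",
        f"    {names[-1]} = vec3(1.0);",  # the assignment reported to freeze
        "}",
    ]
    return "\n".join(lines) + "\n"
```

Calling `build_repro_shader()` returns the text to paste into the shader editor. Per the report, doing so while the material preview is visible is expected to freeze the editor, so save other work first.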

Minimal reproduction project

issue_just_why.zip

Calinou commented 1 year ago

I can confirm this on 4.1.dev d6dde819b (Linux, GeForce RTX 4090 with NVIDIA 530.41.03):

ERROR: Vulkan: Cannot submit graphics queue. Error code: VK_ERROR_DEVICE_LOST
at: swap_buffers (drivers/vulkan/vulkan_context.cpp:2357)
ERROR: Vulkan: Cannot submit graphics queue. Error code: VK_ERROR_DEVICE_LOST
at: swap_buffers (drivers/vulkan/vulkan_context.cpp:2357)
ERROR: Vulkan: Did not create swapchain successfully. Error code: VK_NOT_READY
at: prepare_buffers (drivers/vulkan/vulkan_context.cpp:2280)
ERROR: Vulkan: Cannot submit graphics queue. Error code: VK_ERROR_DEVICE_LOST
at: swap_buffers (drivers/vulkan/vulkan_context.cpp:2357)
ERROR: Vulkan: Did not create swapchain successfully. Error code: VK_NOT_READY
at: prepare_buffers (drivers/vulkan/vulkan_context.cpp:2280)
ERROR: Vulkan: Cannot submit graphics queue. Error code: VK_ERROR_DEVICE_LOST
at: swap_buffers (drivers/vulkan/vulkan_context.cpp:2357)
ERROR: Vulkan: Did not create swapchain successfully. Error code: VK_NOT_READY
at: prepare_buffers (drivers/vulkan/vulkan_context.cpp:2280)
ERROR: Vulkan: Cannot submit graphics queue. Error code: VK_ERROR_DEVICE_LOST
at: swap_buffers (drivers/vulkan/vulkan_context.cpp:2357)

Modifying the shader without the ShaderMaterial preview being visible in the inspector does not result in a freeze.

This likely occurs because the uniform buffer size limit (or some other limit) is exceeded, but the editor or shader compiler doesn't check for limits.

The issue still occurs if replacing all vec3s in the shader with vec2s.

Running with Vulkan validation layers installed and --gpu-validation --gpu-abort returns the following:

ERROR: VALIDATION - Message Id Number: -1553903733 | Message Id Name: VUID-RuntimeSpirv-Location-06272
    Validation Error: [ VUID-RuntimeSpirv-Location-06272 ] Object 0: VK_NULL_HANDLE, type = VK_OBJECT_TYPE_PIPELINE; | MessageID = 0xa3614f8b | Invalid Pipeline CreateInfo State: Vertex shader output variable uses location that exceeds component limit VkPhysicalDeviceLimits::maxVertexOutputComponents (128) The Vulkan spec states: The sum of Location and the number of locations the variable it decorates consumes must be less than or equal to the value for the matching {ExecutionModel} defined in Shader Input and Output Locations (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-RuntimeSpirv-Location-06272)
    Objects - 1
        Object[0] - VK_OBJECT_TYPE_PIPELINE, Handle 0
   at: _debug_messenger_callback (drivers/vulkan/vulkan_context.cpp:267)
ERROR: Crashing, because abort on GPU errors is enabled.
   at: _debug_messenger_callback (drivers/vulkan/vulkan_context.cpp:268)

ardlak commented 1 year ago

I did a little more looking into it:

varying mat4 m; // lowercase

// lowercase
varying vec3 vA01;
varying vec3 vA02;
varying vec3 vA03;
...
varying vec3 vA17;
varying vec3 vA18;

void vertex() {
    vA17 = vec3(1.); // safe
    vA18 = vec3(1.); // will freeze
}

varying mat4 m; // lowercase

//uppercase
varying vec3 Va01;
varying vec3 Va02;
varying vec3 Va03;
...
varying vec3 Va21;
varying vec3 Va22;

void vertex() {
    Va21 = vec3(1.); // safe
    Va22 = vec3(1.); // will freeze
}

Switching the case of the matrix name in the second example:

varying mat4 M; // uppercase
...
void vertex() {
    Va17 = vec3(1.); // safe
    Va18 = vec3(1.); // will freeze
}

So I suppose there are separate considerations for uppercase and lowercase variables. The limit in both examples seems to be 21 vectors regardless of type. Adding a mat4 adds four vectors, lowering the limit of specific vector declarations by four. mat3 and mat2 types work similarly, lowering the limit by however many vectors make up the type.
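
The accounting described above (one slot per vector, one slot per matrix column) can be sketched as a rough counter over shader source text. The helper name `count_varying_locations` is hypothetical, and the regular expression is a deliberate simplification: it assumes one plain `varying <type> <name>;` declaration per line and does not handle arrays or qualifiers.

```python
import re

# Slots consumed per matrix type, following the observation that a matrix
# counts as one vector per column. Every other type is counted as one slot.
MATRIX_SLOTS = {"mat2": 2, "mat3": 3, "mat4": 4}

# Matches lines such as "varying vec3 vA01;" and captures the type name.
VARYING_RE = re.compile(r"^\s*varying\s+(\w+)\s+\w+", re.MULTILINE)


def count_varying_locations(source: str) -> int:
    """Return the number of varying slots declared in `source`."""
    return sum(MATRIX_SLOTS.get(type_name, 1)
               for type_name in VARYING_RE.findall(source))
```

For the first example, a `mat4` plus 18 `vec3` varyings counts as 22 slots, one more than the 21 that appeared to be safe.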

Additionally, if you combine the vector declarations from both examples:

varying vec3 vA01;
varying vec3 vA02;

...
varying vec3 Va20;

void vertex() {
    vA01 = vec3(1); // safe
    vA02 = vec3(1); // will freeze
}

No matter which order you declare the cases in, the uppercase ones seem to take priority and act as if they were declared first.

clayjohn commented 1 year ago

We need to add a clear user-facing error when users exceed the number of varyings supported by user hardware.

Godot uses up to 11 varyings and reserves slots for those 11. Vulkan devices are only guaranteed to support 16 varyings (64 components / 4), but most devices support 32, except Apple devices, which seem to have one less: https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxVertexOutputComponents&platform=all
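
As a rough check of this arithmetic, the sketch below derives how many slots remain for user varyings from a device's `maxVertexOutputComponents` value. The function name `user_varying_budget` is hypothetical, and the default of 11 reserved slots is the "up to 11" figure quoted above, so the real number may differ from shader to shader.

```python
def user_varying_budget(max_vertex_output_components: int,
                        reserved_slots: int = 11) -> int:
    """Return how many vec4-sized varying slots are left for user code.

    Each slot holds 4 components, so the device limit is divided by 4,
    then the slots assumed to be reserved by the engine are subtracted.
    """
    total_slots = max_vertex_output_components // 4
    return max(total_slots - reserved_slots, 0)
```

With the limit of 128 components reported in the validation output earlier in this thread, this gives 32 - 11 = 21, which lines up with the 21-vector limit observed above. On a device exposing only the guaranteed 64 components it would leave 5.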

Calinou commented 1 year ago

Interestingly, I don't get a crash on Windows 11 on the same GeForce RTX 4090 (NVIDIA 531.61), even if --gpu-validation --gpu-abort is used:

ERROR: Vulkan Debug Report: object - -2792181191434809889
Validation Error: [ VUID-RuntimeSpirv-Location-06272 ] Object 0: handle = 0xd9402e0000004ddf, type = VK_OBJECT_TYPE_SHADER_MODULE; | MessageID = 0xa3614f8b | vkCreateGraphicsPipelines(): pCreateInfos[0] Vertex shader output variable uses location that exceeds component limit VkPhysicalDeviceLimits::maxVertexOutputComponents (128) The Vulkan spec states: The sum of Location and the number of locations the variable it decorates consumes must be less than or equal to the value for the matching {ExecutionModel} defined in Shader Input and Output Locations (https://vulkan.lunarg.com/doc/view/1.3.243.0/windows/1.3-extensions/vkspec.html#VUID-RuntimeSpirv-Location-06272)
   at: _debug_report_callback (drivers/vulkan/vulkan_context.cpp:300)
ERROR: Vulkan Debug Report: object - -7333046566405517856
Validation Error: [ VUID-RuntimeSpirv-Location-06272 ] Object 0: handle = 0x9a3bc90000004de0, type = VK_OBJECT_TYPE_SHADER_MODULE; | MessageID = 0xa3614f8b | vkCreateGraphicsPipelines(): pCreateInfos[0] Fragment shader input variable uses location that exceeds component limit VkPhysicalDeviceLimits::maxFragmentInputComponents (128) The Vulkan spec states: The sum of Location and the number of locations the variable it decorates consumes must be less than or equal to the value for the matching {ExecutionModel} defined in Shader Input and Output Locations (https://vulkan.lunarg.com/doc/view/1.3.243.0/windows/1.3-extensions/vkspec.html#VUID-RuntimeSpirv-Location-06272)
   at: _debug_report_callback (drivers/vulkan/vulkan_context.cpp:300)

The editor doesn't freeze at all; it continues rendering after a small fraction of a second and can still be used. This indicates that --gpu-abort may not be working correctly on Windows. I'm using Vulkan SDK 1.3.243.0.

SamsPepper1 commented 6 months ago

Just flagging that this is still an issue in 4.3-dev 5 on an NVIDIA GeForce GTX 980 Ti.

To get back into the project I had to corrupt the scenes using that offending shader (by renaming it).

In my case, any more than 32 components (i.e. floats) in varyings caused the freeze.

Perhaps, if it is a complicated fix (accounting for hardware, etc.), a note in the shaders section of the documentation could help users identify the cause of the problem?