Validation layer: fall through/loss of error in two-call idiom

rpavlik commented 4 years ago

The generated code for two-call idiom calls is producing a static analysis warning "Value stored to 'xr_result' is never read". which is because it's just falling through after setting xr_result instead of returning it or otherwise accumulating the error.

The code in question looks like:

        // Optional array must be non-NULL when viewCapacityInput is non-zero
        if (0 != viewCapacityInput && nullptr == views) {
            CoreValidLogMessage(gen_instance_info, "VUID-xrEnumerateViewConfigurationViews-views-parameter",
                                VALID_USAGE_DEBUG_SEVERITY_ERROR, "xrEnumerateViewConfigurationViews",
                                objects_info,
                                "Command xrEnumerateViewConfigurationViews param views is NULL, but viewCapacityInput is greater than 0");
            xr_result = XR_ERROR_VALIDATION_FAILURE;
        }
        // Non-optional pointer/array variable that needs to not be NULL
        if (nullptr == viewCountOutput) {
            CoreValidLogMessage(gen_instance_info, "VUID-xrEnumerateViewConfigurationViews-viewCountOutput-parameter",
                                VALID_USAGE_DEBUG_SEVERITY_ERROR, "xrEnumerateViewConfigurationViews", objects_info,
                                "Invalid NULL for uint32_t \"viewCountOutput\" which is not "
                                "optional and must be non-NULL");
            return XR_ERROR_VALIDATION_FAILURE;
        }

This is also triggering a later null dereference, when e.g. in the GenValidUsageInputsXrEnumerateSwapchainImages function, that warning is printed, but then we go through to iterate the null container, etc.

bradgrantham-lunarg commented 4 years ago

Is this as simple as

diff --git a/src/scripts/validation_layer_generator.py b/src/scripts/validation_layer_generator.py
index 8ec8d7d8..b7a93fef 100644
--- a/src/scripts/validation_layer_generator.py
+++ b/src/scripts/validation_layer_generator.py
@@ -1069,7 +1069,7 @@ class ValidationSourceOutputGenerator(AutomaticSourceOutputGenerator):
         if not full_count_var:
             array_check += '// Non-optional pointer/array variable that needs to not be NULL\n'
             array_check += self.writeIndent(indent)
-            array_check += 'if (nullptr == %s) {\n' % pointer_to_check
+            array_check += 'if (XR_SUCCESS == xr_result && nullptr == %s) {\n' % pointer_to_check
             indent = indent + 1                                                                                                                                                         array_check += self.writeIndent(indent)                                                                                                                                     array_check += 'CoreValidLogMessage(%s, "VUID-%s-%s-parameter",\n' % (instance_info_string,

?

rpavlik commented 4 years ago

Not quite sure without the generated context (I don't think so... I only included that second if statement to include some context, it's actually fine on its own), but we're not actually returning xr_result at any point in most of those functions, so we're actually swallowing a validation failure. Far as I can tell, the "easy" solution is replacing xr_result = XR_ERROR_VALIDATION_FAILURE; with return XR_ERROR_VALIDATION_FAILURE; to prevent swallowing this.

GenValidUsageInputsXrEnumerateSwapchainImages is probably the best illustration of how this bug attacks in a big way when combined with other parts of the layer.

bradgrantham-lunarg commented 4 years ago

I'm not sure why it doesn't just return xr_result right away. Maybe I should just change any case that sets the result to just return instead.

But in a scan in xr_generated_core_validation.cpp, I don't see any functions that don't return xr_result at the end. Do you have an example that doesn't so I can look into that? And in GenValidUsageInputsXrEnumerateSwapchainImages xr_result is checked as far as I can tell. Do you have a place in that function that you see xr_result can be != XR_SUCCESS but later XR_SUCCESS is returned?

bradgrantham-lunarg commented 4 years ago

Or rather, in GenValidUsageInputsXrEnumerateSwapchainImages , where xr_result can be != XR_SUCCESS but subsequently output structures are parsed?

bradgrantham-lunarg commented 4 years ago

Hey, @rpavlik , do you have an example in xr_generated_core_validation.cpp that doesn't return xr_result at the end? Or do you have a test case in GenValidUsageInputsXrEnumerateSwapchainImages that keeps going even if `xr_result != XR_SUCCESS'?

If not, I'll add the check I mention above in the patch and close this.

rpavlik commented 4 years ago

so it looks like passing null for the array, and non-0 for the input size, and non-null for countOutput - should trigger that warning. Then, e.g. line 11273 will do a null deref. xr_generated_core_validation.txt

(We should never do those loops: either exit early, or skip all other things touching images)

XrResult GenValidUsageInputsXrEnumerateSwapchainImages(
XrSwapchain swapchain,
uint32_t imageCapacityInput,
uint32_t* imageCountOutput,
XrSwapchainImageBaseHeader* images) {
    try {
        XrResult xr_result = XR_SUCCESS;
        std::vector<GenValidUsageXrObjectInfo> objects_info;
        objects_info.emplace_back(swapchain, XR_OBJECT_TYPE_SWAPCHAIN);

        {
            // writeValidateInlineHandleValidation
            ValidateXrHandleResult handle_result = VerifyXrSwapchainHandle(&swapchain);
            if (handle_result != VALIDATE_XR_HANDLE_SUCCESS) {
                // Not a valid handle or NULL (which is not valid in this case)
                std::ostringstream oss;
                oss << "Invalid XrSwapchain handle \"swapchain\" ";
                oss << HandleToHexString(swapchain);
                CoreValidLogMessage(nullptr, "VUID-xrEnumerateSwapchainImages-swapchain-parameter",
                                    VALID_USAGE_DEBUG_SEVERITY_ERROR, "xrEnumerateSwapchainImages",
                                    objects_info, oss.str());
                return XR_ERROR_HANDLE_INVALID;
            }
        }
        auto info_with_instance = g_swapchain_info.getWithInstanceInfo(swapchain);
        GenValidUsageXrHandleInfo *gen_swapchain_info = info_with_instance.first;
        (void)gen_swapchain_info;  // quiet warnings
        GenValidUsageXrInstanceInfo *gen_instance_info = info_with_instance.second;
        (void)gen_instance_info;  // quiet warnings
        // Optional array must be non-NULL when imageCapacityInput is non-zero
        if (0 != imageCapacityInput && nullptr == images) {
            CoreValidLogMessage(gen_instance_info, "VUID-xrEnumerateSwapchainImages-images-parameter",
                                VALID_USAGE_DEBUG_SEVERITY_ERROR, "xrEnumerateSwapchainImages",
                                objects_info,
                                "Command xrEnumerateSwapchainImages param images is NULL, but imageCapacityInput is greater than 0");
            xr_result = XR_ERROR_VALIDATION_FAILURE;

so I think now we should return here instead of just setting xr_result.

        }
        // Non-optional pointer/array variable that needs to not be NULL
        if (nullptr == imageCountOutput) {
            CoreValidLogMessage(gen_instance_info, "VUID-xrEnumerateSwapchainImages-imageCountOutput-parameter",
                                VALID_USAGE_DEBUG_SEVERITY_ERROR, "xrEnumerateSwapchainImages", objects_info,
                                "Invalid NULL for uint32_t \"imageCountOutput\" which is not "
                                "optional and must be non-NULL");
            return XR_ERROR_VALIDATION_FAILURE;
        }
        // NOTE: Can't validate "VUID-xrEnumerateSwapchainImages-imageCountOutput-parameter" type
        for (uint32_t value_images_inc = 0; value_images_inc < imageCapacityInput; ++value_images_inc) {
#if defined(XR_USE_GRAPHICS_API_OPENGL)
            // Validate if XrSwapchainImageBaseHeader is a child structure of type XrSwapchainImageOpenGLKHR and it is valid
            XrSwapchainImageOpenGLKHR* new_swapchainimageopenglkhr_value = reinterpret_cast<XrSwapchainImageOpenGLKHR*>(images);
            if (new_swapchainimageopenglkhr_value[value_images_inc].type == XR_TYPE_SWAPCHAIN_IMAGE_OPENGL_KHR) {

and that's the segfault. The check below is too late, we already dereferenced thru that pointer.

                if (nullptr != new_swapchainimageopenglkhr_value) {
                    xr_result = ValidateXrStruct(gen_instance_info, "xrEnumerateSwapchainImages",
                                                                    objects_info, false, &new_swapchainimageopenglkhr_value[value_images_inc]);
                    if (XR_SUCCESS != xr_result) {
                        std::string error_message = "Command xrEnumerateSwapchainImages param images";
                        error_message += "[";
                        error_message += std::to_string(value_images_inc);
                        error_message += "]";
                        error_message += " is invalid";
                        CoreValidLogMessage(gen_instance_info, "VUID-xrEnumerateSwapchainImages-images-parameter",
                                            VALID_USAGE_DEBUG_SEVERITY_ERROR, "xrEnumerateSwapchainImages",
                                            objects_info,
                                            error_message);
                        return XR_ERROR_VALIDATION_FAILURE;
                        break;
                    } else {
                        continue;
                        }
                }
            }
#endif // defined(XR_USE_GRAPHICS_API_OPENGL)

Omitted the other apis. Once we get down here, we overwrite xr_result anyway.

            // Validate that the base-structure XrSwapchainImageBaseHeader is valid
            xr_result = ValidateXrStruct(gen_instance_info, "xrEnumerateSwapchainImages", objects_info,
                                                            true, &images[value_images_inc]);
            if (XR_SUCCESS != xr_result) {
                CoreValidLogMessage(gen_instance_info, "VUID-xrEnumerateSwapchainImages-images-parameter",
                                    VALID_USAGE_DEBUG_SEVERITY_ERROR, "xrEnumerateSwapchainImages",
                                    objects_info,
                                    "Command xrEnumerateSwapchainImages param images is invalid");
                return xr_result;
            }
        }
        return XR_SUCCESS;
    } catch (...) {
        return XR_ERROR_VALIDATION_FAILURE;
    }
}

rpavlik commented 4 years ago

Even easier to understand: here's a diff to apply to hello_xr to trigger this bug. Sorry I didn't think of this idea earlier :)

diff --git a/src/tests/hello_xr/openxr_program.cpp b/src/tests/hello_xr/openxr_program.cpp
index 4e8976d..04b0147 100644
--- a/src/tests/hello_xr/openxr_program.cpp
+++ b/src/tests/hello_xr/openxr_program.cpp
@@ -647,6 +647,11 @@ struct OpenXrProgram : IOpenXrProgram {
                 m_swapchains.push_back(swapchain);

                 uint32_t imageCount;
+
+                // This is not valid usage, but it shouldn't crash the validation layer: we're saying we have capacity 5 when we are
+                // passing a null pointer instead.
+                xrEnumerateSwapchainImages(swapchain.handle, 5, &imageCount, nullptr);
+
                 CHECK_XRCMD(xrEnumerateSwapchainImages(swapchain.handle, 0, &imageCount, nullptr));
                 // XXX This should really just return XrSwapchainImageBaseHeader*
                 std::vector<XrSwapchainImageBaseHeader*> swapchainImages =

This segfaults when run with the validation layer enabled.

bradgrantham-lunarg commented 4 years ago

I said:

I'm not sure why it doesn't just return xr_result right away. Maybe I should just change any case that sets the result to just return instead.

You said:

so I think now we should return here instead of just setting xr_result

I'm glad you agree with me. I'll look into the generator.

rpavlik commented 4 years ago

Glad you could see through my confused writing - this was all a little blurry to me until I made that sample code.

rpavlik-bot commented 4 years ago

An issue (number 1321) has been filed to correspond to this issue in the internal Khronos GitLab.

If you have a Khronos account, you can access that issue at KHR:openxr/openxr#1321.

bradgrantham-lunarg commented 4 years ago

Re: email

I think both you and I had the question, "why not just return XR_ERROR_VALIDATION_FAILURE if a failure was detected?"

After asking around LunarG, the philosophy behind that is that all validation failures that can be (reasonably) reported should be. So then the process for any given validation failure is: print out a message about the failure, and make sure an error is eventually returned for this call. I think that's why there's weird "xr_result = XR_ERROR_XYZ" followed by more code.

I think I was wrong earlier about suggesting we just return. I believe (like #161) the validation layer should continue after setting xr_result, so it should make sure to check pointers later. I'll look into this a la gitlab MR 1709.

bradgrantham-lunarg commented 4 years ago

@rpavlik - How can I reproduce the static analysis report? [edit: that is to say, what tool do I use with what configuration?]

rpavlik commented 4 years ago

Well, it was clang-tidy, though iirc I was using it within CodeChecker

Think you should be able to do something like this (assuming that you have your build dir in build/ and that you're using a single-config generator like ninja - might need to adjust paths)

clang-tidy -p build/ src/api_layers/*.cpp build/src/api_layers/*.cpp

(explained: bit by bit: Look for the project's compile_commands.json in build/ -- might need to turn on CMAKE_EXPORT_COMPILE_COMMANDS -- and check all c++ files for the layers in the source tree as well as in the generated sources)

Your clang-tidy binary might also be called something else (clang-tidy-9, etc) and/or not be on your path.

The most concise command to check this particular file would be:

clang-tidy -p build build/src/api_layers/xr_generated_core_validation.cpp  --checks=-misc-unused-parameters,-bugprone-branch-clone

the extra parameter turns off some noisy warnings that don't really matter for this generated code.

bradgrantham-lunarg commented 4 years ago

Thanks.

I've fought with this for a little while, continue to be unable to run clang-tidy. (I don't have a compile_commands.json after running cmake with CMAKE_EXPORT_COMPILE_COMMANDS=true. Do I have to also build with CMake? Because I'm building with Visual Studio...)

In any case, I'm now kicking it back to you as the the bug filer to verify KHR:openxr/openxr!1715 works.

I've verified in the KHR:openxr/openxr!1715 that the code path now returns in the case of that validation failure. Are you in a position where you could re-run clang-tidy on one of your trees against KHR:openxr/openxr!1715?

rpavlik commented 4 years ago

Yes, it looks like it passes. (A stricter clang-tidy now gripes about implicit conversion of a pointer to a bool, but I don't care about that :D ) Thanks!

bradgrantham-lunarg commented 4 years ago

Thanks!

bradgrantham-lunarg commented 4 years ago

Close this one also?

rpavlik commented 4 years ago

Yep, sorry I missed it.

KhronosGroup / OpenXR-SDK-Source

Validation layer: fall through/loss of error in two-call idiom #160