KhronosGroup / Vulkan-ValidationLayers

Vulkan Validation Layers (VVL)
https://vulkan.lunarg.com/doc/sdk/latest/linux/khronos_validation_layer.html
Other
751 stars 402 forks source link

Validation Error Message Format Improvements and Standards Tracking Issue [Now tracked in this repositories Projects page] #679

Closed JasperNV closed 3 years ago

JasperNV commented 5 years ago

Currently, the error messages that the validation layer spits out has two main components: a hand-written error message, and the spec VU violated. This tends to lead to a general restatement of the same thing twice. As an example, I got this error message while porting a codebase to Vulkan:

Object: VK_NULL_HANDLE (Type = 0) | vkCreateSampler(): when pCreateInfo->unnormalizedCoordinates is VK_TRUE, pCreateInfo->minLod (0.000000) and pCreateInfo->maxLod (16.000000) must both be zero. The Vulkan spec states: If unnormalizedCoordinates is VK_TRUE, minLod and maxLod must be zero (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-VkSamplerCreateInfo-unnormalizedCoordinates-01074)

The VU spec text itself is probably good enough for the developer to understand what went wrong; the hand-written error message is redundant, and the VK_NULL_HANDLE prelude is basically "we have no object to attach this error message to"; which is just a side effect of requiring an object in the internal API, so it should probably be dropped as well.

Additionally, the developers that I have talked to have remarked that validation error messages could use a cleanup as well. I would be happy to champion an effort to improve this.

mark-lunarg commented 5 years ago

Hey Jasper -- there have been issues raised about the duplication, but I'd personally prefer to lose the spec text instead of the hand-written message (at least, for the core validation checks) though this is not necessarily a popular view. Coincidentally, yesterday a group of us discussed ways to address issue #34, and have had significant feedback that adding the labels to objects in the existing messages is desirable. A lighter-weight approach seemed most popular (basically, hijacking some of the HandleToUint64() calls) but nothing is set in stone and we'd appreciate any feedback, ideas, or effort you'd be willing to lend. Can you share your ideas for what a cleanup effort might entail? It'd also be good to hear from @tobine as well...

JasperNV commented 5 years ago

Debug names would be a huge win, and was part of my plan.

My only reason to accept the spec text is that it's there, and it's written, and that in the cases where it's good enough, it doesn't require effort to use. My reword proposal for that error message would be:

If unnormalizedCoordinates is VK_TRUE, minLod and maxLod must be zero. Additional details: Caller provided minLod = -1234.0, maxLod = 1234.0. More details: https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-VkSamplerCreateInfo-unnormalizedCoordinates-01074

"Additional details" is hand-written and only provides additional details not seen in the error message.

A longer-term goal would be replacing the URL with something like https://vk.khronos.org/VUID-VkSamplerCreateInfo-unnormalizedCoordinates-01074, which at first might be a redirect to the spec, but could morph into a resource of its own that provides additional context or suggestions for fixes, similar to some tools seen in the web space, like React or Angular.

jzulauf-lunarg commented 5 years ago

@JasperNV one of the worst offenders in verbosity is the duplication of information in messages coming out of the descriptor set operation validation. Often the caller and each nested callee add the same information to the output string -- if you were looking for something to do :). (I just noticed while debugging.)

JasperNV commented 5 years ago

I'm happy to get started on this but as mentioned, it's a very long project and I'm basically going to need some alignment about what the error message should be before I dive in and make large changes.

jzulauf-lunarg commented 5 years ago

U'd be interest in feedback on this example (that I'm writing). In using that message I'd be loathe to leave out any of the detail (6 unique with one repeated)

"vkCmdClearAttachments() pAttachments[%u].colorAttachment=%u is not VK_ATTACHMENT_UNUSED and not a valid attachment for render pass 0x%" PRIx64 " attachmentCount=%u. Subpass %u pColorAttachment[%u]=%u.",

The fields are the attachment index, the attachment value found there. The renderpass handle, the number of attachments in the renderpass, the subpass index, and the context of the attachment reference at the colorAttachment'th location.

With VUID text

If the aspectMask member of any element of pAttachments contains VK_IMAGE_ASPECT_COLOR_BIT, then the colorAttachment member of that element must either refer to a color attachment which is VK_ATTACHMENT_UNUSED, or must be a valid color attachment. (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-vkCmdClearAttachments-aspectMask-02501)

I could see removing the "not VK_ATTACHMENT_UNUSED and " -- should the target style be tighter? How might we express this in some sort of styleguide?

JasperNV commented 5 years ago

So, what does "valid attachment" mean here? Out of bounds in the pColorAttachment array? In that case:

If the aspectMask member of any element of pAttachments contains VK_IMAGE_ASPECT_COLOR_BIT, then the colorAttachment member of that element must either refer to a color attachment which is VK_ATTACHMENT_UNUSED, or must be a valid color attachment. Additional details: colorAttachment = %d, out of range of pColorAttachments which has size %d

jzulauf-lunarg commented 5 years ago

It means (AFAICT) 1) pAttachments[%u].colorAttachment=%u is less that the number of pColorAttachments in the current subpass (for which I have a separate message with the same VUID)

"vkCmdClearAttachments() pAttachments[%u].colorAttachment=%u out of range for subpass %u. colorAttachmentCount=%u",

2) current subpass color attachment reference correlated to the value we just range checked pColorAttachments[pAttachment[index].colorAttachment].attachment is either UNUSED or a valid attachment reference to the current renderpass (i.e. that the value we just looked up is less than the number of attachments defined for the current renderpass)

This is actually a pretty good example of why clear, terse messages are hard.

jzulauf-lunarg commented 5 years ago

@JasperNV -- the message you propose doesn't inform the developer which pAttachment index the error occurs at, or the current subpass which has a limit we've exceeded, nor the renderpass active (needed to determine which subpass definition is being used) -- all of which were referenced for the validation check.

In developing our style recommendations the completeness of message context is probably a good rubric to establish.

JasperNV commented 5 years ago

My working model (and maybe this is wrong) is that developers that encounter validation failures will be working with a debugger and have a breakpoint on the error callback. Any context that is in their call stack can easily be found; what can't easily be found is stuff hidden away in the CreateInfo of other objects (it would be nice if validation layers could track this somehow, with natvis files provided so that you can get to the CreateInfo for a native vk object in your watch window, but that's a separate project).

The current subpass is something that's likely to be in the call stack, knowing that it's render pass 0x5183ef04 subpass 2 isn't really too helpful.

i.e. that the value we just looked up is less than the number of attachments defined for the current renderpass

This should already be checked at renderpass creation time, no?

jzulauf-lunarg commented 5 years ago

@JasperNV -- looks like a completely different workflow than mine. Typically (toy problems and ULT's aside) I'm looking at the output of validation messages without access to the application or driver source code. I can attach in the debugger and inspect the variable state in the layers themselves to get more information, but I can't imagine that most developers are willing to code crawl the VVL to get the information they need.

Neither the renderpass nor the current subpass are likely on the stack in any case as they are command buffer state.

i.e. that the value we just looked up is less than the number of attachments defined for the current renderpass

This should already be checked at renderpass creation time, no?

Hopefully, but as I'm about to look it up in the in the layers' copy of the renderpass create info state, I need to check it anyway.

JasperNV commented 5 years ago

Hopefully, but as I'm about to look it up in the in the layers' copy of the renderpass create info state, I need to check it anyway.

Right, but in theory this VU should already be covered by something that has already fired in the past with more context, no? Spending time wording a VU that users won't effectively see seems strange to me.

jzulauf-lunarg commented 5 years ago

@JasperNV FYI -- we're going to be implementing a small portion of the work discussed in #34 -- specifically ensuring that all handles output, which have been given names using the debug extensions will have those names provided in the debug messages output. This is both a long standing requested item and a good initial project for a new member of the VVL team. It will also touch and affect nearly every log_msg invocation.

As we don't have clear line-of-sight to an agreed style-guide, this project hopefully won't interfere with any rework that is done for this... but I thought you should be aware that a broad change that could cause significant merge conflicts is underway.

JasperNV commented 5 years ago

Thank you for the update. I should have some limited time to help with the migration of log_msg calls as well.

jzulauf-lunarg commented 5 years ago

After reaching out to a few people in the developer community we have a plan for how to improve error messages moving forward:

0) Provide Full information Based on feedback from developers, provide full information (all handles, indices, values, etc.) , even though most worked with a debugger attached. The information can and should be presented more effectively, with specific input below.

1) All handle names will be translate to specified obj names when present (fully implemented AFAIK) in a consistent format:

__handleTypeName__ 0x%l016llx [__definedName__]

Given the way the spec defines handles fo 32 bit compilations, __handleTypeName__ will be "NONDISPATCHABLE" (or similar) for all non-dispatchable handles. If there is no __definedName__ it will be replaced by the empty string Actions:

2) All Enums will be stringified. This is inconsistently Any place that isn't should be fixed/filed as a bug (good place for an easy PR contribution(s).)

3) All Bitfields/Bitmasks will be stringified in the format

(A_BIT | B_BIT | .. | (1<<N))

using the latter form only if a given bit has not been recognized.

4) Implement Marker Tagging of Deferred Validation (prototype present in image layout validation) Correlating deferred validation to recorded command a common concern. The feature can be scoped by restricting the captured “marker state” to markers set within the currently recorded command buffer. Even reporting only the last marker set prior to a command with deferred validation was seen to have value.

5) Unique message (and log_msg output) per condition The presence of the same message indicating one of several possible issues was highlighted in feedback. User workaround required using the debugger to inspect the validation code to assess the issue reported. Messages should unambiguously identify a single issue. Potentially the “UNASSIGNED-“ may need to become less general (and get some defined “full text” to back them up). Additionally UNASSIGNED need to move into the Vulkan spec

6) Message format changes: Remove full text (as redundant and verbose) and link (which is potentially broken) -- dependent upon... (as separate issues) -- @mikes-lunarg

mark-lunarg commented 5 years ago

We should absolutely remove the spec-text from the error messages.

mark-lunarg commented 5 years ago

Please do not miss the concerns in issue #80.

krOoze commented 5 years ago

@mark-lunarg I would disagree with removing spec quote from the messages. Maybe could just be made optional via layer settings, for people that think it is more convenient to externally open the spec.

Anyway, I think the "duplication" is not a problem of the spec text being there. It is that the hand-written part duplicates it for no reason. The hand-written part should instead say what is the wrong usage in the specific case (meanwhile the spec text says what is right usage (in general case)). The example from OP should instead say e.g.:

pCreateInfo->unnormalizedCoordinates is VK_TRUE, but pCreateInfo->maxLod (= 16.000000) is not zero. But Vulkan spec says: blah blah.

IMO that way the user immediatelly see that either his unnormalizedCoordinates or maxLod value is wrong. And why that is so.

mark-lunarg commented 4 years ago

[UPDATE] The spec text will not be removed from the messages. Plain-text VUIDs are now embedded in the spec links in the layers shipped in SDKs. If/when WG adds this feature, all links will be to specs with plain-text VUIDs.

Log messages have been scrubbed to ensure all objects are output using the FormatHandle() helper as well as being included in the DebugUtils pObjectslist in the callback. Name output is also handled consistently in these cases.

The log_msg() call has been replaced with some simpler interfaces (LogError(), LogWarning(), etc.), eliminating a good bit of noise and boilerplate.

mark-lunarg commented 4 years ago

Moving #88 to here! This is a possible future enhancement to add stack frames for deferred validation checks.

mark-lunarg commented 4 years ago

Moving Issue #466 here, which also included #1102. These are improvements to specific error messages.

mark-lunarg commented 4 years ago

Moving issue #1936 here, building on issues in #466 and #1102, requesting more state for last-setter of image layouts, for instance.

mark-lunarg commented 3 years ago

The items included in this tracking issue are now being tracked under the repo's Project page, here.