Add a portability layer

gfxstrand commented 6 years ago

Given the amount of flexibility that the Vulkan API allows, there are cases where completely valid API usage works fine on one implementation yet leads to unexpected behavior on another. One example that has plagued the OpenGL ES world is mediump: a developer who uses mediump in their shader may actually get full 32-bit precision, depending on the implementation. If they're accidentally using mediump for high-precision calculations, their shaders may work fine and then break badly when they move to a different device. These are cases where you can't throw a validation error or even a warning, because you statically can't know whether that's what the application really intended. The thing to do in this case would be to have a layer which inserts extra commands to enforce a "worst case" behavior. Here are some examples:

Wrap all mediump expressions in OpQuantizeToF16 instructions
Intentionally trash image contents whenever they may be undefined:
- After a vkCmdPipelineBarrier which transitions the image out of VK_IMAGE_LAYOUT_UNDEFINED (or PREINITIALIZED if the image is tiled).
- At the top of the render pass for any attachments marked VK_ATTACHMENT_LOAD_OP_DONT_CARE.
- At the bottom of the render pass for any attachments marked VK_ATTACHMENT_STORE_OP_DONT_CARE.
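
To make the mediump case concrete, here is a minimal sketch of what forcing half precision does to a value. It is plain Python that models the rounding on the CPU, not shader or layer code; the helper name `quantize_to_f16` is made up for illustration, and keeping subnormal values (rather than flushing them to zero) is an assumption of the sketch.

```python
import math
import struct


def quantize_to_f16(value: float) -> float:
    """Round a float to the nearest half-precision value, then widen it back."""
    try:
        # The 'e' struct format is IEEE 754 binary16 (round-half-to-even).
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except OverflowError:
        # Finite values outside the binary16 range become a signed infinity.
        return math.copysign(math.inf, value)


# Exact in 32-bit float, but not representable in 16-bit float.
print(quantize_to_f16(2049.0))   # 2048.0
print(quantize_to_f16(70000.0))  # inf
```

A shader that happens to get 32-bit precision for mediump would see 2049.0 here, which is why the bug can stay hidden until the code runs on a device that really uses 16 bits.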

krOoze commented 6 years ago

> Wrap all mediump expressions in OpQuantizeToF16 instructions

Sounds like something that should be part of the VK_LAYER_LUNARG_device_simulation layer.

> Intentionally trash image contents whenever they may be undefined:

Sounds like a case for a WARNING (where possible). Otherwise yeah; fill with some error pattern.

mikew-lunarg commented 6 years ago

Hi @krOoze Nope, not DevSim, something else. The device simulation layer's mission is to modify the results of Vulkan queries based on a JSON configuration file; it does not attempt to emulate device execution behavior. See SIGGRAPH 2017 BOF presentation page 82: DevSim simulates, but doesn't enforce nor emulate.
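
As a rough illustration of "modify the results of queries based on a JSON configuration", the sketch below overlays values from a JSON document onto a dictionary standing in for real query results. The function `apply_overrides` and the key layout are hypothetical and are not DevSim's actual schema or code; note that nothing here touches how the device executes work.

```python
import json
from typing import Any, Dict


def apply_overrides(real: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `real` with values from `overrides` layered on top."""
    merged = dict(real)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested structures field by field.
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = value
    return merged


# Stand-in for what the real driver would report (illustrative values).
real_properties = {
    "deviceName": "Real GPU",
    "limits": {"maxImageDimension2D": 16384, "maxPushConstantsSize": 256},
}
config = json.loads('{"limits": {"maxImageDimension2D": 4096}}')

simulated = apply_overrides(real_properties, config)
print(simulated["limits"]["maxImageDimension2D"])  # 4096
```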

This could be raised with the Portability TSG, but I suspect this use case should be given a more specific name than "portability".

krOoze commented 6 years ago

@mikew-lunarg Hmm, so something like a "worst_case_precision" layer.

kvark commented 6 years ago

@jekstrand this would be great to have! I'd just want this to not collide with the Vulkan Portability initiative (which is also developing a layer). Perhaps "Vulkan compatibility layer" would do?

gfxstrand commented 6 years ago

I'm happy to leave the naming discussion to other more interested people. :-)

HansKristian-Work commented 6 years ago

I'll try to enumerate various instances of what I'd call "undefined results" which some implementations will implicitly define.

HansKristian-Work commented 6 years ago

So, I think the issues we want to concern ourselves with are issues where developers can rely on an implementation making unspecified results accidentally well-defined. Once you try on other platforms you didn't think of, the app breaks.

Essentially we want a layer kinda like VK_LAYER_hard_mode.

RelaxedPrecision / mediump. If you use mediump and only test on desktop, the result might be inaccurate when porting to mobile.
VK_ATTACHMENT_LOAD_OP_DONT_CARE, STORE_OP_DONT_CARE: The implementation is free to turn the image into garbage here, but some implementations may simply behave as-if LOAD_OP_LOAD was used. An application using DONT_CARE by mistake can easily run just fine.
Transitions from VK_IMAGE_LAYOUT_UNDEFINED to anything. The implementation is free to turn the image into garbage here, but implementations which do not concern themselves with image layouts might preserve the image. I have had bugs in the past where I transitioned from UNDEFINED but actually meant to preserve the image, and didn't see it until running on AMD.
pPreserveAttachments in multipass. If you don't use an attachment in a subpass, the implementation is free to trash the attachment unless pPreserveAttachments is used. An implementation may preserve it anyway, which could lead to some awkward debugging sessions on other hardware.
Binding subpass input attachments. Currently, a shader will declare input_attachment_index as well as set/binding. For implementations which do not need an actual texture bound to a descriptor set to sample from subpassInput, it's easy to forget, and code can happily run. The wrong image can also be bound, which could cause some weird mismatch. This is a case where code can run "just fine" on mobile, but not desktop. I'm not sure if this is a validation error or not.
Aliasing VkMemory and memory corruption. Currently if you have aliased optimal images or aliased buffers and optimal images, if you modify one alias, all other aliases are trashed unless they are all in host-visible LINEAR layouts. On some implementations, it might work "just fine" to alias images, but not others.
Initial values of VkMemory. Sometimes the memory can just be all zero when allocating it, and some applications may come to rely on behavior like this. The spec does mention there might be requirements here, so this could be moot if all relevant OSes supporting Vulkan require this.
```
For instance, if an operating system guarantees that data in all its memory
allocations are set to zero when newly allocated, the Vulkan implementation must
make the same guarantees for any allocations it controls (e.g. VkDeviceMemory).
```
VkDescriptorPool might not actually be a strict pool and you can keep allocating from it indefinitely.
Command buffer pool behavior is interesting. Some apps may rely on vkFreeCommandBuffers reclaiming memory even when not using VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT. This can happen to work on implementations which do not pool command buffer allocation.
More tricky ones like forgetting vkFlushMappedMemoryRanges and vkInvalidateMappedMemoryRanges on incoherent memory ranges can "happen" to work, although this is more of a CPU thing. GPU writes can also "happen" to become visible to the CPU without a pipeline barrier using HOST_READ_BIT.
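
The first few items in this list reduce to a small set of "where may the contents legally become garbage" rules. Below is a minimal sketch of how a hard-mode layer might encode them. It is a Python model only: the enums mirror the Vulkan names loosely, while `trash_on_transition`, `trash_points`, `poison_allocation` and the fill byte are all hypothetical choices made for illustration.

```python
from enum import Enum, auto
from typing import List

POISON_BYTE = 0xCD  # arbitrary non-zero fill pattern (an assumption)


class Layout(Enum):
    UNDEFINED = auto()
    GENERAL = auto()


class LoadOp(Enum):
    LOAD = auto()
    CLEAR = auto()
    DONT_CARE = auto()


class StoreOp(Enum):
    STORE = auto()
    DONT_CARE = auto()


def trash_on_transition(old_layout: Layout) -> bool:
    """Contents may be discarded when transitioning out of UNDEFINED."""
    return old_layout is Layout.UNDEFINED


def trash_points(load_op: LoadOp, store_op: StoreOp) -> List[str]:
    """List the render pass boundaries where an attachment should be trashed."""
    points = []
    if load_op is LoadOp.DONT_CARE:
        points.append("begin")
    if store_op is StoreOp.DONT_CARE:
        points.append("end")
    return points


def poison_allocation(size: int) -> bytearray:
    """Model a fresh allocation that is deliberately not zero-filled."""
    return bytearray([POISON_BYTE]) * size


print(trash_points(LoadOp.DONT_CARE, StoreOp.STORE))  # ['begin']
```

Filling with a recognizable non-zero pattern rather than zeros is a deliberate choice in this sketch: zeros are exactly the value an application is most likely to depend on by accident.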

nsubtil commented 6 years ago

We discussed this again and there is consensus that there is value in having this functionality accessible, but there are a few related problems as well (e.g., we should maybe have a "debug" and an "assistant" layer, instead of multiple layers that make up a debug environment, to make it easy for developers to know exactly what to enable). We're at the point where we're working through those and figuring out if we have a good place to house this functionality.

There is related work underway already, in the form of an "assistant" layer. @KarenGhavam-lunarG is going to discuss this (along with #7) internally to see if it fits in the proposed assistant layer, and will get back to us with updates.

KarenGhavam-lunarG commented 6 years ago

LunarG had some internal discussions about this issue as well as issue #7. Our recommendation is the following:

The Assistant layer is to do best practice checks that are generally applicable to all Vulkan applications.
- You could extend the Assistant Layer to do these portability checks (or create a portability layer to do these checks). I added the example checks that @HansKristian-ARM has provided in this issue to the Assistant Layer tracking issue #1612.
- LunarG has resources applied in this area and intends to continue to enhance the Assistant Layer.
- Tracking of progress will be kept up to date in the Assistant Layer tracking issue #1612
Best practice/performance checks that are device specific should be done by IHV specific layers.
- It would be great if the IHVs could use the VLF (Vulkan Layer Factory) to create such layers.
- ARM has a device specific layer. It may help motivate other IHVs to contribute layers if the current ARM layer is ported to use the VLF. The porting activity could be used to demonstrate the value of the VLF tool in making layer writing easy. LunarG may have upcoming cycles to port this layer to the VLF and may consider doing so.
- LunarG is certainly willing to provide consultation and assistance to an IHV who would like to use the VLF to create an IHV specific best practices layer.

So perhaps we should close out issue #7 and issue #11 since they are being tracked in the Assistant Layer tracking issue #1612? And then we create a new issue focused on IHV/vendor specific best practices/performance checks. By having this new focused issue, perhaps the "Ecosystem forum team" can reach out to the IHVs to help influence them to create such layers?

natduca commented 6 years ago

I like the idea of more things consolidating into an assistant framework.

However, I'm not sure there's a lot of merit to splitting assistant and IHV cases yet... in the long run, possibly. But are we splitting prematurely? If we do it now, then how do we make sure that each vendor's IHV-specific layers are cohesive? In this world, you end up with N layers that each have their own control scheme, output formats, configuration systems, internal architecture, etc. This makes the experience look quite fragmented to devs, won't it?

karl-lunarg commented 6 years ago

You can think of the concept of the current assistant layer as being the device-independent "good for all Vulkans" layer.

This assistant layer is built on the VLF and we're suggesting that any IHV-specific layers should also be built on the VLF and "model" themselves after the existing assistant layer. We can add additional conventions if needed, but this should help address the problem with each layer having "their own control scheme, output formats, configuration systems, internal architecture, etc."

From the user's perspective, the most fragmentation they would see is a list of layers they are offered to enable. We can go with some useful defaults such as whenever the common assistant layer is loaded, the IHV-specific layer appropriate for the current device could also be loaded. And of course, the user would be able to override this with the existing settings file capabilities and/or explicitly loading the desired layers.
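
The default described above could be sketched roughly as follows. This is not loader code: the layer names, the `resolve_layers` helper and the vendor-to-layer table are hypothetical, and only the numeric vendor IDs are real PCI vendor IDs.

```python
from typing import Dict, List, Optional

ASSISTANT_LAYER = "VK_LAYER_EXAMPLE_assistant"  # hypothetical layer name

# Hypothetical mapping from vendor ID to an IHV-specific layer.
IHV_LAYERS: Dict[int, str] = {
    0x13B5: "VK_LAYER_EXAMPLE_arm_best_practices",     # ARM
    0x1002: "VK_LAYER_EXAMPLE_amd_best_practices",     # AMD
    0x10DE: "VK_LAYER_EXAMPLE_nvidia_best_practices",  # NVIDIA
}


def resolve_layers(
    requested: List[str],
    vendor_id: int,
    user_override: Optional[List[str]] = None,
) -> List[str]:
    """Pick the layers to enable, letting explicit user settings win."""
    if user_override is not None:
        return list(user_override)
    layers = list(requested)
    if ASSISTANT_LAYER in layers:
        ihv_layer = IHV_LAYERS.get(vendor_id)
        if ihv_layer is not None and ihv_layer not in layers:
            # Default: pull in the layer matching the current device.
            layers.append(ihv_layer)
    return layers


print(resolve_layers([ASSISTANT_LAYER], 0x13B5))
```

With this shape, a developer only ever asks for the common layer, and the vendor-specific one follows the device unless the settings say otherwise.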

HansKristian-Work commented 6 years ago

I am planning to test run VLF fairly soon for a layer idea I have. Based on that, I might have a look at porting PerfDoc to it if it can help adoption. However, merging it into the LunarG repo (if that was the idea) seems premature.

If all IHVs can do something similar, a JSON meta-layer could also help ... But I don't think we're at that stage yet.

marty-johnson59 commented 3 years ago

This repository is being archived as it has been replaced with the vulkan.org website (https://www.vulkan.org) and is no longer being maintained (i.e., issues posted here are no longer being addressed). After reviewing issues posted here, most (if not all) have been resolved or have already been re-opened in Vulkan-Docs (https://github.com/KhronosGroup/Vulkan-Docs) or other repositories for further consideration. Therefore, all issues in this repository will be closed. If you believe your issue has not yet been resolved, please re-open in Vulkan-Docs. Thanks!
