Open oscarbg opened 5 years ago
+1 for VK_KHR_maintenance1
The viewport y-flipping in particular is useful for D3D-based games. For what it's worth, Dota 2 also relies on the vkAllocateDescriptorSets behavior in VK_KHR_maintenace1. Currently there are a lot of errors printed in dota on MoltenVK because we are relying on vkAllocateDescriptorSets to fail - it works over MoltenVK, but with maintenance1 enabled those should not be errors.
@danginsburg & @oscarbg
Understood about VK_KHR_maintenance1
. It was also requested in enhancement #12. Should we raise the priority of that enhancement?
MoltenVK already supports negative viewport height via VK_AMD_negative_viewport_height
.
I'm closing this as a duplicate of enhancement #12.
Hi, I just saw: https://twitter.com/Plagman2/status/1032532014442852353 "vkd3d can now run the World of Warcraft: Battle for Azeroth DX12 renderer on top of Vulkan! Here's a few screenshots of it running on Wine, thanks to the hard work of the graphics guys over at CodeWeavers, Józef and Henri." so VKD3D keeps improving fast.. posting here as would be nice once VKD3D+MoltenVK compatibility is being investigated/implemented to test that "World of Warcraft: Battle for Azeroth DX12 renderer" works! jost posted the same request on gfx-rs portability effort: https://github.com/gfx-rs/portability/issues/136
Hi @billhollings,
just tested a master VKD3D build with latest MoltenVK on a simple D3D12 sample..
fails with:
36:err:vkd3d_init_device_caps: Timeline semaphores are not supported by this implementation. This is required for correct operation.
searching seems to indicate that for a while VKD3D has been requiring VK1.1+VK_KHR_timeline_semaphore etc: https://github.com/HansKristian-Work/vkd3d-proton/issues/140#issuecomment-644146156
so would like to hear, from @cdavis5e also, if possible, how difficult is to support VK_KHR_timeline_semaphore under MoltenVK.. as it seems the next step towards supporting VKD3D.. maybe better open a separate issue asking for VK_KHR_timeline_semaphore? thanks..
Hi @billhollings @cdavis5e , vkd3d-proton 2.0 just got released, and with it, requeriments were clarified:
Vulkan 1.1
VK_EXT_descriptor_indexing with at least 1000000 UpdateAfterBind descriptors for all types except UniformBuffer.
Essentially all features in VkPhysicalDeviceDescriptorIndexingFeatures must be supported.
VK_KHR_timeline_semaphore
Some notable extensions that should be supported for optimal or correct behavior.
These extensions will likely become mandatory later.
VK_EXT_robustness2
VK_KHR_buffer_device_address
VK_EXT_extended_dynamic_state
from their notes, theoretically MoltenVK is OK with current requeriments (as long as VK_EXT_descriptor_indexing minor details can be sorted out).. but seems for "optimal or correct" behaviour (and pointed out could be requeriments later) also: VK_KHR_buffer_device_address VK_EXT_extended_dynamic_state are needed which are currently unsupported by MoltenVK.. curious if the two extensions can be implemented under Metal efficiently.. seems also VK_KHR_buffer_device_address is a requeriment for: https://github.com/KhronosGroup/MoltenVK/issues/427
AFAIK, MoltenVK doesn't have VK_EXT_robustness2
, or even the robustBufferAccess
feature supported.
AFAIK, MoltenVK doesn't have
VK_EXT_robustness2
We do, but we only support robustImageAccess2
. I've asked Apple for the other features (FB7792238, FB7792288).
What I don't get is why VK_EXT_descriptor_indexing
with 1M resources is required. It should be possible to support D3D12_RESOURCE_BINDING_TIER_1
without it. Also, we can't support 1,000,000 resources, even with argument buffers: Metal only guarantees 500k. I've asked Apple for that, too (FB8877050).
VK_KHR_buffer_device_address
- I've repeatedly begged Apple for this (FB6908280), but they've balked at it so far.
VK_EXT_extended_dynamic_state
- We can't implement this right now, because Metal does not yet support dynamic vertex buffer stride. All the other extended dynamic states are supported by Metal. I have asked Apple for this (FB7648364).
@cdavis5e thanks as always for your detailed information, much appreciated! altough sad to hear about current Metal limitations.. gone ahead and asked @gavkar on twitter to help prioritize this requests.. https://twitter.com/oscarbg81/status/1326490958691770369 let's see.. hope he can help in some way to resolve these issues faster..
Since this is an active discussion, as well as Apple wish-list, I'm reopening this issue for visibility.
What I don't get is why VK_EXT_descriptor_indexing with 1M resources is required. It should be possible to support
D3D12_RESOURCE_BINDING_TIER_1
without it.
TIER_1
support is not very useful, current games tend to ask for TIER_2
or higher. We removed the fallback path that would repack descriptors at draw time or queue submission time because it required us to reference-count Vulkan views and synchronize access to D3D12 descriptors, which made descriptor copies several orders of magnitude slower than on native drivers and the primary bottleneck in most games, to the point where performance would be unusably low.
We're aware that Metal cannot support the D3D12 binding model at the moment; it's hard enough to make it work properly on native Vulkan drivers especially if we want to get useful performance out of it. It is not a priority for vkd3d-proton, although the original vkd3d project should still work as it does not use descriptor indexing (but also doesn't support TIER_2
or higher properly).
just pointing VKD3D 2.1 was released few weeks ago adding support for VK_VALVE_mutable_descriptor_type extension. according to VKD3D devs that ext. "is also highly recommended, but not mandatory." RN mention:
Add support for VK_VALVE_mutable_descriptor_type which improves CPU overhead, memory bloat, and avoids potential memory management thrashing on RADV. Also avoids GPU hangs in certain situations where games misuse the D3D12 API.
hope MoltenVK support for this extension can be implemented efficiently via Metal eventually.. @cdavis5e any comments?
No it can't. Metal can't even support descriptor indexing properly because on top of having to specify the descriptor type as in sampled image, storage buffer,... you also have to specify the texture dimensions (2D, 3D) and whether or not it's multisampled.
And even if that wasn't an issue, the highest limit for argument buffers (500k) is not good enough for D3D12 which requires 1 million for tier 3 and many games rely on that.
@cdavis5e any comments?
We already have to build MTLArgumentEncoder
s on the fly, because, as @K0bin points out...
Metal can't even support descriptor indexing properly because on top of having to specify the descriptor type as in sampled image, storage buffer,... you also have to specify the texture dimensions (2D, 3D) and whether or not it's multisampled.
I've asked Apple to support texture arguments where the texture type is unknown (FB7069952), and also for an argument type that can be either a texture of any type or a buffer pointer (FB7648288). In addition, in light of @K0bin's other comment:
And even if that wasn't an issue, the highest limit for argument buffers (500k) is not good enough for D3D12 which requires 1 million for tier 3 and many games rely on that.
I've asked them to raise the limits (FB8877050). (Note that tier 3 requires 1 million of each kind of resource--CBV, SRV, UAV. That's 3 million in total. It's tier 2 that only requires a million SRVs. I have accounted for this in my request.)
I guess we'll just have to wait and see.
Should the VKD3D discussion continue here in the bug, or move to the shiny new discussion forum?
just doing a quick update as some new extensions are now required:
VK_KHR_separate_depth_stencil_layouts (part of VK1.2 so part of: https://github.com/KhronosGroup/MoltenVK/issues/1567) VK_KHR_copy_commands2 VK_KHR_dynamic_rendering (requested independently https://github.com/KhronosGroup/MoltenVK/issues/1536) VK_EXT_extended_dynamic_state2
EDIT: running on Wine 7.5 + VKD3D 2.6 and MoltenVK 1.1.9
@oscarbg upstream wine merged VKD3D-1.3 and builds as PE from wine-7.4 see https://github.com/wine-mirror/wine/commit/97db56ab6146733141b3afc90252c061b1ec37dd
The VKD3D 2.6 version makes me think your attempting to use VKD3D-proton that’s much less lightly to work.
@Gcenx yes I know thanks.. just want also MoltenVK to keep adding exts for eventual VKD3D-proton support.. as it's more advanced in D3D12 support..
@cdavis5e @billhollings just have a MoltenVK request/question related to VKD3D support.. support for D3D12 ExecuteIndirect on VKD3D-Proton has been difficult to achieve.. in fact isn't even merged on master.. see the merge request and branch: https://github.com/HansKristian-Work/vkd3d-proton/pull/996 https://github.com/HansKristian-Work/vkd3d-proton/commits/execute-indirect-advanced-breadcrumbs the current motivation for VKD3D, is allowing running Halo Infinite, which makes unavoidable support for ExecuteIndirect seems.. sadly, right now seems it needs an NV VK ext to support that: VK_NV_device_generated_commands as no EXT or KHR VK exts currently expose the needed features.. at least there has been an effort by RADV (AMD VK drv) dev to expose a subset of VK_NV_device_generated_commands for VKD3D ExecuteInidirect support it seems: https://www.basnieuwenhuizen.nl/a-driver-on-the-gpu/ this post shows the needed subset:
ExecuteIndirect can be seen as an extension of what we have in Vulkan with vkCmdDrawIndirectCount. It adds extra capabilities. To support that with vkd3d-proton we need the following indirect Vulkan capabilities:
Binding vertex buffers. Binding index buffers. Updating push constants.
just asking here if it's possible to support in MoltenVK this subset.. i.e. if Metal supports this three additional indirect capabilities.. if not it may be good to request to Apple.. maybe better to open a separate MoltenVK issue to track this support independently? thanks..
Metal does have similar functionality with indirect command buffers but D3D12 uses GPU VAs to set vertex/index buffers and Metal does not support directly using VAs.
I’d make VKD3D-Proton it’s own issue since they love to change requirements often and push for new extensions to meet there needs, that’s ignoring the common practice of dropping support for older GPUs.
thanks @K0bin for info.. with Crossover in 2023 looking for D3D12 support on Macos hope someone is feeding Apple with requests like these for Metal API to match D3D12 better.. @Gcenx yep there a two kinds of requirements (VK EXTs) on VKD3D-proton, ones ares needed for exposing D3D12 features like the last I asked for ExecuteIndirect.. these ones are going to be unavoidable for both VKD3D and VKD3D-Proton.. there other ones are an implementation choice (cleaner implementation, etc..).. these are the ones where VKD3D and VKD3D-proton differ which VKD3D primarily looking at needed less extensions probably due to not being supported on MoltenVK yet.. that's my view of the matter..
In case you haven't heard and are interested, at today's WWDC event apple announced Metal 3 which seems to support buffer device address, so it should be possible to implement VK_KHR_buffer_device_address
. The last function in the Buffer
class looks promising 👍.
class Buffer : public NS::Referencing<Buffer, Resource>
{
public:
NS::UInteger length() const;
void* contents();
void didModifyRange(NS::Range range);
class Texture* newTexture(const class TextureDescriptor* descriptor, NS::UInteger offset, NS::UInteger bytesPerRow);
void addDebugMarker(const NS::String* marker, NS::Range range);
void removeAllDebugMarkers();
class Buffer* remoteStorageBuffer() const;
class Buffer* newRemoteBufferViewForDevice(const class Device* device);
uint64_t gpuAddress() const;
}
Metal 3...it should be possible to implement
VK_KHR_buffer_device_address
Consolidating in discuss #1616.
I don't think it's feasible to implement D3D12 on Metal.
D3D12 is bindless, so MoltenVK would have to use Argument Buffers to implement the binding model.
The bigger problem is the way Metal handles barriers in that case. For all resources that can be written to, you have to specify whether those are read only or can be written with useResource
and that has to be done in every single render/compute pass. D3D12 descriptor heaps can have up to 1 million entries, so firstly, it would incur a huge CPU overhead cost. The bigger problem is that you simply do not know whether a resource will be written to or not. The current implementation of VK_EXT_descriptor_indexing
decides based on Vulkan descriptor but you'd very often have both a VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE
and a VK_DESCRIPTOR_TYPE_STORAGE_IMAGE
(for example) descriptor for the same resource. And declaring all those resources as read/write in every single pass would probably come with a big GPU performance cost because you'd essentially have to do a full barrier after every single render/compute pass. Equivalent Vulkan barriers might have happened in earlier command buffers, so it's not possible to reliably track the image layout either.
Is this still a problem with Metal 3?
During WWDC they covered how “to make your app's bindless journey a joy by simplifying argument buffers, allocating acceleration structures from heaps, and benefitting from the improvements to the Metal's validation layer and Debugger Tools.“ - https://developer.apple.com/videos/play/wwdc2022/10101/
Yes, all of this is still a problem with Metal 3.
The bigger problem is the way Metal handles barriers in that case. For all resources that can be written to, you have to specify whether those are read only or can be written with
useResource
and that has to be done in every single render/compute pass.
It isn't just for barriers. Those methods are a signal to Metal that it needs to keep those resources' memory paged in for the duration of the command encoder. In D3D12, pageability is controlled explicitly by the application (ID3D12Device::Evict()
and ID3D12Device::MakeResident()
). I think we need that level of control to support D3D12 properly. We need CW to push for this with Apple. (@js6i @stefand: Hint, hint.)
Also, you only really need to do this if you're using MTLResourceHazardTrackingModeUntracked
, which we aren't yet, but that's on my todo list.
D3D12 descriptor heaps can have up to 1 million entries, so firstly, it would incur a huge CPU overhead cost.
What they want you to do in that case is use MTLHeap
objects and tell it to page the heap as a whole in with -[MTL*CommandEncoder useHeap:usage:stages:]
. But then, you can't write to them. Or use the array method -[MTL*CommandEncoder useResource*s*:usage:stages:]
.
It isn't just for barriers. Those methods are a signal to Metal that it needs to keep those resources' memory paged in for the duration of the command encoder
Yeah, it conflates residency with cache flushing/image layouts. Given that Vulkan has no explicit concept of residency, you'd have to maintain a global list of all resources or (like MoltenVK does right now) use all resources in a descriptor set.
Also, you only really need to do this if you're using MTLResourceHazardTrackingModeUntracked, which we aren't yet, but that's on my todo list.
As far as I understand it, it's necessary for argument buffers too because the API has no way of knowing which resources you're gonna access.
Heaps do indeed cut down on the number of residency API calls but still leave the problem that all writable resources need useResource
. That's still the big problem IMO.
Ideally Metal would need a barrier function that transitions resources and keeps them in that state, so you wouldn't have to call useResource
at all and just make your heaps resident.
Wouldn't something like a generation gc help in that? A readonly and a rw generation.
Resource lifetimes and how they are used is entirely controlled by the application and the driver (MoltenVK in this case) doesn't have enough information or enough wiggle room to do anything clever here.
Apple's documentation on this is unfortunately very incomplete, but are we sure that useResource
does synchronisation? It is my understanding that if combined with untracked resources this is all about residency (the GPU seems to crash on page faults). And if I remember correctly, an Apple engineer might have mentioned during an WWDC video that useResource
on writeable textures is required to ensure that the texture is not compressed or otherwise layout-optimised when it's being written to. It is still the responsibility of the user to insert appropriate barriers or fences to ensure that the data is properly synchronised.
That said, I can imagine that there might be a feasible way of approaching this problem (with the caveat that I am not familiar with MoltenVK codebase):
MTLHeap
VK_DESCRIPTOR_TYPE_STORAGE_IMAGE
bit in a separate semi-linear data structure that allows fast updates (e.g. a sorted skip list)useResources
for all the currently existing textures with storage bit set if the pipeline writes to a texture Is there something obvious I am missing (aside the non-trivial performance overhead obviously?)
BTW, is it possible that setting allowGPUOptimizedContents
to false
(or some other options) might make useResource
on writeable textures unnecessary?
I don't know if it does synchronization but given that it seems to do cache flushing/invalidation and/or image layout transitions, it probably needs to be synchronized externally if it doesn't do it on its own.
From "Explore bindless rendering in Metal"
Additionally, any resources that may have been modified that we now intend to read from will still need their own useResource call.
This is so that the Metal framework can handle resource transitions for you, flushing GPU caches and adjusting the internal memory layout.
I don't think your proposed approach will work because:
Track whether a shader stage writes to a texture for each existing pipeline
This is basically impossible with full UPDATE_AFTER_BIND and descriptor indexing. The app could calculate the final index for the storage image descriptor in a million different ways.
BTW, is it possible that setting allowGPUOptimizedContents to false (or some other options) might make useResource on writeable textures unnecessary?
useResource still seems to handle cache flushing/invalidation, so this probably doesn't work.
Hi, there are patches for VKD3D, adding MacOS Support... https://source.winehq.org/patches/data/149222
I tested triangle and gears demos and they work although they print:
err:vkd3d_check_extensions: Required device extension "VK_KHR_maintenance1" is not supported. err:vkd3d_check_extensions: Required device extension "VK_KHR_shader_draw_parameters" is not supported. [MoltenVK ERROR] VK_ERROR_EXTENSION_NOT_PRESENT: Vulkan extension VK_KHR_maintenance1 is not supported. [MoltenVK ERROR] VK_ERROR_EXTENSION_NOT_PRESENT: Vulkan extension VK_KHR_shader_draw_parameters is not supported.
interestingly DXVK also needs VK_KHR_maintenance1 and VK_KHR_shader_draw_parameters but far less than DXVK requires.. regarding optional features must dig a little more.. will add later as I find it..