KhronosGroup / MoltenVK

MoltenVK is a Vulkan Portability implementation. It layers a subset of the high-performance, industry-standard Vulkan graphics and compute API over Apple's Metal graphics framework, enabling Vulkan applications to run on macOS, iOS and tvOS.
Apache License 2.0
4.66k stars 405 forks source link

VKD3D Vulkan requirements.. #214

Open oscarbg opened 5 years ago

oscarbg commented 5 years ago

Hi, there are patches for VKD3D, adding MacOS Support... https://source.winehq.org/patches/data/149222

I tested triangle and gears demos and they work although they print:

err:vkd3d_check_extensions: Required device extension "VK_KHR_maintenance1" is not supported. err:vkd3d_check_extensions: Required device extension "VK_KHR_shader_draw_parameters" is not supported. [MoltenVK ERROR] VK_ERROR_EXTENSION_NOT_PRESENT: Vulkan extension VK_KHR_maintenance1 is not supported. [MoltenVK ERROR] VK_ERROR_EXTENSION_NOT_PRESENT: Vulkan extension VK_KHR_shader_draw_parameters is not supported.

interestingly DXVK also needs VK_KHR_maintenance1 and VK_KHR_shader_draw_parameters but far less than DXVK requires.. regarding optional features must dig a little more.. will add later as I find it..

danginsburg commented 5 years ago

+1 for VK_KHR_maintenance1

The viewport y-flipping in particular is useful for D3D-based games. For what it's worth, Dota 2 also relies on the vkAllocateDescriptorSets behavior in VK_KHR_maintenace1. Currently there are a lot of errors printed in dota on MoltenVK because we are relying on vkAllocateDescriptorSets to fail - it works over MoltenVK, but with maintenance1 enabled those should not be errors.

billhollings commented 5 years ago

@danginsburg & @oscarbg

Understood about VK_KHR_maintenance1. It was also requested in enhancement #12. Should we raise the priority of that enhancement?

MoltenVK already supports negative viewport height via VK_AMD_negative_viewport_height.

billhollings commented 5 years ago

I'm closing this as a duplicate of enhancement #12.

oscarbg commented 5 years ago

Hi, I just saw: https://twitter.com/Plagman2/status/1032532014442852353 "vkd3d can now run the World of Warcraft: Battle for Azeroth DX12 renderer on top of Vulkan! Here's a few screenshots of it running on Wine, thanks to the hard work of the graphics guys over at CodeWeavers, Józef and Henri." so VKD3D keeps improving fast.. posting here as would be nice once VKD3D+MoltenVK compatibility is being investigated/implemented to test that "World of Warcraft: Battle for Azeroth DX12 renderer" works! jost posted the same request on gfx-rs portability effort: https://github.com/gfx-rs/portability/issues/136

oscarbg commented 3 years ago

Hi @billhollings,

just tested a master VKD3D build with latest MoltenVK on a simple D3D12 sample.. fails with: 36:err:vkd3d_init_device_caps: Timeline semaphores are not supported by this implementation. This is required for correct operation. searching seems to indicate that for a while VKD3D has been requiring VK1.1+VK_KHR_timeline_semaphore etc: https://github.com/HansKristian-Work/vkd3d-proton/issues/140#issuecomment-644146156

so would like to hear, from @cdavis5e also, if possible, how difficult is to support VK_KHR_timeline_semaphore under MoltenVK.. as it seems the next step towards supporting VKD3D.. maybe better open a separate issue asking for VK_KHR_timeline_semaphore? thanks..

oscarbg commented 3 years ago

Hi @billhollings @cdavis5e , vkd3d-proton 2.0 just got released, and with it, requeriments were clarified:

Vulkan 1.1
VK_EXT_descriptor_indexing with at least 1000000 UpdateAfterBind descriptors for all types except UniformBuffer. 
Essentially all features in VkPhysicalDeviceDescriptorIndexingFeatures must be supported.
VK_KHR_timeline_semaphore

Some notable extensions that should be supported for optimal or correct behavior.
These extensions will likely become mandatory later.

VK_EXT_robustness2
VK_KHR_buffer_device_address
VK_EXT_extended_dynamic_state

from their notes, theoretically MoltenVK is OK with current requeriments (as long as VK_EXT_descriptor_indexing minor details can be sorted out).. but seems for "optimal or correct" behaviour (and pointed out could be requeriments later) also: VK_KHR_buffer_device_address VK_EXT_extended_dynamic_state are needed which are currently unsupported by MoltenVK.. curious if the two extensions can be implemented under Metal efficiently.. seems also VK_KHR_buffer_device_address is a requeriment for: https://github.com/KhronosGroup/MoltenVK/issues/427

kvark commented 3 years ago

AFAIK, MoltenVK doesn't have VK_EXT_robustness2, or even the robustBufferAccess feature supported.

cdavis5e commented 3 years ago

AFAIK, MoltenVK doesn't have VK_EXT_robustness2

We do, but we only support robustImageAccess2. I've asked Apple for the other features (FB7792238, FB7792288).

cdavis5e commented 3 years ago

What I don't get is why VK_EXT_descriptor_indexing with 1M resources is required. It should be possible to support D3D12_RESOURCE_BINDING_TIER_1 without it. Also, we can't support 1,000,000 resources, even with argument buffers: Metal only guarantees 500k. I've asked Apple for that, too (FB8877050).

VK_KHR_buffer_device_address - I've repeatedly begged Apple for this (FB6908280), but they've balked at it so far.

VK_EXT_extended_dynamic_state - We can't implement this right now, because Metal does not yet support dynamic vertex buffer stride. All the other extended dynamic states are supported by Metal. I have asked Apple for this (FB7648364).

oscarbg commented 3 years ago

@cdavis5e thanks as always for your detailed information, much appreciated! altough sad to hear about current Metal limitations.. gone ahead and asked @gavkar on twitter to help prioritize this requests.. https://twitter.com/oscarbg81/status/1326490958691770369 let's see.. hope he can help in some way to resolve these issues faster..

billhollings commented 3 years ago

Since this is an active discussion, as well as Apple wish-list, I'm reopening this issue for visibility.

doitsujin commented 3 years ago

What I don't get is why VK_EXT_descriptor_indexing with 1M resources is required. It should be possible to support D3D12_RESOURCE_BINDING_TIER_1 without it.

TIER_1 support is not very useful, current games tend to ask for TIER_2 or higher. We removed the fallback path that would repack descriptors at draw time or queue submission time because it required us to reference-count Vulkan views and synchronize access to D3D12 descriptors, which made descriptor copies several orders of magnitude slower than on native drivers and the primary bottleneck in most games, to the point where performance would be unusably low.

We're aware that Metal cannot support the D3D12 binding model at the moment; it's hard enough to make it work properly on native Vulkan drivers especially if we want to get useful performance out of it. It is not a priority for vkd3d-proton, although the original vkd3d project should still work as it does not use descriptor indexing (but also doesn't support TIER_2 or higher properly).

oscarbg commented 3 years ago

just pointing VKD3D 2.1 was released few weeks ago adding support for VK_VALVE_mutable_descriptor_type extension. according to VKD3D devs that ext. "is also highly recommended, but not mandatory." RN mention:

Add support for VK_VALVE_mutable_descriptor_type which improves CPU overhead, memory bloat, and avoids potential memory management thrashing on RADV. Also avoids GPU hangs in certain situations where games misuse the D3D12 API.

hope MoltenVK support for this extension can be implemented efficiently via Metal eventually.. @cdavis5e any comments?

K0bin commented 3 years ago

No it can't. Metal can't even support descriptor indexing properly because on top of having to specify the descriptor type as in sampled image, storage buffer,... you also have to specify the texture dimensions (2D, 3D) and whether or not it's multisampled.

And even if that wasn't an issue, the highest limit for argument buffers (500k) is not good enough for D3D12 which requires 1 million for tier 3 and many games rely on that.

cdavis5e commented 3 years ago

@cdavis5e any comments?

We already have to build MTLArgumentEncoders on the fly, because, as @K0bin points out...

Metal can't even support descriptor indexing properly because on top of having to specify the descriptor type as in sampled image, storage buffer,... you also have to specify the texture dimensions (2D, 3D) and whether or not it's multisampled.

I've asked Apple to support texture arguments where the texture type is unknown (FB7069952), and also for an argument type that can be either a texture of any type or a buffer pointer (FB7648288). In addition, in light of @K0bin's other comment:

And even if that wasn't an issue, the highest limit for argument buffers (500k) is not good enough for D3D12 which requires 1 million for tier 3 and many games rely on that.

I've asked them to raise the limits (FB8877050). (Note that tier 3 requires 1 million of each kind of resource--CBV, SRV, UAV. That's 3 million in total. It's tier 2 that only requires a million SRVs. I have accounted for this in my request.)

I guess we'll just have to wait and see.

ovvldc commented 3 years ago

Should the VKD3D discussion continue here in the bug, or move to the shiny new discussion forum?

oscarbg commented 2 years ago

just doing a quick update as some new extensions are now required:

VK_KHR_separate_depth_stencil_layouts (part of VK1.2 so part of: https://github.com/KhronosGroup/MoltenVK/issues/1567) VK_KHR_copy_commands2 VK_KHR_dynamic_rendering (requested independently https://github.com/KhronosGroup/MoltenVK/issues/1536) VK_EXT_extended_dynamic_state2

EDIT: running on Wine 7.5 + VKD3D 2.6 and MoltenVK 1.1.9

Gcenx commented 2 years ago

@oscarbg upstream wine merged VKD3D-1.3 and builds as PE from wine-7.4 see https://github.com/wine-mirror/wine/commit/97db56ab6146733141b3afc90252c061b1ec37dd

The VKD3D 2.6 version makes me think your attempting to use VKD3D-proton that’s much less lightly to work.

oscarbg commented 2 years ago

@Gcenx yes I know thanks.. just want also MoltenVK to keep adding exts for eventual VKD3D-proton support.. as it's more advanced in D3D12 support..

oscarbg commented 2 years ago

@cdavis5e @billhollings just have a MoltenVK request/question related to VKD3D support.. support for D3D12 ExecuteIndirect on VKD3D-Proton has been difficult to achieve.. in fact isn't even merged on master.. see the merge request and branch: https://github.com/HansKristian-Work/vkd3d-proton/pull/996 https://github.com/HansKristian-Work/vkd3d-proton/commits/execute-indirect-advanced-breadcrumbs the current motivation for VKD3D, is allowing running Halo Infinite, which makes unavoidable support for ExecuteIndirect seems.. sadly, right now seems it needs an NV VK ext to support that: VK_NV_device_generated_commands as no EXT or KHR VK exts currently expose the needed features.. at least there has been an effort by RADV (AMD VK drv) dev to expose a subset of VK_NV_device_generated_commands for VKD3D ExecuteInidirect support it seems: https://www.basnieuwenhuizen.nl/a-driver-on-the-gpu/ this post shows the needed subset:

ExecuteIndirect can be seen as an extension of what we have in Vulkan with vkCmdDrawIndirectCount. It adds extra capabilities. To support that with vkd3d-proton we need the following indirect Vulkan capabilities:

Binding vertex buffers.
Binding index buffers.
Updating push constants.

just asking here if it's possible to support in MoltenVK this subset.. i.e. if Metal supports this three additional indirect capabilities.. if not it may be good to request to Apple.. maybe better to open a separate MoltenVK issue to track this support independently? thanks..

K0bin commented 2 years ago

Metal does have similar functionality with indirect command buffers but D3D12 uses GPU VAs to set vertex/index buffers and Metal does not support directly using VAs.

Gcenx commented 2 years ago

I’d make VKD3D-Proton it’s own issue since they love to change requirements often and push for new extensions to meet there needs, that’s ignoring the common practice of dropping support for older GPUs.

oscarbg commented 2 years ago

thanks @K0bin for info.. with Crossover in 2023 looking for D3D12 support on Macos hope someone is feeding Apple with requests like these for Metal API to match D3D12 better.. @Gcenx yep there a two kinds of requirements (VK EXTs) on VKD3D-proton, ones ares needed for exposing D3D12 features like the last I asked for ExecuteIndirect.. these ones are going to be unavoidable for both VKD3D and VKD3D-Proton.. there other ones are an implementation choice (cleaner implementation, etc..).. these are the ones where VKD3D and VKD3D-proton differ which VKD3D primarily looking at needed less extensions probably due to not being supported on MoltenVK yet.. that's my view of the matter..

BeastLe9enD commented 2 years ago

In case you haven't heard and are interested, at today's WWDC event apple announced Metal 3 which seems to support buffer device address, so it should be possible to implement VK_KHR_buffer_device_address. The last function in the Buffer class looks promising 👍.

class Buffer : public NS::Referencing<Buffer, Resource>
{
public:
    NS::UInteger   length() const;

    void*          contents();

    void           didModifyRange(NS::Range range);

    class Texture* newTexture(const class TextureDescriptor* descriptor, NS::UInteger offset, NS::UInteger bytesPerRow);

    void           addDebugMarker(const NS::String* marker, NS::Range range);

    void           removeAllDebugMarkers();

    class Buffer*  remoteStorageBuffer() const;

    class Buffer*  newRemoteBufferViewForDevice(const class Device* device);

    uint64_t       gpuAddress() const;
}
billhollings commented 2 years ago

Metal 3...it should be possible to implement VK_KHR_buffer_device_address

Consolidating in discuss #1616.

K0bin commented 1 year ago

I don't think it's feasible to implement D3D12 on Metal.

D3D12 is bindless, so MoltenVK would have to use Argument Buffers to implement the binding model. The bigger problem is the way Metal handles barriers in that case. For all resources that can be written to, you have to specify whether those are read only or can be written with useResource and that has to be done in every single render/compute pass. D3D12 descriptor heaps can have up to 1 million entries, so firstly, it would incur a huge CPU overhead cost. The bigger problem is that you simply do not know whether a resource will be written to or not. The current implementation of VK_EXT_descriptor_indexing decides based on Vulkan descriptor but you'd very often have both a VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE and a VK_DESCRIPTOR_TYPE_STORAGE_IMAGE (for example) descriptor for the same resource. And declaring all those resources as read/write in every single pass would probably come with a big GPU performance cost because you'd essentially have to do a full barrier after every single render/compute pass. Equivalent Vulkan barriers might have happened in earlier command buffers, so it's not possible to reliably track the image layout either.

magnuslundstedt commented 1 year ago

Is this still a problem with Metal 3?

During WWDC they covered how “to make your app's bindless journey a joy by simplifying argument buffers, allocating acceleration structures from heaps, and benefitting from the improvements to the Metal's validation layer and Debugger Tools.“ - https://developer.apple.com/videos/play/wwdc2022/10101/

K0bin commented 1 year ago

Yes, all of this is still a problem with Metal 3.

cdavis5e commented 1 year ago

The bigger problem is the way Metal handles barriers in that case. For all resources that can be written to, you have to specify whether those are read only or can be written with useResource and that has to be done in every single render/compute pass.

It isn't just for barriers. Those methods are a signal to Metal that it needs to keep those resources' memory paged in for the duration of the command encoder. In D3D12, pageability is controlled explicitly by the application (ID3D12Device::Evict() and ID3D12Device::MakeResident()). I think we need that level of control to support D3D12 properly. We need CW to push for this with Apple. (@js6i @stefand: Hint, hint.)

Also, you only really need to do this if you're using MTLResourceHazardTrackingModeUntracked, which we aren't yet, but that's on my todo list.

D3D12 descriptor heaps can have up to 1 million entries, so firstly, it would incur a huge CPU overhead cost.

What they want you to do in that case is use MTLHeap objects and tell it to page the heap as a whole in with -[MTL*CommandEncoder useHeap:usage:stages:]. But then, you can't write to them. Or use the array method -[MTL*CommandEncoder useResource*s*:usage:stages:].

K0bin commented 1 year ago

It isn't just for barriers. Those methods are a signal to Metal that it needs to keep those resources' memory paged in for the duration of the command encoder

Yeah, it conflates residency with cache flushing/image layouts. Given that Vulkan has no explicit concept of residency, you'd have to maintain a global list of all resources or (like MoltenVK does right now) use all resources in a descriptor set.

Also, you only really need to do this if you're using MTLResourceHazardTrackingModeUntracked, which we aren't yet, but that's on my todo list.

As far as I understand it, it's necessary for argument buffers too because the API has no way of knowing which resources you're gonna access.

Heaps do indeed cut down on the number of residency API calls but still leave the problem that all writable resources need useResource. That's still the big problem IMO.

Ideally Metal would need a barrier function that transitions resources and keeps them in that state, so you wouldn't have to call useResource at all and just make your heaps resident.

rcaridade145 commented 1 year ago

Wouldn't something like a generation gc help in that? A readonly and a rw generation.

K0bin commented 1 year ago

Resource lifetimes and how they are used is entirely controlled by the application and the driver (MoltenVK in this case) doesn't have enough information or enough wiggle room to do anything clever here.

mr-mobster commented 1 year ago

Apple's documentation on this is unfortunately very incomplete, but are we sure that useResource does synchronisation? It is my understanding that if combined with untracked resources this is all about residency (the GPU seems to crash on page faults). And if I remember correctly, an Apple engineer might have mentioned during an WWDC video that useResource on writeable textures is required to ensure that the texture is not compressed or otherwise layout-optimised when it's being written to. It is still the responsibility of the user to insert appropriate barriers or fences to ensure that the data is properly synchronised.

That said, I can imagine that there might be a feasible way of approaching this problem (with the caveat that I am not familiar with MoltenVK codebase):

  1. Only ever allocate objects using untracked MTLHeap
  2. Maintain a list of textures with VK_DESCRIPTOR_TYPE_STORAGE_IMAGE bit in a separate semi-linear data structure that allows fast updates (e.g. a sorted skip list)
  3. Use indirect command buffers to encode commands (this more closely matches how Vulkan or DX12 work anyway)
  4. Track whether a shader stage writes to a texture for each existing pipeline
  5. Make all the heaps resident and call useResources for all the currently existing textures with storage bit set if the pipeline writes to a texture

Is there something obvious I am missing (aside the non-trivial performance overhead obviously?)

mr-mobster commented 1 year ago

BTW, is it possible that setting allowGPUOptimizedContents to false (or some other options) might make useResource on writeable textures unnecessary?

K0bin commented 1 year ago

I don't know if it does synchronization but given that it seems to do cache flushing/invalidation and/or image layout transitions, it probably needs to be synchronized externally if it doesn't do it on its own.

From "Explore bindless rendering in Metal"

Additionally, any resources that may have been modified that we now intend to read from will still need their own useResource call.

This is so that the Metal framework can handle resource transitions for you, flushing GPU caches and adjusting the internal memory layout.

I don't think your proposed approach will work because:

Track whether a shader stage writes to a texture for each existing pipeline

This is basically impossible with full UPDATE_AFTER_BIND and descriptor indexing. The app could calculate the final index for the storage image descriptor in a million different ways.

BTW, is it possible that setting allowGPUOptimizedContents to false (or some other options) might make useResource on writeable textures unnecessary?

useResource still seems to handle cache flushing/invalidation, so this probably doesn't work.