Cross-vendor version of `VK_NV_mesh_shader`

ghost commented 3 years ago

Not so long ago, in presentations, AMD announced support for mesh shaders in DirectX 12 Ultimate. So, will there be mesh shaders in the Vulkan API, and not just for NVIDIA? Will it be an extension or will it be Vulkan API 1.3?

TomOlson commented 3 years ago

@helixd-2k18, we're actively discussing where we want to go in terms of geometry pipeline enhancements for Vulkan. DX12-style task/mesh shaders are an option, but they don't map equally well onto all of the GPU architectures out there - there is risk of ending up with non-obvious performance cliffs, conflicting developer guidance from different GPU vendors, et cetera. We'd like to avoid that. On the other hand, we see the value of compatibility with DX and would like to make porting and emulation as easy as possible. Any information you can give us about your use cases, constraints, or feature priorities would help us make better decisions here, and would be very welcome.

To answer your second question - whatever we do in this space is likely to come out first in extension form. If we get good HW support and good feedback from developers, then we'll consider it for a future core version.

osor-io commented 3 years ago

Have there been any developments on an official extension for mesh shaders? Sadly, if you want to use them right now on the latest AMD and NVIDIA cards you're forced to go DX12 😢 Which is extra sad because this feature is likely to become quite relevant.

Something like VK_NV_mesh_shader -> VK_KHR_mesh_shader (with any needed tweaks), similar to what happened with raytracing, would be nice. In terms of mapping to hardware, it seems pretty similar to the raytracing case again, where it wouldn't make that much sense for it to be in core for now, as you mention.

georgeouzou commented 3 years ago

@TomOlson A very nice feature of VK_NV_mesh_shader is that we can differentiate between per-primitive and per-vertex output data from the mesh shader. I do not know whether this makes any performance difference, as we used flat outputs for this case in the old pipeline.

BeastLe9enD commented 2 years ago

Is there any news about the cross-vendor extension? Unfortunately, there is currently no way to use mesh shaders on AMD hardware with Vulkan.

AceThither commented 2 years ago

@helixd-2k18 @TomOlson @osor-io @georgeouzou @BeastLe9enD

FYI, today, the Mesa folks merged the following:

radv: Experimental support for Mesh Shaders.

They mentioned some notes regarding NV_mesh_shader, quoted below for visibility:

Notes about NV_mesh_shader

Important note: NV_mesh_shader will never be officially supported on RADV, because it performs poorly on AMD hardware. However, we are implementing this extension to get some experience with mesh shader technology. Users should not rely on this support because we are going to remove it if/when a potential cross-vendor extension appears.

There are problems with the NV_mesh_shader extension which are not present in e.g. D3D12:

The total number of output vertices is not known at runtime. D3D12 solves this with SetMeshOutputCounts, which must appear before any outputs are written. NV_mesh_shader doesn't have this guarantee.

Any shader invocation can read the output of any other, which is not possible in D3D12.

The NV indirect command buffer format is not supported by the hardware, so we have to emit several copy packets to make it work. Note that D3D12 uses 3D dispatches without an offset: (x, y, z), but NV_mesh_shader uses a 1D dispatch with an offset: (taskCount, firstTask).

Therefore, NV_mesh_shader performs poorly compared to D3D12 mesh shaders.
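
The dispatch mismatch described in the notes above can be made concrete with a small host-side sketch. It assumes an application that counts its tasks as one flat number (as with the NV extension's taskCount) and needs 3D group counts of the kind a D3D12-style dispatch takes. The names GroupCount3D, splitTaskCount and flatIndex, and the maxPerDim limit, are hypothetical and are not part of any Vulkan or Mesa API.

```cpp
#include <algorithm>
#include <cstdint>
#include <stdexcept>

// Hypothetical helper type: group counts for a 3D dispatch.
struct GroupCount3D { uint32_t x, y, z; };

// Splits a flat task count into a 3D grid whose volume covers it.
// maxPerDim stands in for a per-dimension device limit (assumed value).
GroupCount3D splitTaskCount(uint64_t taskCount, uint32_t maxPerDim) {
    if (maxPerDim == 0) throw std::invalid_argument("maxPerDim must be non-zero");
    if (taskCount == 0) return {0, 0, 0};
    auto ceilDiv = [](uint64_t a, uint64_t b) { return (a + b - 1) / b; };
    uint64_t x = std::min<uint64_t>(taskCount, maxPerDim);
    uint64_t rows = ceilDiv(taskCount, x);           // rows of x groups needed
    uint64_t y = std::min<uint64_t>(rows, maxPerDim);
    uint64_t z = ceilDiv(rows, y);                   // slices of y rows needed
    if (z > maxPerDim) throw std::length_error("task count exceeds 3D dispatch limits");
    return {uint32_t(x), uint32_t(y), uint32_t(z)};
}

// Flat index a shader would rebuild from its 3D workgroup ID.
uint64_t flatIndex(GroupCount3D grid, uint32_t gx, uint32_t gy, uint32_t gz) {
    return (uint64_t(gz) * grid.y + gy) * grid.x + gx;
}
```

The grid can cover more groups than were requested (100000 tasks with a limit of 65535 gives 65535 x 2 x 1), so a shader using this scheme would have to skip flat indices at or beyond the real task count.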

Hisamera commented 2 years ago

Soooo... Do we know anything? Is there even a hypothetical timeline for this feature? I don't even see an assignee. Is there someone in the know, that can say in what phase this extension is? Developer feedback phase? Maybe hardware makers feedback phase? Experimental MVP phase? Is there anything that is there to know, or that we are allowed to know, so we can make predictions about the future?

simondeschenes commented 2 years ago

We have been waiting for more than a year for a simple answer on whether the feature is in the works or not. I suppose it is just better to assume nobody is working on it and it won't happen anytime soon. It would actually be a better answer to know that this feature is not planned at all than the lack of communication I am witnessing. I am disappointed by the lack of visibility.

Tobski commented 2 years ago

Apologies all for letting this issue languish so long, we should have provided an update once we took action here. I can confirm that we are working on this extension, but cannot confirm a release timeframe. I've marked on our internal release tracker to post to this issue once it goes live so you should all get pinged fairly quickly.

I will note that per @TomOlson's comment it's still the case that this feature fundamentally does not map well to all architectures, and we intend to provide further commentary about this once the extension ships.

codecnotsupported commented 2 years ago

> Apologies all for letting this issue languish so long, we should have provided an update once we took action here. I can confirm that we are working on this extension, but cannot confirm a release timeframe. I've marked on our internal release tracker to post to this issue once it goes live so you should all get pinged fairly quickly.
>
> I will note that per @TomOlson's comment it's still the case that this feature fundamentally does not map well to all architectures, and we intend to provide further commentary about this once the extension ships.

By "not all architectures", are you (mostly) referring to low-end devices such as mobile, or do you also mean different vendors on the same target platform, such as dedicated AMD/Intel cards?

Tobski commented 2 years ago

Not speaking on behalf of the Vulkan WG here, but mesh shaders still have many of the same implementation properties as geometry or tessellation shaders in that they allow unbounded pre-transform expansion. While they fix some of the implementation problems of those shader stages, many issues remain, including the bandwidth issues which do disproportionately affect mobile and embedded devices.

BeastLe9enD commented 2 years ago

Okay, I'm definitely excited to see what the extension will look like. Of course it is important that all devices (including mobile devices and embedded devices) deliver decent performance, but on the other hand I think DX12 compatibility will also play a role. So I would be interested to know if the extension will work fundamentally differently than DX12 or VK_NV_mesh_shader or if it will just contain additional options to optimize performance on all devices. That would be good to know so that as a developer you might already know what to expect. Many thanks in advance :)

codecnotsupported commented 2 years ago

I think Vulkan Profiles can play a role in providing a solution to DX12 compatibility vs cross device performance. But let's wait for the cross vendor variant to release & benchmarks before we get ahead of ourselves.

Venemo commented 2 years ago

@codecnotsupported

> By "not all architectures", are you (mostly) referring to low-end devices such as mobile, or do you also mean different vendors on the same target platform, such as dedicated AMD/Intel cards?

We can't go into details of the new API yet, but I think it is not a secret that each vendor uses the same underlying hardware that is also being used for D3D12 mesh shaders. Therefore, many caveats from D3D12 apply here, too.

For example, if you look at the D3D12 perf recommendations, AMD and NVIDIA recommend different workgroup sizes and slightly different meshlet sizes. To get optimal performance, application developers need to do extra work, e.g. make the workgroup sizes of your mesh shaders configurable and use a different value for each vendor.
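
One way to keep those sizes configurable is to select them on the host from the device's vendor ID and feed them to the shader, for instance through specialization constants. The sketch below shows only the selection step. MeshletConfig and pickMeshletConfig are hypothetical names, and every number in the table is a placeholder for illustration, not a vendor recommendation; the real values should come from each vendor's own guidance. The two IDs are the PCI vendor IDs that a Vulkan driver reports in VkPhysicalDeviceProperties::vendorID.

```cpp
#include <cstdint>

// Hypothetical per-vendor tuning knobs for a mesh shader pipeline.
struct MeshletConfig {
    uint32_t workgroupSize;   // mesh shader invocations per workgroup
    uint32_t maxVertices;     // vertices per meshlet
    uint32_t maxPrimitives;   // triangles per meshlet
};

constexpr uint32_t kVendorAMD    = 0x1002;  // PCI vendor ID
constexpr uint32_t kVendorNVIDIA = 0x10DE;  // PCI vendor ID

// Returns the configuration to bake into the pipeline for this vendor.
// All numbers are placeholders (assumptions), not measured or recommended values.
MeshletConfig pickMeshletConfig(uint32_t vendorID) {
    switch (vendorID) {
        case kVendorNVIDIA: return {32, 64, 126};
        case kVendorAMD:    return {128, 128, 256};
        default:            return {64, 64, 64};   // conservative fallback
    }
}
```

Keeping the table in one place means meshlet generation and pipeline creation read the same numbers, so a change in one vendor's guidance touches a single line.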

Andreyogld3d commented 2 years ago

The Metal API now has support for mesh shaders: https://developer.apple.com/videos/play/wwdc2022/10162/ When will there be a VK_KHR_mesh_shader?

KlingelingelingDerEiermann commented 2 years ago

I have a lot of respect for all the developers working on this extension, and I'm aware that there are big challenges on some architectures, but I think it's time to be open about when to expect this extension and what it will look like. Last week Apple showed at the WWDC developer conference that macOS and iOS will also use mesh shaders in the future. From what I've seen, these don't really differ that much from the DirectX 12 mesh shaders, so I think it's a real shame that Vulkan in particular, as an open API, doesn't reveal any information to the outside world, especially since Vulkan is now the only API that doesn't support the feature, apart from the NVIDIA extension. Not that I want to put anyone under any time pressure, but there are some developers who are waiting for this extension and may want to know what to expect and, above all, when. I don't want to offend anyone; it was just a matter of personal concern for me to say so. I would be very happy if we could talk about this topic together. Thanks in advance!

ghost commented 2 years ago

And then, as always, some EXT extension will come out and leave it at that.

Venemo commented 2 years ago

Hi Everyone,

We agreed in Khronos that we can now share some information about the upcoming cross-vendor mesh shader extension publicly (though I can't give any promises about when it will be released). For this extension, compatibility with DirectX 12 was very important to us, therefore we follow the same main capabilities and restrictions. The shader programming model is also very similar. In a nutshell: if you can do something with DirectX mesh shaders, you will be able to also do it in Vulkan.

There are two new shader stages: mesh shaders and task shaders (optional, also known as amplification shaders in DirectX), which can replace the vertex-processing stages of the current graphics pipeline. The new extension will support three-dimensional dispatches (instead of 1D in NV_mesh_shader), and mesh shader outputs are more like DirectX 12 (the shader has to declare the number of output vertices/primitives first, similarly to SetMeshOutputCounts).

Both new shader stages follow a compute-like programming model.
Mesh shaders will support per-vertex and per-primitive output attributes. Their output is cooperatively produced by workgroups and directly consumed by the rasterizer.
Task shaders have the main purpose of dispatching mesh shader workgroups and an optional payload output which works like shared memory.

> So I would be interested to know if the extension will work fundamentally differently than DX12 or VK_NV_mesh_shader

It won't be fundamentally different.

> there are some developers who are waiting for this extension and may want to know what to expect

I hope this post helps and clears up what to expect.

BeastLe9enD commented 2 years ago

@Venemo Thank you very much!

Venemo commented 2 years ago

If someone wants to experiment with mesh shading technology today, the old NV_mesh_shader extension may be useful, with the following caveats:

Avoid using firstTask in your draw calls; set it to 0 (it won't be supported).
In task shaders, assign gl_TaskCountNV at the end of the shader and in uniform control flow (it will then be easier to port to the new extension, where it will work more like DX DispatchMesh).
In mesh shaders, assign gl_PrimitiveCountNV before writing anything to the output arrays, and do that in uniform control flow (see how SetMeshOutputCounts works). Also, don't use output loads (you already can't in DX).
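
The mesh shader caveat above amounts to a small contract: declare the counts first, then only write outputs, and never read them back. Below is a toy CPU-side model of that contract for one workgroup. It is not shader code and not any real API; MeshOutputModel and its methods are hypothetical names, and rejecting a second declaration is a choice made by this model rather than something stated in this thread.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Toy model of one mesh shader workgroup's output arrays (illustrative only).
class MeshOutputModel {
public:
    // Counts must be declared once, before any write.
    void setOutputCounts(uint32_t vertexCount, uint32_t primitiveCount) {
        if (countsSet_) throw std::logic_error("output counts already declared");
        positions_.resize(vertexCount);
        triangles_.resize(primitiveCount);
        countsSet_ = true;
    }
    void writeVertex(uint32_t i, std::array<float, 4> position) {
        requireCounts();
        positions_.at(i) = position;   // at() rejects indices beyond the declared count
    }
    void writeTriangle(uint32_t i, std::array<uint32_t, 3> indices) {
        requireCounts();
        triangles_.at(i) = indices;
    }
    // Only the sizes are exposed; there is deliberately no way to load an output back.
    std::size_t vertexCount() const { return positions_.size(); }
    std::size_t primitiveCount() const { return triangles_.size(); }

private:
    void requireCounts() const {
        if (!countsSet_) throw std::logic_error("write before output counts were declared");
    }
    bool countsSet_ = false;
    std::vector<std::array<float, 4>> positions_;
    std::vector<std::array<uint32_t, 3>> triangles_;
};
```

Code written against this shape (counts first, writes only, indices within the declared range) follows the same ordering that the caveats ask of NV_mesh_shader shaders.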

For performance considerations, check your vendor's recommendations on mesh shaders; the same principles will likely apply to Vulkan as well. Also keep in mind that mesh shading is a very low-level tool and, depending on where your application's bottlenecks are, it may be difficult to get more performance than what the traditional pipeline can already give you today.

BeastLe9enD commented 2 years ago

@Venemo Nice, good to know! Will there be something like writePackedPrimitiveIndices4x8NV?

Venemo commented 2 years ago

> Will there be something like writePackedPrimitiveIndices4x8NV?

As far as I know, writePackedPrimitiveIndices4x8NV only really makes sense on NVidia HW (and would only create overhead for other vendors). Instead, the indices output can be indexed by the primitive index like in DX.
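
To make the two indexing styles concrete, here is a CPU-side sketch. writePacked4x8 models a packed write of four 8-bit indices from one 32-bit word into a flat index array, assuming the lowest byte holds the first index (that byte order is an assumption of this sketch). toTriangles shows the per-primitive view, in which triangle p owns its own three indices. Both function names and the Triangle alias are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

using Triangle = std::array<uint32_t, 3>;

// Packed style: one 32-bit word carries four consecutive 8-bit indices.
// Assumes the least significant byte is the first index.
void writePacked4x8(std::vector<uint8_t>& flatIndices, uint32_t indexOffset, uint32_t packed) {
    for (uint32_t i = 0; i < 4; ++i) {
        flatIndices.at(indexOffset + i) = static_cast<uint8_t>((packed >> (8u * i)) & 0xFFu);
    }
}

// Per-primitive style: the output is addressed by primitive index,
// so triangle p reads flat entries 3p, 3p+1 and 3p+2.
std::vector<Triangle> toTriangles(const std::vector<uint8_t>& flatIndices) {
    std::vector<Triangle> triangles(flatIndices.size() / 3);
    for (std::size_t p = 0; p < triangles.size(); ++p) {
        triangles[p] = {flatIndices[3 * p], flatIndices[3 * p + 1], flatIndices[3 * p + 2]};
    }
    return triangles;
}
```

Note that a packed word covers four indices while a triangle covers three, so packed writes straddle primitive boundaries; with per-primitive indexing each invocation can write whole triangles on its own.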

BeastLe9enD commented 2 years ago

@Venemo At this point I would like to note that Apple also offers this feature in Metal:

void set_indices(uint I, uchar2 v);
void set_indices(uint I, uchar4 v);

Maybe this would be interesting for MoltenVK. It might therefore be useful to implement it, even if it might not be performant on all platforms.

Venemo commented 2 years ago

@BeastLe9enD I'm not familiar enough with Apple HW to judge, and I think this isn't really the right place to speculate about it. If you are looking for a more casual chat on the topic, consider the public Vulkan discord.

ghost commented 2 years ago

It's... finished? It's end? Why EXT?

Venemo commented 2 years ago

The new VK_EXT_mesh_shader extension was released today. See the official announcement here: https://www.khronos.org/blog/mesh-shading-for-vulkan

NVIDIA is shipping this in their beta driver, and experimental support in the RADV and ANV drivers is also available.

Venemo commented 2 years ago

> Why EXT?

Because it's a vendor neutral extension, hence it is not named after a specific company.

BeastLe9enD commented 2 years ago

@unit-a-user

> It's... finished? It's end? Why EXT?

The Khronos Blog article says the following:

> It is important to note, that while portability between APIs can be achieved, portability in performance among vendors is much harder. This is one of the reasons why this extension has not been released as a ratified KHR extension and Khronos continues to investigate improvements to geometry rasterization.

https://www.khronos.org/blog/mesh-shading-for-vulkan

rsahlin commented 1 year ago

AMD reports that the extension is included in the 22.11.1 driver release: https://www.amd.com/en/support/kb/release-notes/rn-rad-win-vulkan

However, my system does not report the extension. I'm using driver version 23.1.2 on Windows 11 22H2, and my GPU is a Radeon RX 6900 XT.

Other, later extensions are visible, for instance VK_EXT_depth_clamp_zero_one. In my app I can see that the mesh shader properties and features are reported, but the actual extension is missing. Running vulkaninfoSDK.exe version 1.3.239.0 does not report the extension either.

Venemo commented 1 year ago

@rsahlin Khronos can't do anything about that, it sounds like a bug you should report to AMD.

Tobski commented 1 year ago

@rsahlin this is a known situation: the driver that shipped it was a beta (KB) driver. The extension hasn't made it to a full release yet, but will be available in a future driver. If you wish to use it, for now you should download that specific driver. We'll see if we can make this type of situation clearer in future.

rsahlin commented 1 year ago

Thanks for the information, @Tobski. I was looking at the Radeon driver support page: https://www.amd.com/en/support/kb/release-notes/rn-rad-win-vulkan Based on this info, I thought the extension would be present beginning with version 22.11.1.

Where can I find more information and download the beta driver you mention?

Tobski commented 1 year ago

> Where can I find more information and download the beta driver you mention?

@rsahlin the blue text directly under the header for that version is a link to a page with a download link.

rsahlin commented 1 year ago

Thanks so much @Tobski - downloaded and now it works! :-)

rsahlin commented 1 year ago

I have a question regarding how to use TaskPayloadWorkgroupEXT in a task shader using the VK_EXT_mesh_shader extension. Apologies if this is not a good place, don't know where else to post....

The documentation states that a new storage class is available as an output from the task shader and an input to the mesh shader. However, I cannot figure out how to declare the payload. I cannot find any examples of how to use this new functionality; I think it would be beneficial to have some.

What I want to do is simply have an id (uint) being output for each task workgroup.

Something like this in the task shader:

struct TaskPayloadWorkgroupEXT {
    uint ID;
} taskPayload;

main() {
    taskPayload.ID = gl_WorkGroupID.x * gl_WorkGroupSize.x;
....
    EmitMeshTasksEXT(x, y, z);
}

Any help or examples of how to use TaskPayloadWorkgroupEXT is greatly appreciated!

rg3igalia commented 1 year ago

Take a look at this basic CTS mesh shader test code:

https://github.com/KhronosGroup/VK-GL-CTS/blob/20d674342f008624b82e08f26ed5572c176ba7bd/external/vulkancts/modules/vulkan/mesh_shader/vktMeshShaderSmokeTestsEXT.cpp#L298

Note: the Vulkan Samples mesh shader sample does not use a payload, apparently.

Tobski commented 1 year ago

@rsahlin the best place to raise this question is in the Vulkan-Samples repository (https://github.com/KhronosGroup/Vulkan-Samples), as a separate issue. Please raise separate questions as new issues.

Tobski commented 1 year ago

While I'm here, @deceased-a are you happy that this issue can be closed? It feels wrong to keep it open on the faint whiff of a different future extension (which we are tracking internally anyway). Mesh shading is implemented about as portably as it can be - any future thing as vaguely alluded to elsewhere would be a new paradigm.

gpx1000 commented 1 year ago

https://github.com/KhronosGroup/Vulkan-Samples/pull/624 <-- that sample might be more helpful. It's still currently in review.

Venemo commented 1 year ago

@rsahlin This video might be interesting to you: https://www.youtube.com/watch?v=OfqpkyoARFc also this sample app: https://github.com/nvpro-samples/gl_vk_meshlet_cadscene

rsahlin commented 1 year ago

Hi all and thanks for the replies!

All the examples referred to use taskPayloadSharedEXT. This uses shared memory and does not provide the same functionality as TaskPayloadWorkgroupEXT.

I will open an issue in Vulkan-Samples. Thanks

rg3igalia commented 1 year ago

@rsahlin I think you may be confused due to different names in the SPIR-V spec and the GLSL spec. The GLSL taskPayloadSharedEXT qualifier maps exactly to the SPIR-V TaskPayloadWorkgroupEXT storage class. They are exactly the same.

See https://github.com/KhronosGroup/GLSL/blob/c316c438d456ab9418b93a3e6d065a14ad4a4455/extensions/ext/GLSL_EXT_mesh_shader.txt#L201
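
For readers who want the data flow spelled out, here is a toy CPU-side model of what the payload does, written in C++ rather than GLSL. It assumes the use case from the question above: one task workgroup stores a single ID in its payload and then launches some number of mesh workgroups, each of which sees that same payload. TaskPayload, MeshWorkgroupInput and runTaskWorkgroup are hypothetical names for this sketch only; in a real shader the payload would be a variable declared with the taskPayloadSharedEXT qualifier and the launch would be the EmitMeshTasksEXT call.

```cpp
#include <cstdint>
#include <vector>

// Mirrors the "uint ID" payload from the question.
struct TaskPayload { uint32_t id; };

// What one launched mesh workgroup gets to see in this model.
struct MeshWorkgroupInput {
    uint32_t meshGroupX;   // its own workgroup index within the launch
    TaskPayload payload;   // copy of the parent task workgroup's payload
};

// Models one task workgroup: fill the payload once, then launch
// meshGroupCount mesh workgroups that all receive the same payload.
std::vector<MeshWorkgroupInput> runTaskWorkgroup(uint32_t taskGroupX,
                                                 uint32_t taskGroupSize,
                                                 uint32_t meshGroupCount) {
    TaskPayload payload{taskGroupX * taskGroupSize};  // same formula as in the question
    std::vector<MeshWorkgroupInput> launched;
    launched.reserve(meshGroupCount);
    for (uint32_t g = 0; g < meshGroupCount; ++g) {
        launched.push_back({g, payload});
    }
    return launched;
}
```

In this model, runTaskWorkgroup(2, 32, 3) yields three mesh workgroups that all carry the ID 64, which is the "one ID per task workgroup" behaviour the question asks for.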

rsahlin commented 1 year ago

Thanks @rg3igalia - Yes, that totally confused me! :-)

KhronosGroup / Vulkan-Docs

Cross-vendor version of `VK_NV_mesh_shader` #1423