KhronosGroup / MoltenVK

MoltenVK is a Vulkan Portability implementation. It layers a subset of the high-performance, industry-standard Vulkan graphics and compute API over Apple's Metal graphics framework, enabling Vulkan applications to run on macOS, iOS and tvOS.
Apache License 2.0

Support for new Metal 3.0 features.. #631

Open oscarbg opened 5 years ago

oscarbg commented 5 years ago

Hi, we have Metal news from WWDC19.. there is a new Metal 3.0 API: (link: https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf) and a new Metal 2.2 Shading Language: (link: https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf)

I have opened a separate SPIRV-Cross issue to track/handle new Metal SL features: https://github.com/KhronosGroup/SPIRV-Cross/issues/1004

From a quick inspection, the new features of interest to MoltenVK devs are:

iOS only:
- Support for Indirect Command Buffers (ICB) for compute commands

From the Metal 3.0 features table:
- Non-Square Tile Dispatch
- Texture swizzle
- Placement heap
- Line width
- primitiveID & barycentric

From the State of the Union and Keynote:
- Metal async shader compilation
- Metal P2P API (fast GPU mem copy?) for the new Mac Pro (2019) Radeon Pro Vega II dual GPU, connected via Infinity Fabric (BW of up to 84GB/s)
- Counter Sample Buffers

Also interesting are the tweets from Apple engineer Gokhan Avkarogullari (@gavkar): https://twitter.com/gavkar/status/1135772421791764485

- A new set of APIs for simplified GPU family handling
- Indirect Command Buffers (ICB) for encoding compute workloads on the GPU
- iOS gets PSO indirection for ICB
- iOS gets range indirection for ICB
- Heaps support developer driven placement
- Heaps can track resources
- macOS blit alignment rules are relaxed to match Apple GPUs
- Improvement on resource usage 
- Well defined behavior for texture OOB access
- Texture custom swizzle
- Texture sharing across processes 
- iOS texture binding number increased to 94 (from 32)
- iOS varying limit increased to 124
- ASTC 3D support for recent Apple GPUs
- 16 bit depth
- 3D BC textures on all Mac GPUs
- visibility buffer (aka occlusion query) size increased to 256K
- sRGB view on non sRGB textures
- Metal memory debugger 
- iOS Simulator now supports Metal

As said on Spirv-Cross:

no pressure&hurry to support these.. this will be in beta as you know until late September.. just wanted to post now that I have taken some look at it before it goes out of my mind.. :-))

Degerz commented 5 years ago

Texture custom swizzle

Finally, no more ugly workarounds in SPIRV-Cross to support VkComponentMapping for Metal anymore ...
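
To illustrate why a native texture swizzle helps here, the sketch below resolves a VkComponentMapping-style request into four fixed channels, which is the shape a per-texture swizzle wants. It is only a sketch under assumptions: the enums are local stand-ins whose numeric values mirror VkComponentSwizzle and MTLTextureSwizzle so the block compiles without either SDK, and `resolveSwizzle` and `resolveMapping` are hypothetical helper names, not MoltenVK or SPIRV-Cross functions.

```cpp
#include <cstdint>

// Local stand-ins so the sketch compiles without Vulkan or Metal headers.
// The numeric values mirror VkComponentSwizzle and MTLTextureSwizzle.
enum class VkSwizzle : std::uint32_t {
    Identity = 0, Zero = 1, One = 2, R = 3, G = 4, B = 5, A = 6
};
enum class MtlSwizzle : std::uint8_t {
    Zero = 0, One = 1, Red = 2, Green = 3, Blue = 4, Alpha = 5
};

// Shaped like VkComponentMapping and MTLTextureSwizzleChannels.
struct ComponentMapping { VkSwizzle r, g, b, a; };
struct SwizzleChannels { MtlSwizzle red, green, blue, alpha; };

// Hypothetical helper: resolve one component of the mapping.
// Identity means "this slot keeps its own channel", so the caller
// supplies which channel that is.
constexpr MtlSwizzle resolveSwizzle(VkSwizzle requested, MtlSwizzle own) {
    switch (requested) {
        case VkSwizzle::Zero: return MtlSwizzle::Zero;
        case VkSwizzle::One:  return MtlSwizzle::One;
        case VkSwizzle::R:    return MtlSwizzle::Red;
        case VkSwizzle::G:    return MtlSwizzle::Green;
        case VkSwizzle::B:    return MtlSwizzle::Blue;
        case VkSwizzle::A:    return MtlSwizzle::Alpha;
        case VkSwizzle::Identity: break;
    }
    return own;
}

// Hypothetical helper: resolve all four components at once.
constexpr SwizzleChannels resolveMapping(ComponentMapping m) {
    return SwizzleChannels{
        resolveSwizzle(m.r, MtlSwizzle::Red),
        resolveSwizzle(m.g, MtlSwizzle::Green),
        resolveSwizzle(m.b, MtlSwizzle::Blue),
        resolveSwizzle(m.a, MtlSwizzle::Alpha)
    };
}
```

For example, a BGRA-style mapping of {B, Identity, R, Identity} resolves to blue, green, red, alpha, with no shader-side fix-up needed.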

iOS texture binding number increased to 94 (from 32)

Improvement but still behind D3D11 limit of 128 texture binds. I still wish for next year that Apple's GPUs could support NV-style bindless image handles ... (bindless texture handles instead of unbounded resource indexing would go great with offline renderers such as Blender's Cycles renderer since it's used in their CUDA backend)

Not a big deal if logical operations didn't make it into the API since raster order groups exist to do arbitrary blending. It'd also be great if Apple GPUs could support both inner and outer conservative rasterization next year as well ...

billhollings commented 5 years ago

@oscarbg Thanks for the review summary!

Definitely some interesting enhancements there. Texture swizzle and line widths in particular should be fairly helpful, and straightforward to map in.

oscarbg commented 5 years ago

Hi, yep.. interesting ones.. upon inspecting the Metal headers it is now clear that at least three more optional Vulkan features could be exposed by MoltenVK (some maybe Mac only, not clear yet):

- shaderInt64
- wideLines
- pipelineStatisticsQuery

I still hope that next year Metal exposes at least (for desktop GPUs) depthBounds and shaderFloat64, as sparse/tiled resources may be of no interest to Apple..

For pipelineStatisticsQuery, the "Counter Sample Buffers" functionality is now clearer.. there is now an MTLCounters.h which exposes most of what it needs (https://vulkan.lunarg.com/doc/view/1.0.33.0/linux/vkspec.chunked/ch16s04.html):

@constant MTLCommonCounterTimestamp The GPU time when the sample is taken.
 @constant MTLCommonCounterTessellationInputPatches The number of patches input to the tessellator.
 @constant MTLCommonCounterVertexInvocations The number of times the vertex shader was invoked.
 @constant MTLCommonCounterPostTessellationVertexInvocations The number of times the post tessellation vertex shader was invoked.
 @constant MTLCommonCounterClipperInvocations The number of primitives passed to the clipper.
 @constant MTLCommonCounterClipperPrimitivesOut The number of primitives output from the clipper.
 @constant MTLCommonCounterFragmentInvocations The number of times the fragment shader was invoked.
 @constant MTLCommonCounterFragmentsPassed The number of fragments that passed Depth, Stencil, and Scissor tests.
 @constant MTLCommonCounterComputeKernelInvocations The number of times the computer kernel was invoked.
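
As a rough sketch of how these counters could line up with pipelineStatisticsQuery, the block below pairs each Vulkan pipeline statistic bit with the counter listed above that looks closest. The pairing itself is an assumption made for illustration, not something confirmed by MoltenVK or Apple. `PipelineStatBit` is a local mirror of the VkQueryPipelineStatisticFlagBits values, and `MetalCounter`, `counterFor` and `coveredStats` are hypothetical names.

```cpp
#include <cstdint>

// Local mirror of the VkQueryPipelineStatisticFlagBits values.
enum PipelineStatBit : std::uint32_t {
    InputAssemblyVertices     = 0x001,
    InputAssemblyPrimitives   = 0x002,
    VertexShaderInvocations   = 0x004,
    GeometryShaderInvocations = 0x008,
    GeometryShaderPrimitives  = 0x010,
    ClippingInvocations       = 0x020,
    ClippingPrimitives        = 0x040,
    FragmentShaderInvocations = 0x080,
    TessControlShaderPatches  = 0x100,
    TessEvalShaderInvocations = 0x200,
    ComputeShaderInvocations  = 0x400
};

// Hypothetical enum named after the MTLCommonCounter constants quoted above.
enum class MetalCounter {
    None,
    VertexInvocations,
    ClipperInvocations,
    ClipperPrimitivesOut,
    FragmentInvocations,
    TessellationInputPatches,
    PostTessellationVertexInvocations,
    ComputeKernelInvocations
};

// Assumed pairing: the closest-looking counter, or None if nothing fits.
constexpr MetalCounter counterFor(PipelineStatBit bit) {
    switch (bit) {
        case VertexShaderInvocations:   return MetalCounter::VertexInvocations;
        case ClippingInvocations:       return MetalCounter::ClipperInvocations;
        case ClippingPrimitives:        return MetalCounter::ClipperPrimitivesOut;
        case FragmentShaderInvocations: return MetalCounter::FragmentInvocations;
        case TessControlShaderPatches:  return MetalCounter::TessellationInputPatches;
        case TessEvalShaderInvocations: return MetalCounter::PostTessellationVertexInvocations;
        case ComputeShaderInvocations:  return MetalCounter::ComputeKernelInvocations;
        default:                        return MetalCounter::None;
    }
}

// Returns the subset of the requested statistic bits that has a counterpart.
constexpr std::uint32_t coveredStats(std::uint32_t requested) {
    std::uint32_t covered = 0;
    for (std::uint32_t bit = 1; bit <= ComputeShaderInvocations; bit <<= 1) {
        if ((requested & bit) != 0 &&
            counterFor(static_cast<PipelineStatBit>(bit)) != MetalCounter::None) {
            covered |= bit;
        }
    }
    return covered;
}
```

Under this pairing the input assembly and geometry shader statistics have no counterpart, which fits the "most needs" wording above: `coveredStats(0x7FF)` leaves those four bits out.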

It may even allow exposing the upcoming VK_EXT_performance_query extension (for now we have VK_INTEL_performance_query), as it exposes cycle counter data:

@constant MTLCommonCounterTotalCycles The total number of cycles.
 @constant MTLCommonCounterVertexCycles The amount of cycles the vertex shader was running.
 @constant MTLCommonCounterTessellationCycles The amount of cycles spent in the tessellator.
 @constant MTLCommonCounterPostTessellationVertexCycles The amount of cycles the post tessellation vertex shader was running.
 @constant MTLCommonCounterFragmentCycles The amount of cycles the fragment shader was running.
 @constant MTLCommonCounterRenderTargetWriteCycles The amount of cycles spent writing to the render targets.

cdavis5e commented 5 years ago

I think the upcoming peer group functionality, which hasn't landed yet, should enable us to do VK_KHR_device_group.

cdavis5e commented 5 years ago

wideLines

Are you sure? I know about the "secret" methods that are present for this purpose, and I know that, for whatever reason, they put a line item in the feature table for this, but I haven't seen them exposed in the public SDK headers yet. Maybe it'll land later, like peer groups.

oscarbg commented 5 years ago

@cdavis5e yep, seems P2P should allow VK_KHR_device_group.. I was not even asking about it, as it would need a pricey Mac Pro 2019 setup to take advantage of it..

About wideLines, you are right, I haven't seen any mention in the headers.. but you got me interested in the "secret" methods that are already present.. do you know of more Metal "secret functionality" in addition to wideLines and logicOps support? In case wideLines doesn't end up being exposed for the final launch, it would be nice if you could post a patch similar to the logicOps one showing this "hidden" support..

cdavis5e commented 5 years ago

Another feature we could do: shaderResourceMinLod. The new min_lod sample parameter should be useful for that.

Apple even explicitly calls this out in the docs as useful for texture streaming... I wonder if they really are going to support sparse/tiled resources. In OpenGL, the textureMinLod GLSL functions came along with support for sparse textures. Support for D3D12 reserved resources just landed in VKD3D, so sparse/tiled resources would be really useful.

oscarbg commented 5 years ago

yep.. curiously, MoltenVK could be like the Intel Mesa driver, which advertises it without sparse texture support: http://www.vulkan.gpuinfo.org/displayreport.php?id=5749#features

Nice find that VKD3D added sparse support today.. hope ROV support is not too far down the VKD3D TODO list now that we have VK_EXT_fragment_shader_interlock.. in that case MoltenVK can support it too, and there is already an issue open for it..

oscarbg commented 5 years ago

@cdavis5e yes, seems sparse support is mostly done.. at least in the Metal shader compiler.. in metal_texture:

- HAVE_SPARSE_TEXTURES
- HAVE_SPARSE_SAMPLE_COMPARE_BIAS_OVERLOAD
- HAVE_SPARSE_SAMPLE_COMPARE_GRADIENT2D_OVERLOAD

these define lots of overloaded texture functions, so in theory calling metal -DHAVE_SPARSE_TEXTURES should work.. also found rasterisation rate (HAVE_RASTERIZATION_RATE) in metal_graphics, maybe A13 supports VRS..

Degerz commented 5 years ago

also found rasterisation rate (HAVE_RASTERIZATION_RATE) in metal_graphics maybe A13 supports VRS..

It's more likely that Metal 3.0 is exposing a feature that was previously exposed in D3D11.1 which was "target independent rasterization". Prior to this, Metal did not expose the functionality to specify coverage samples individually from the colour samples within the render target.

Within the MTLRenderPipelineDescriptor object, I now see a separate rasterSampleCount variable along with the regular sampleCount variable, which seems to imply that one can now specify their coverage samples to be greater than their colour samples.

For reference, here is the OpenGL equivalent. For comparison, Nvidia's Maxwell 2nd gen architecture and up have the capability to do what they refer to as "target independent multisampling": in addition to being able to specify the number of raster samples independently of the colour samples, they can also independently specify their depth/stencil samples to be greater than the colour samples.

cdavis5e commented 5 years ago

The type guarded by that macro (metal_graphics:182) has methods to convert between "physical" and "screen" coordinates. It's more than likely used to support variable-rate shading.

billhollings commented 5 years ago

To give us a place to move forward on this, I've added a Metal-3.0 branch, and made an initial commit to it in PR #638, to cover some basic version admin.

FunMiles commented 4 years ago

On a positive note, one feature works but the extension for it is not reported. I have tested the use of GL_NV_fragment_shader_barycentric and made use of the matching gl_BaryCoordNV in a fragment shader and it does work on a 2019 MacBook Pro. The run proceeds correctly, though the validation layer gives an error message:

Shader requires VkPhysicalDeviceFragmentShaderBarycentricFeaturesNV::fragmentShaderBarycentric but is not enabled on the device

Trying to request the extension gives an error and the Instance cannot be created.
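
One possible reading of the failure, offered as an assumption rather than a diagnosis: VK_NV_fragment_shader_barycentric is a device extension, so it is requested at device creation and only if the device advertises it. The sketch below shows the name check on a plain array so it compiles without the Vulkan headers. In a real application the names would come from vkEnumerateDeviceExtensionProperties, and `sameName` and `isAdvertised` are hypothetical helper names.

```cpp
#include <cstddef>

// Compare two NUL-terminated extension names character by character.
constexpr bool sameName(const char* a, const char* b) {
    while (*a != '\0' && *a == *b) {
        ++a;
        ++b;
    }
    return *a == *b;
}

// Hypothetical helper: true if `wanted` appears in the advertised list.
// `names` stands in for the extensionName fields that
// vkEnumerateDeviceExtensionProperties would return.
constexpr bool isAdvertised(const char* const* names, std::size_t count,
                            const char* wanted) {
    for (std::size_t i = 0; i < count; ++i) {
        if (sameName(names[i], wanted)) {
            return true;
        }
    }
    return false;
}
```

Only requesting an extension when `isAdvertised` returns true avoids the creation error, although it does not make an unreported feature officially available.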

FunMiles commented 4 years ago

One more note related to this issue. The primitiveID does work on my 2019 MacBook Pro. Yet, again, I get a validation layer message that using it requires geometry. Yet the SPIR-V spec says it is also enabled by tessellation, so it should work anyway in MoltenVK. I'm not a spec lawyer and could be misinterpreting the spec saying that PrimitiveID's enabling capabilities are Geometry, Tessellation. @billhollings, do you have an opinion? Can I rely on PrimitiveID always working?
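
The reading above can be written down as a small check. This is a sketch of that interpretation only, assuming the Geometry capability needs the geometryShader device feature and the Tessellation capability needs tessellationShader. `Features`, `capabilityUsable` and `primitiveIdUsable` are hypothetical names, and the struct is a two-field stand-in for VkPhysicalDeviceFeatures.

```cpp
// Two-field stand-in for the relevant VkPhysicalDeviceFeatures members.
struct Features {
    bool geometryShader;
    bool tessellationShader;
};

// The two SPIR-V capabilities named above as enabling PrimitiveId.
enum class Capability { Geometry, Tessellation };

// A capability is usable only if its matching device feature is enabled.
constexpr bool capabilityUsable(Capability cap, Features f) {
    return cap == Capability::Geometry ? f.geometryShader : f.tessellationShader;
}

// Either-or reading: PrimitiveId is usable if any enabling capability is.
constexpr bool primitiveIdUsable(Features f) {
    return capabilityUsable(Capability::Geometry, f) ||
           capabilityUsable(Capability::Tessellation, f);
}
```

Under this reading, a device with tessellation but no geometry shaders still passes: `primitiveIdUsable({false, true})` is true, while a validation layer that checks only for geometry would complain.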