TheRealMJP / TheRealMJP.github.io

Backing repo for my blog

Half The Precision, Twice The Fun: Working With FP16 In HLSL #3

Open utterances-bot opened 4 years ago

utterances-bot commented 4 years ago

https://therealmjp.github.io/posts/shader-fp16/

allanmac commented 4 years ago

Re: footnote [1]: https://github.com/GPUOpen-Drivers/AMDVLK/issues/94

TheRealMJP commented 4 years ago

Thank you Allan! That's roughly what I suspected was going on, but it's good to see it confirmed.

rejunity commented 4 years ago

What about use of fp16 in shared memory (LDS/TGSM) with DX? Any ideas?

TheRealMJP commented 4 years ago

That's a good point, I hadn't thought about shared memory! I just did a quick experiment in shader playground and it all seems to work fine when compiled to both DXIL and SPIR-V. I think I would want to look at the final compiled ISA though to see what instructions are being generated.

rejunity commented 4 years ago

Indeed seems to work nicely. Thank you!

alecazam commented 3 years ago

Do you happen to know the equivalent for glslc? When I use "#define half float16_t", f16vec2, and so on, every function call fails with overload conflicts, as if the built-ins still only accept full float. I'm using 2020.3, and I have "#extension GL_AMD_gpu_shader_half_float: enable" in the shaders.

allanmac commented 3 years ago

In Vulkan 1.2 it's straightforward because the extension VK_KHR_shader_float16_int8 is now promoted to core.

If your device supports VkPhysicalDeviceVulkan12Features.shaderFloat16 then you would create a VkDevice with this feature enabled.

In the shader you would:
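A minimal sketch of what that can look like, assuming desktop GLSL (`#version 450`) and the `GL_EXT_shader_explicit_arithmetic_types_float16` extension, which is the float16 part of the `GL_EXT_shader_explicit_arithmetic_types` family mentioned later in this thread. The buffer layout and the `saturate16` helper are made up for illustration, and the buffer stays 32-bit so no 16-bit storage features are needed:

```glsl
#version 450
// Assumption: this extension is what exposes float16_t / f16vecN and the
// matching built-in overloads in glslang-based compilers such as glslc.
#extension GL_EXT_shader_explicit_arithmetic_types_float16 : require

layout(local_size_x = 64) in;

// Hypothetical buffer: kept as 32-bit floats, only the ALU work is fp16.
layout(set = 0, binding = 0) buffer Data {
    float values[];
};

// Hypothetical helper: clamp to [0, 1] using the float16_t overload of clamp.
float16_t saturate16(float16_t x) {
    return clamp(x, float16_t(0.0), float16_t(1.0));
}

void main() {
    uint i = gl_GlobalInvocationID.x;
    float16_t h = float16_t(values[i]);   // explicit fp32 -> fp16 conversion
    values[i] = float(saturate16(h));     // explicit fp16 -> fp32 conversion
}
```

Compiling this with something like `glslc --target-env=vulkan1.2 -fshader-stage=compute` should show `OpTypeFloat 16` in the SPIR-V disassembly if the half types are being honored.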

If you're on a Vulkan 1.1 device then it's a little more of a hassle to enable the extension and features.

Also, you probably want to stop using the GL_AMD_gpu_shader_half_float extension since (I think) it is completely superseded by VK_KHR_shader_float16_int8.

Finally, turn on Vulkan Validation. It's very good at flagging anything that you're doing wrong.

alecazam commented 3 years ago

Not working for me. I've tried all the extensions. I don't know how to make this much simpler for glslc to honor. I compile with --target-env=vulkan1.2.

#if VULKAN >= 120
#extension VK_KHR_shader_float16_int8 : enable
#else
#extension GL_AMD_gpu_shader_half_float: enable
#extension GL_AMD_gpu_shader_half_float_fetch: enable
#endif

// Types defined to closer match Metal and force 16bit ALU
#define half float16_t

half saturate(half x) { return clamp(x, half(0.0), half(1.0)); }

error: 'clamp' : no matching overloaded function found
error: 'return' : type does not match, or is not convertible to, the function's return type

alecazam commented 3 years ago

I should add, all of these have "#version 310 es", and I don't see the 120 path taken.

alecazam commented 2 years ago

I finally have float16_t codegen in SPIR-V. This required setting --target-env=vulkan1.1, and using #extension GL_EXT_shader_explicit_arithmetic_types : require as you'd mentioned. This runs fine on Mali, and I can see the half ops in the SPIR-V assembly. We can also now transpile to Metal and get half usage there.

But on Adreno parts the same shaders generate the following. There are no validation warnings; the compile just fails, and our Vulkan startup fails. If we pass the same shaders without the half usage compiled for target vulkan1.0, then Adreno is fine. But this Adreno part publishes the float16_int8 extension with "true" for the shaderFloat16 support. These extensions are all discovered and enabled for a 1.1 instance, so the init is all correct.

I AdrenoVK-0: Shader compilation failed for shaderType: 1
I AdrenoVK-0: Info log: Assertion failed: false && "Unknown floating point rounding mode"

I think this may be lack of RTZ (round-to-zero) support in the float_controls, but there's not much we can do at the glslc/spirv-opt level to avoid this I think. Mali doesn't even expose that extension.

alecazam commented 2 years ago

Mali/PowerVR/iOS - all have true 16bit_storage. Easy to use float as in/out, uniforms, and ALU ops with minimal conversions. Easy.

ARM - missing pushConstants16. Easy.

Adreno - missing RTZ from float_controls so fp16 shaders crash, and only has 1 of 4 16bit_storage settings (shaderStorage16). Hard.

Nvidia - missing inputOutput16. Hard. Cast in/out of all shaders fp16 from/to fp32.
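As a rough sketch of the "cast in/out" workaround for parts without 16-bit input/output support: keep the stage interface at 32 bits and convert to fp16 only inside the shader. This assumes desktop GLSL (`#version 450`) and `GL_EXT_shader_explicit_arithmetic_types_float16`; the variable names and the math are placeholders, not taken from a real shader.

```glsl
#version 450
// Assumption: float16_t / f16vec4 come from this extension.
#extension GL_EXT_shader_explicit_arithmetic_types_float16 : require

// Interface variables stay fp32, so no 16-bit input/output storage is required.
layout(location = 0) in vec4 inColor;
layout(location = 0) out vec4 outColor;

void main() {
    // Cast in: fp32 -> fp16 once at the top of the shader.
    f16vec4 c = f16vec4(inColor);

    // Placeholder ALU work done entirely in fp16.
    c = clamp(c * float16_t(0.5), float16_t(0.0), float16_t(1.0));

    // Cast out: fp16 -> fp32 once at the bottom.
    outColor = vec4(c);
}
```

The cost is a pair of conversions per interface variable, which is usually small next to the ALU work that stays in fp16.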

alecazam commented 1 year ago

Note that glslc doesn't seem to honor half for codegen like DXC does with -enable-16bit-types.

https://github.com/google/shaderc/issues/1309