Open MineCake147E opened 2 years ago
Tagging subscribers to this area: @dotnet/area-system-numerics See info in area-owners.md if you want to be subscribed.
Author: | MineCake147E |
---|---|
Assignees: | - |
Labels: | `api-suggestion`, `area-System.Numerics`, `untriaged` |
Milestone: | - |
Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics See info in area-owners.md if you want to be subscribed.
Author: | MineCake147E |
---|---|
Assignees: | - |
Labels: | `api-suggestion`, `area-System.Numerics`, `area-System.Runtime.Intrinsics`, `untriaged` |
Milestone: | - |
I have edited my comments, including adding Intrinsics for ARM.
Worth noting that with AVX512-FP16, all operations are now accelerated on XArch.
I have an upcoming need for `VectorXXX<Half>`, specifically for conversions between Float32 and Float16.
I'm adding color management to my image editing app (Paint.NET). Storing the canvas tiles at a higher precision (RGBA Float16 instead of BGRA `byte` aka BGRA32) will enable me to maintain higher color accuracy throughout my rendering pipeline. These tiles are often mipmaps for when the user has zoomed out (e.g. 50% or lower), which involves resizing. I do all of the mipmap generation on the CPU and then copy the tile bitmaps to the GPU (for Direct2D). (If I did mipmap generation on the GPU, it would consume an extraordinary amount of GPU memory and flood the PCI-E bus. It doesn't really work.)
The resizing and color transform steps are done on the CPU at Float32 precision with very high performance thanks to @saucecontrol's PhotoSauce.MagicScaler library. I then convert from RGBA Float32 back to BGRA32 on the CPU. The result is later presented to the screen via Direct2D, converting to sRGB/scRGB with the Color Management effect, which operates at high precision (up to Float32) and renders into a Float16 swapchain.
So the conversion goes from BGRA32 (CPU bitmap) --> RGBA Float32 (CPU intermediate buffers) -> BGRA32 (D2D bitmap) -> RGBA Float32 (effect intermediate texture(s)) -> RGBA Float16 (render target / swapchain).
I'd like to be able to do BGRA32 (CPU bitmap) --> RGBA Float32 (CPU intermediate buffers) -> RGBA Float16 (D2D bitmap) -> RGBA Float32 (effect intermediate texture(s)) -> RGBA Float16 (render target / swapchain).
I can use RGBA Float32 textures on the GPU today, but Float16 uses 1/2 the memory and PCI-E bandwidth, and won't lose any useful precision at this point in the rendering pipeline.
For now I can P/Invoke a method in a native C/C++ DLL, but having native support in C# would be great.
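For reference, here is a minimal sketch of the scalar conversion that can be written in C# today, using only the explicit `float`/`Half` operators that already ship with `System.Half`. The class and method names (`HalfConversion`, `NarrowToHalf`, `WidenToSingle`) are hypothetical and exist only for illustration; this is not the API proposed in this issue, just the kind of element-by-element loop a vectorized path would replace.

```csharp
using System;

// Hypothetical helper, for illustration only: scalar Float32 <-> Float16 conversion
// built on the explicit operators defined by System.Half.
public static class HalfConversion
{
    // Narrows each float to Half, one element at a time.
    public static void NarrowToHalf(ReadOnlySpan<float> source, Span<Half> destination)
    {
        if (destination.Length < source.Length)
            throw new ArgumentException("Destination is too short.", nameof(destination));

        for (int i = 0; i < source.Length; i++)
        {
            destination[i] = (Half)source[i];
        }
    }

    // Widens each Half back to float, one element at a time.
    public static void WidenToSingle(ReadOnlySpan<Half> source, Span<float> destination)
    {
        if (destination.Length < source.Length)
            throw new ArgumentException("Destination is too short.", nameof(destination));

        for (int i = 0; i < source.Length; i++)
        {
            destination[i] = (float)source[i];
        }
    }
}
```

A tile buffer would be converted with something like `HalfConversion.NarrowToHalf(rgbaFloat32, rgbaFloat16)`, where both buffers are assumed to hold the same number of channel values.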
Added proposal of AVX-512 FP16 APIs.
Background and motivation
It's been a long time since .NET first added the `Half` type. But there's no support for hardware acceleration of conversion, which might be a common use case.
In Ivy Bridge and newer x86 processors, F16C is provided as a way to convert between `float` and `Half`. ARMv8-A also has a way to convert between them, like F16C.
In Sapphire Rapids and newer x86 processors, AVX-512 FP16 is provided as a way to perform arithmetic operations on `Half` values.
So I think it would be great if .NET had support for hardware acceleration of conversion between them.
API Proposal
EDIT: The `Vector64<Half>` design is removed as it's avoided in x86.
EDIT: Added some arithmetic APIs including `MultiplyAddEstimate` discussed in #98053.
Addition to `Vector*`
`Vector(64|128|256|512)?<T>` shouldn't throw any exceptions if `T` is `Half`.
F16C
AVX-512 FP16
Addition to AdvSimd.Arm64
Scalar variants are not included in favor of `Half`'s explicit operator optimizations.
API Sample Usage
F16C
AdvSimd.Arm64
Alternative Designs
No response
Risks
No response