nihui closed this issue 3 years ago
Sometimes tool chains scalarize a whole shader and give one component slot of each hardware vector to one invocation of the shader, so that wider vectors mean more invocations running at the same time.
This is far simpler than trying to vectorize scalar (or subvector) operations within a single shader. It differs from normal CPU workloads and vectorization because it assumes there are many invocations of the same shader that need to run.
This of course depends on the type of hardware, mix of registers, and number of invocations that need to run.
There will be a lot of language and tool chain assumptions that GLSL vectors have at most 4 components. But if a new type is worthwhile, that can all be fixed, especially if the bigger vectors don't have .xyz-like swizzles, only array indexing. The question, however, is whether that is worthwhile, or whether the drivers have already figured out how to make the best use of the hardware by scalarizing.
An interesting experiment you could do: declare arrays of 8 elements, use them like vectors, and see whether you get a 4x performance loss relative to your vec4 code. That might indicate what would happen going from 4- to 8-component vectors.
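A minimal sketch of what such an experiment could look like, assuming a simple element-wise add in a compute shader. The buffer names, bindings, and workgroup size are illustrative assumptions, not taken from any existing shader; the vec4 baseline would be the same shader with vec4 loads and a single add.

```glsl
#version 450

layout(local_size_x = 64) in;

// Hypothetical buffers: plain float arrays, 8 consecutive floats per invocation.
layout(std430, binding = 0) readonly buffer a_blob { float a_data[]; };
layout(std430, binding = 1) readonly buffer b_blob { float b_data[]; };
layout(std430, binding = 2) writeonly buffer c_blob { float c_data[]; };

void main()
{
    uint base = gl_GlobalInvocationID.x * 8u;
    if (base + 8u > uint(a_data.length())) return;

    // 8-element arrays used as if they were 8-component vectors.
    float a[8];
    float b[8];
    float c[8];

    for (uint k = 0u; k < 8u; k++)
    {
        a[k] = a_data[base + k];
        b[k] = b_data[base + k];
    }

    // Element-wise add; no swizzles, only array indexing.
    for (uint k = 0u; k < 8u; k++)
    {
        c[k] = a[k] + b[k];
    }

    for (uint k = 0u; k < 8u; k++)
    {
        c_data[base + k] = c[k];
    }
}
```

Timing this against the vec4 version on the same data would show how the driver treats the wider, array-based form.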
This repository is being archived as it has been replaced with the vulkan.org website and is no longer being maintained (i.e., issues posted here are no longer being addressed). After reviewing issues posted here, most (if not all) have been resolved or have already been re-opened in Vulkan-Docs (https://github.com/KhronosGroup/Vulkan-Docs) or other repositories for further consideration. Therefore, all issues in this repository will be closed. If you believe your issue has not yet been resolved, please re-open in Vulkan-Docs. Thanks!
Hello all
The SPIR-V spec allows vector sizes 8 and 16, and Mesa NIR supports vec8 and vec16. However, there is no way to declare such vector types in GLSL.
The rationale is that ARM Mali Midgard GPUs can actually do 8-wide packed fp16 math in one instruction, and I need a proper way to take advantage of that in high-level GLSL. Although it should be possible to generate vec8/vec16 SPIR-V binaries directly, that is difficult to maintain and not human-readable.
For now, I declare a custom structure sfpvec8 like
and then use it like
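A minimal sketch of this kind of workaround, assuming the structure simply wraps two native 4-wide fp16 halves. The member names abcd and efgh, the buffer names, and the bindings are illustrative assumptions; the fp16 types rely on the GL_EXT_shader_explicit_arithmetic_types_float16 and GL_EXT_shader_16bit_storage extensions.

```glsl
#version 450
#extension GL_EXT_shader_explicit_arithmetic_types_float16 : require
#extension GL_EXT_shader_16bit_storage : require

layout(local_size_x = 64) in;

// Hypothetical 8-wide fp16 "vector" built from two native 4-wide halves.
struct sfpvec8
{
    f16vec4 abcd;
    f16vec4 efgh;
};

layout(std430, binding = 0) readonly buffer a_blob { sfpvec8 a_data[]; };
layout(std430, binding = 1) readonly buffer b_blob { sfpvec8 b_data[]; };
layout(std430, binding = 2) writeonly buffer c_blob { sfpvec8 c_data[]; };

void main()
{
    uint i = gl_GlobalInvocationID.x;
    if (i >= uint(a_data.length())) return;

    sfpvec8 a = a_data[i];
    sfpvec8 b = b_data[i];

    // One logical 8-wide addition has to be spelled as two 4-wide additions.
    sfpvec8 c;
    c.abcd = a.abcd + b.abcd;
    c.efgh = a.efgh + b.efgh;

    c_data[i] = c;
}
```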
In this sample, I have to write two lines of code for the addition, and I doubt that these two lines will eventually be compiled into one instruction on supported platforms. Besides, an 8-component vector of float16_t is 128 bits, so I need a way to tell the GPU to read/write 128 bits at once instead of two 64-bit halves. This is another concern. For an optimizing compiler, vectorization is hard while de-vectorization is simple. A native vec8 helps both the programmer and the driver.
Lots of compute shaders in the ncnn project are written this way at the moment, for example https://github.com/Tencent/ncnn/blob/master/src/layer/vulkan/shader/absval_pack8.comp and all shaders with the _pack8 suffix here: https://github.com/Tencent/ncnn/tree/master/src/layer/vulkan/shader
I suppose it is not hard to add these new types to glslang, since SPIR-V already has 8- and 16-component vector types. The underlying GPU Vulkan drivers should already be able to handle them properly.
Thanks