KhronosGroup / Vulkan-Ecosystem

Public repository for Vulkan Ecosystem issues
Apache License 2.0

8/16 component vector type in GLSL #50

Closed nihui closed 3 years ago

nihui commented 4 years ago

Hello all

The SPIR-V spec allows vector sizes 8 and 16, and Mesa NIR supports vec8 and vec16. However, there is no way to declare such vector types in GLSL.

The rationale is that the ARM Mali Midgard GPU can perform eight packed fp16 operations with one instruction, and I need a proper way to take advantage of that from high-level GLSL. Although it should be possible to generate vec8/vec16 SPIR-V binaries directly, they are difficult to maintain and not human-readable.

For now, I declare a custom structure sfpvec8 like this:

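#extension GL_EXT_shader_16bit_storage: require
// 16bit_storage only covers loads, stores and conversions; the f16vec4 additions below also need float16 arithmetic
#extension GL_EXT_shader_explicit_arithmetic_types_float16: require
struct sfpvec8 { f16vec4 abcd; f16vec4 efgh; };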

and then use it like

layout (binding = 0) buffer bottom_top_blob { sfpvec8 bottom_top_blob_data[]; };

int gi = 0;
sfpvec8 v = bottom_top_blob_data[gi];

v.abcd = v.abcd + f16vec4(123.f);
v.efgh = v.efgh + f16vec4(123.f);

In this sample, I have to write two lines of code for the addition, and I doubt whether these two lines will eventually be compiled into one instruction on supported platforms. In addition, an 8-component vector of float16_t is 128 bits wide, so I need a way to tell the GPU to read or write 128 bits at once instead of two 64-bit halves. For an optimizing compiler, vectorization is hard while de-vectorization is simple, so a native vec8 would help both the programmer and the driver.
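As an illustration of the workaround described above, the two half-vector additions can at least be hidden behind a helper so that each call site is a single line. This is only a sketch: the helper name add8, the workgroup size, and the use of GL_EXT_shader_explicit_arithmetic_types_float16 for the fp16 arithmetic are assumptions and not part of the original post. It does not change what the driver generates, so whether the two additions fuse into one instruction still depends on the compiler.

```glsl
#version 450
#extension GL_EXT_shader_16bit_storage: require
// Assumption: fp16 arithmetic (not just storage) is available on the device.
#extension GL_EXT_shader_explicit_arithmetic_types_float16: require

layout (local_size_x = 64) in; // workgroup size chosen arbitrarily

struct sfpvec8 { f16vec4 abcd; f16vec4 efgh; };

layout (binding = 0) buffer bottom_top_blob { sfpvec8 bottom_top_blob_data[]; };

// Hypothetical helper: adds the scalar s to all eight components.
sfpvec8 add8(sfpvec8 v, float16_t s)
{
    v.abcd += f16vec4(s);
    v.efgh += f16vec4(s);
    return v;
}

void main()
{
    uint gi = gl_GlobalInvocationID.x;
    // Skip invocations past the end of the buffer.
    if (gi >= uint(bottom_top_blob_data.length())) return;

    bottom_top_blob_data[gi] = add8(bottom_top_blob_data[gi], float16_t(123.0));
}
```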

Many compute shaders in the ncnn project are currently written this way, for example https://github.com/Tencent/ncnn/blob/master/src/layer/vulkan/shader/absval_pack8.comp and all shaders with the _pack8 suffix in https://github.com/Tencent/ncnn/tree/master/src/layer/vulkan/shader

I suppose it would not be hard to add these new types to glslang, since SPIR-V already has 8- and 16-component vector types. The underlying GPU Vulkan drivers should already be able to handle them properly.

Thanks

johnkslang commented 4 years ago

Sometimes tool chains scalarize a whole shader and give one component-slot in each vector to one invocation of the shader, such that wider vectors mean more invocations running at the same time.

This is far simpler than trying to vectorize scalar (or subvector) operations within a single shader. It differs from typical CPU vectorization because a GPU workload assumes many invocations of the same shader that need to run.

This of course depends on the type of hardware, mix of registers, and number of invocations that need to run.
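To make the scalarization point concrete, here is a minimal sketch of the same addition written with one invocation per scalar element. A toolchain that scalarizes can then place many such invocations side by side in the hardware's vector lanes. The workgroup size and the use of 32-bit float (chosen so that no extension is needed) are assumptions for illustration; how invocations are actually packed is up to the driver and hardware.

```glsl
#version 450

layout (local_size_x = 64) in; // workgroup size chosen arbitrarily

// Same buffer name as in the original post, but viewed as plain scalars.
layout (binding = 0) buffer bottom_top_blob { float bottom_top_blob_data[]; };

void main()
{
    // One invocation handles exactly one element.
    uint gi = gl_GlobalInvocationID.x;
    if (gi >= uint(bottom_top_blob_data.length())) return;

    bottom_top_blob_data[gi] += 123.0;
}
```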

There will be a lot of language and tool chain assumptions that GLSL vectors have at most 4 components. But, if a new type is worthwhile, that can all be fixed, especially if the bigger vectors don't have .xyz-like swizzles, just array indexing. However, the question is whether that is worthwhile, or whether the drivers have already figured out how to utilize the hardware in the best way by scalarizing.

What would be interesting to see is an experiment you could do: if you declared arrays of 8 elements and used them like vectors, would you see a 4x performance loss compared with your vec4 code? That might indicate what would happen going from 4- to 8-component vectors.
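A minimal sketch of what that experiment might look like: each invocation loads eight consecutive elements into a local array, treats the array as a vector, and writes it back. The workgroup size, the flat float buffer layout, and the use of 32-bit float instead of float16_t are assumptions made to keep the example free of extensions. Timing this against an equivalent vec4 version on the target GPU would be the actual measurement.

```glsl
#version 450

layout (local_size_x = 64) in; // workgroup size chosen arbitrarily

layout (binding = 0) buffer bottom_top_blob { float bottom_top_blob_data[]; };

void main()
{
    uint gi = gl_GlobalInvocationID.x;
    uint base = gi * 8u; // each invocation owns eight consecutive elements
    if (base + 8u > uint(bottom_top_blob_data.length())) return;

    // Array of 8 elements used like a vector.
    float v[8];
    for (int i = 0; i < 8; i++) v[i] = bottom_top_blob_data[base + uint(i)];

    // Component-wise addition, written as a loop over the array.
    for (int i = 0; i < 8; i++) v[i] = v[i] + 123.0;

    for (int i = 0; i < 8; i++) bottom_top_blob_data[base + uint(i)] = v[i];
}
```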

marty-johnson59 commented 3 years ago

This repository is being archived because it has been replaced by the vulkan.org website and is no longer being maintained (i.e., issues posted here are no longer being addressed). After reviewing the issues posted here, most (if not all) have been resolved or have already been re-opened in Vulkan-Docs (https://github.com/KhronosGroup/Vulkan-Docs) or other repositories for further consideration. Therefore, all issues in this repository will be closed. If you believe your issue has not yet been resolved, please re-open it in Vulkan-Docs. Thanks!