KhronosGroup / GLSL

GLSL Shading Language Issue Tracker

Publish GLSL_EXT_integer_dot_product #195

Open · kpet opened 1 year ago

kpet commented 1 year ago

Signed-off-by: Kevin Petit kevin.petit@arm.com Change-Id: Ifc55b44217819cf5c6998eeb4aedcc8d7597e65d

nihui commented 1 year ago

I wonder if there is currently a good and handy type conversion function for the packed 4x8 types in integer_dot_product, like packHalf2x16()/unpackHalf2x16() for packed fp16. Shouldn't there be packInt4x8EXT(), packUInt4x8EXT(), unpackInt4x8EXT(), and unpackUInt4x8EXT() for packed 8-bit integers?

Existing conversions such as packSnorm4x8() lose precision because of the normalization, so they are not suitable for computational tasks.
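For illustration, a minimal sketch of the detour being objected to, assuming the values are small signed integers: the snorm built-ins force a rescale into [-1, 1] on the way in and a re-quantization on the way out, and the 8-bit snorm encoding clamps to [-127, 127], so -128 cannot survive the round trip.

```glsl
// Sketch only: round-tripping integer data through the snorm built-ins.
// packSnorm4x8() expects floats in [-1, 1], so integers must be rescaled
// in and re-quantized out; values outside [-127, 127] are clamped.
ivec4 snormRoundTrip(ivec4 v)
{
    uint packed = packSnorm4x8(vec4(v) / 127.0); // clamp, scale, round to 8-bit snorm
    return ivec4(round(unpackSnorm4x8(packed) * 127.0));
}
```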

A common scenario: some GPU drivers do not support the 8-bit/16-bit storage features, so shaders cannot read and write 8-bit integers directly. But we can pack four 8-bit values into one 32-bit word and unpack them in the shader for computation, which can significantly reduce GPU memory bandwidth overhead.

I admit that we could emulate this pack/unpack process with bit shift/or/and operations, but doing so is complex and limited, and it can be inefficient.
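For concreteness, a minimal sketch of that emulation; the helper names here are made up for illustration and are not part of any extension:

```glsl
// Hypothetical helpers emulating 4x8-bit pack/unpack with shifts and masks.
// Bytes are stored little-endian within the uint (byte 0 in bits 0..7).
uint packUint4x8(uvec4 v)
{
    v &= 0xFFu; // keep only the low 8 bits of each component
    return v.x | (v.y << 8) | (v.z << 16) | (v.w << 24);
}

uvec4 unpackUint4x8(uint p)
{
    return (uvec4(p) >> uvec4(0u, 8u, 16u, 24u)) & 0xFFu;
}

// Signed unpack: shift each byte up to bit 31 first so the arithmetic
// right shift sign-extends it. Signed pack is the same masking as above.
ivec4 unpackInt4x8(int p)
{
    return (ivec4(p) << ivec4(24, 16, 8, 0)) >> 24;
}
```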

gnl21 commented 1 year ago

> I wonder if there is currently a good and handy type conversion function for the packed 4x8 types in integer_dot_product

Currently no, I don't think there is. One could be added but ...

> I admit that we could emulate this pack/unpack process with bit shift/or/and operations, but doing so is complex and limited, and it can be inefficient.

I'm not sure why you say that they are limited (perhaps you have something different in mind), but compilers should already recognize these as pack/unpack operations and emit good instruction sequences for them. I don't think we should expect any efficiency gain from adding functions for this.
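For what it's worth, the unpack can also be written with bitfieldExtract() (core GLSL since 4.00, GLSL ES since 3.1), which states the intent in a single call, handles sign extension for the signed case, and maps to a single instruction on hardware that has one. A sketch:

```glsl
// Extract byte i (0..3) of a packed word. On a signed type,
// bitfieldExtract() sign-extends the extracted bits automatically.
int  unpackByteSigned(int p, int i)    { return bitfieldExtract(p, 8 * i, 8); }
uint unpackByteUnsigned(uint p, int i) { return bitfieldExtract(p, 8 * i, 8); }
```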

gnl21 commented 1 year ago

Thanks for the fixes. What's the state of implementations on this? Is it ready to be merged now?

kpet commented 1 year ago

> Thanks for the fixes. What's the state of implementations on this? Is it ready to be merged now?

Thanks for the review! As far as I'm aware there are two prototypes from two different people, neither of which is complete, and neither author is able to continue work on their implementation. I expect this to be discussed in today's Vulkan ML teleconference, and hopefully someone will be able to pick up the work. We could wait for a complete implementation before merging, in case implementation work turns up spec issues, but this spec has been reviewed a good number of times by now, so publishing may be helpful. I don't feel too strongly either way.