Open sycz00 opened 1 month ago
These do not exist because there is no native tensor core instruction for int16 inputs. You can take multiple paths here:
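One path sometimes used in this situation (sketched here as an illustration, not an official recommendation; the function name is hypothetical) is to emulate the int16 GEMM on top of 8-bit multiplies: split each int16 operand into a high signed byte and a low unsigned byte, run four partial matrix products, and recombine them with the appropriate scaling. A minimal numpy sketch of the arithmetic:

```python
import numpy as np

def int16_gemm_via_bytes(A, B):
    """Emulate an int16 x int16 -> wide-accumulator GEMM with byte-sized
    sub-operands. Accumulation is done in int64 here to avoid overflow in
    this reference sketch; on real hardware each partial product would map
    to an 8-bit (s8/u8) tensor core MMA with int32 accumulation."""
    A = A.astype(np.int64)
    B = B.astype(np.int64)
    # Decompose x == 256 * x_hi + x_lo, with x_lo in [0, 255] and x_hi signed.
    A_lo = A & 0xFF
    A_hi = (A - A_lo) // 256
    B_lo = B & 0xFF
    B_hi = (B - B_lo) // 256
    # (256*Ah + Al)(256*Bh + Bl) = 65536*Ah*Bh + 256*(Ah*Bl + Al*Bh) + Al*Bl
    return (A_hi @ B_hi) * 65536 + (A_hi @ B_lo + A_lo @ B_hi) * 256 + (A_lo @ B_lo)

# The decomposition reproduces the exact wide-integer result:
A = np.array([[1, -300], [32767, -32768]], dtype=np.int16)
B = np.array([[-5, 7], [11, -13]], dtype=np.int16)
assert np.array_equal(int16_gemm_via_bytes(A, B),
                      A.astype(np.int64) @ B.astype(np.int64))
```

The trade-off is that one int16 GEMM becomes four narrower GEMMs (plus recombination), so it is only a win if the 8-bit tensor core path is sufficiently faster than the integer-ALU alternative.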
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
What is your question? Hey folks,
first of all, thanks for this framework. I recently implemented CUDA kernels for a linear layer and a 2D convolution layer using int8 arithmetic, accumulating into int32 tensors. However, implementing a similar kernel with int16 inputs and weights, accumulating into int32/int64, does not work: I found that there is no available kernel configuration for such an operation. Do you think this could be added somehow, or do you know a trick or workaround to accomplish this?
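For context, here is a numpy sketch of the pattern the working int8 kernels compute (the function name is mine, just to illustrate int8 inputs with int32 accumulation):

```python
import numpy as np

def int8_linear(x, w):
    """Reference for the working path: int8 inputs/weights with products
    accumulated into an int32 output tensor, as an s8*s8+s32 tensor core
    MMA would do. Casting up before the matmul makes numpy accumulate
    in int32 instead of overflowing in int8."""
    return x.astype(np.int32) @ w.astype(np.int32)

x = np.array([[127, -128], [3, 5]], dtype=np.int8)
w = np.array([[2, -1], [4, 6]], dtype=np.int8)
y = int8_linear(x, w)
assert y.dtype == np.int32
```

The int16 analogue of this (int16 inputs, int32/int64 accumulator) is exactly the configuration that appears to be missing.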
Thanks for your help!