ghostplant closed this issue 5 years ago
It works just like the clamp bit works on other instructions. Anyway, see https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/master/ockl/src/dots.cl#L56
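For readers who have not met a clamp bit before, here is a minimal sketch of what a saturating two-element signed dot product could look like in plain C. It is a reference model written for this thread, not the ROCm source: the function name `sdot2_ref` and its scalar parameter layout are hypothetical, and it assumes that saturation means clamping the exact sum to the 32-bit signed range, while the non-saturating path wraps modulo 2^32.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reference model (not the ROCm implementation):
 * returns c + a0*b0 + a1*b1 for signed 16-bit inputs. */
static int32_t sdot2_ref(int16_t a0, int16_t a1,
                         int16_t b0, int16_t b1,
                         int32_t c, bool saturate)
{
    /* Accumulate in 64 bits so the exact sum is always representable. */
    int64_t sum = (int64_t)c + (int64_t)a0 * b0 + (int64_t)a1 * b1;

    if (saturate) {
        /* Assumed clamp behavior: pin the result to the int32 range. */
        if (sum > INT32_MAX) return INT32_MAX;
        if (sum < INT32_MIN) return INT32_MIN;
        return (int32_t)sum;
    }

    /* Assumed non-clamp behavior: keep the low 32 bits (two's-complement
     * wrap). The final conversion is implementation-defined before C23,
     * but wraps on all mainstream compilers. */
    return (int32_t)(uint32_t)(uint64_t)sum;
}
```

Under these assumptions the two modes only differ when the exact sum leaves the int32 range; for in-range results they return the same value.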
Ugh. I see a bug in the software implementation though. Should be fixed soon.
@b-sumner Thanks. If I use gfx906, which supports hardware dot4 for int8, will `saturate=true` be slower than `saturate=false`, since it requires extra boundary checking?
@b-sumner Besides, since I have recently started using this device call, can you tell me what the bug is?
The software implementation was using an int where a long was required on line 62.
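To illustrate why the accumulator width matters for a saturating sum, here is a small C sketch of that class of bug. It is not the code from `dots.cl`; the helper names `clamp_i32`, `sat_sum_narrow` and `sat_sum_wide` are hypothetical, and `int64_t` stands in for OpenCL's 64-bit `long`. The narrow version simulates a 32-bit accumulator with unsigned arithmetic, because signed overflow is undefined behavior in C.

```c
#include <stdint.h>

/* Clamp a 64-bit value to the int32 range. */
static int32_t clamp_i32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Buggy pattern: the sum is formed in 32 bits, so it has already wrapped
 * by the time it is clamped, and the clamp can never trigger. */
static int32_t sat_sum_narrow(int32_t c, int32_t p0, int32_t p1)
{
    uint32_t wrapped = (uint32_t)c + (uint32_t)p0 + (uint32_t)p1;
    int32_t acc = (int32_t)wrapped; /* two's-complement wrap on mainstream compilers */
    return clamp_i32(acc);
}

/* Fixed pattern: the sum is formed in 64 bits, so the clamp sees the
 * exact value and can detect that it is out of range. */
static int32_t sat_sum_wide(int32_t c, int32_t p0, int32_t p1)
{
    int64_t acc = (int64_t)c + p0 + p1;
    return clamp_i32(acc);
}
```

With `c = INT32_MAX` and `p0 = 1`, the narrow version returns `INT32_MIN` while the wide version returns `INT32_MAX`, which is the saturated result one would expect.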
@b-sumner Thanks! That doesn't affect me, as I am only using gfx906 rather than older architectures.
`__ockl_sdot2` is used by `amd_mixed_dot`, and I want to see how the `saturate` argument influences the result. After tracing through the source code of hcc and HIP, I couldn't find an implementation of `__ockl_sdot2` anywhere, so I don't know how `saturate` works. Can anyone provide more information about the `saturate` argument of `__ockl_sdot2`?