ROCm / hcc

HCC is an Open Source, Optimizing C++ Compiler for Heterogeneous Compute currently for the ROCm GPU Computing Platform
https://github.com/RadeonOpenCompute/hcc/wiki

Where is the definition of `__ockl_sdot2` #1188

Closed ghostplant closed 5 years ago

ghostplant commented 5 years ago

`__ockl_sdot2` is used by `amd_mixed_dot`, and I want to see how the `saturate` argument influences the result. I traced through the hcc and HIP source code but could not find an implementation of `__ockl_sdot2` anywhere, so I can't tell how saturation works. Can anyone provide more information about the `saturate` argument of `__ockl_sdot2`?

b-sumner commented 5 years ago

It works just like the clamp bit works on other instructions. Anyway, see https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/master/ockl/src/dots.cl#L56
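For readers who want to see the effect of the flag without a GPU, here is a minimal host-side sketch of a clamped two-element signed dot product. It is not the ROCm-Device-Libs source: the function name `sdot2_model`, the scalar parameter layout, and the assumption that the unclamped result wraps modulo 2^32 are all illustrative assumptions.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical host-side model of a signed 2-element dot product with an
// optional clamp. The name and parameter layout are assumptions made for
// illustration; this is not copied from ROCm-Device-Libs.
inline std::int32_t sdot2_model(std::int16_t a0, std::int16_t a1,
                                std::int16_t b0, std::int16_t b1,
                                std::int32_t c, bool saturate) {
    // Accumulate in 64 bits so the exact sum is available for the range check.
    const std::int64_t wide = static_cast<std::int64_t>(a0) * b0 +
                              static_cast<std::int64_t>(a1) * b1 +
                              static_cast<std::int64_t>(c);
    if (saturate) {
        // Clamp to the representable int32 range instead of wrapping.
        if (wide > std::numeric_limits<std::int32_t>::max())
            return std::numeric_limits<std::int32_t>::max();
        if (wide < std::numeric_limits<std::int32_t>::min())
            return std::numeric_limits<std::int32_t>::min();
    }
    // Assumed unclamped behavior: keep the low 32 bits (two's-complement wrap).
    // The unsigned-to-signed conversion is modular since C++20 and wraps on
    // mainstream compilers before that.
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(wide));
}
```

With inputs whose exact sum fits in 32 bits, both settings return the same value. The two only differ once the exact sum leaves the int32 range: the clamped call pins the result at the nearest limit, while the unclamped call wraps.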

b-sumner commented 5 years ago

Ugh. I see a bug in the software implementation though. Should be fixed soon.

ghostplant commented 5 years ago

@b-sumner Thanks. If I use gfx906, which supports hardware dot4 for int8, will saturate=true be slower than saturate=false, since it requires extra bounds checking?

ghostplant commented 5 years ago

@b-sumner Also, since I have recently started using this device call, can you tell me what the bug is?

b-sumner commented 5 years ago

The software implementation was using an int where a long was required on line 62.
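To illustrate why the accumulator width matters for saturation, the sketch below contrasts a 32-bit and a 64-bit accumulator. It shows the general class of bug only and does not reproduce the actual line 62; the names `sat_sum_narrow` and `sat_sum_wide` are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical illustration only; these names do not come from
// ROCm-Device-Libs and the code does not reproduce the original source.

// Narrow accumulator: the sum is held in 32 bits, so it has already wrapped
// before any range check could run. Every int32_t value is in range, which
// means a clamp applied afterwards can never fire.
inline std::int32_t sat_sum_narrow(std::int32_t p0, std::int32_t p1,
                                   std::int32_t c) {
    // Unsigned arithmetic models 32-bit wraparound without signed-overflow UB.
    const std::uint32_t u = static_cast<std::uint32_t>(p0) +
                            static_cast<std::uint32_t>(p1) +
                            static_cast<std::uint32_t>(c);
    // Modular conversion since C++20; wraps on mainstream compilers before that.
    return static_cast<std::int32_t>(u);
}

// Wide accumulator: the exact sum survives in 64 bits, so the clamp works.
inline std::int32_t sat_sum_wide(std::int32_t p0, std::int32_t p1,
                                 std::int32_t c) {
    const std::int64_t r = static_cast<std::int64_t>(p0) + p1 + c;
    if (r > std::numeric_limits<std::int32_t>::max())
        return std::numeric_limits<std::int32_t>::max();
    if (r < std::numeric_limits<std::int32_t>::min())
        return std::numeric_limits<std::int32_t>::min();
    return static_cast<std::int32_t>(r);
}
```

For two products of 32767 * 32767 plus an addend of 200000, the narrow version yields a negative wrapped value, while the wide version clamps to the int32 maximum.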

ghostplant commented 5 years ago

@b-sumner Thanks! It doesn't matter as I am only using gfx906 instead of older arches.