eyalroz / cuda-kat

CUDA kernel author's tools
BSD 3-Clause "New" or "Revised" License

Distinguish between PTX builtins and SASS builtins #34

Open eyalroz opened 4 years ago

eyalroz commented 4 years ago

At the moment, our effective definition of a "builtin" function is one that compiles to a single PTX instruction (when inlined), and even this definition is not applied consistently in our code.

However, a PTX instruction is in no way guaranteed to become a single SASS instruction. The example which motivated our inconsistency is CLZ vs. CTZ: there is a CLZ instruction in PTX, but no NVIDIA micro-architecture has it as a single instruction. It is just implemented as a sequence of SASS instructions.
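To make the PTX/SASS distinction concrete, here is a minimal sketch, not cuda-kat's actual code. Every name in it (`SKETCH_HD`, `sketch::clz32`, `sketch::brev32`, `sketch::ctz32`) is hypothetical. It wraps the PTX `clz.b32` and `brev.b32` instructions with inline assembly in device code, and builds a CTZ out of them as one possible composition (an assumption for illustration). The host fallbacks are there only so the file also compiles as plain C++.

```cpp
// Sketch only: all names below are hypothetical and not part of cuda-kat.
#include <cstdint>

#if defined(__CUDACC__)
#define SKETCH_HD __host__ __device__
#else
#define SKETCH_HD
#endif

namespace sketch {

// Device code: exactly one PTX instruction (clz.b32).
// How many SASS instructions that becomes is decided by the backend compiler.
SKETCH_HD inline unsigned clz32(std::uint32_t x)
{
#if defined(__CUDA_ARCH__)
    unsigned result;
    asm("clz.b32 %0, %1;" : "=r"(result) : "r"(x));
    return result;
#else
    // Portable host fallback: scan from the most significant bit down.
    unsigned count = 0;
    for (std::uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1) {
        ++count;
    }
    return count;
#endif
}

// Device code: exactly one PTX instruction (brev.b32).
SKETCH_HD inline std::uint32_t brev32(std::uint32_t x)
{
#if defined(__CUDA_ARCH__)
    std::uint32_t result;
    asm("brev.b32 %0, %1;" : "=r"(result) : "r"(x));
    return result;
#else
    // Portable host fallback: shift bits out of x and into result.
    std::uint32_t result = 0;
    for (int i = 0; i < 32; ++i) {
        result = (result << 1) | (x & 1u);
        x >>= 1;
    }
    return result;
#endif
}

// Not a single PTX instruction: composed here from brev + clz.
// Returns 32 for x == 0.
SKETCH_HD inline unsigned ctz32(std::uint32_t x)
{
    return clz32(brev32(x));
}

} // namespace sketch
```

Under these assumptions, `clz32` would count as a "PTX builtin" while `ctz32` would not. Whether either ends up as a single hardware instruction can only be checked per target architecture, for example by comparing the output of `nvcc -ptx` with the disassembly from `cuobjdump -sass`.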

It should be clear to the user of cuda-kat what will result in a single hardware instruction and what may or may not be one.