eyalroz / cuda-kat

CUDA kernel author's tools
BSD 3-Clause "New" or "Revised" License

Untangle the mess in primitives/ #4

Closed: eyalroz closed this issue 4 years ago

eyalroz commented 5 years ago

The code under src/cuda/on_device/primitives is a hot mess.

I mean, most of it is very useful, but not all of it; and there's almost no order to the different files except w.r.t. the scope of collaboration (warp/block/grid).

At the very least we need to:

  1. Remove code whose general usefulness is limited/questionable.
  2. Extract related functionality into a separate file (or files for different scopes):

    1. Shared memory
    2. Thread/lane coordination
    3. Iteration/coverage patterns (like at_warp_stride())
    4. Reductions and reduction-like operations
    5. (Search?)

    ... and do it while keeping the namespace scheme (e.g. separating block-scope from grid-scope functions).

  3. Consider duplicate functionality (there's probably a bit of that in there)
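
To make the "namespace scheme" point concrete, here is a minimal host-side sketch of same-named functions separated by collaboration scope. Everything in it is an assumption made for illustration: the `scopes_sketch` namespace, the `thread_position` struct (a plain-data stand-in for `threadIdx`/`blockDim`/`blockIdx`/`gridDim`, so this compiles as ordinary C++), and the `index()`/`size()` names are hypothetical and not taken from the actual cuda-kat sources.

```cpp
#include <cstddef>

// Hypothetical illustration only; none of these names come from cuda-kat.
namespace scopes_sketch {

constexpr std::size_t warp_size = 32; // warp width on current NVIDIA GPUs

// Plain-data stand-in for the CUDA built-ins (threadIdx.x, blockDim.x,
// blockIdx.x, gridDim.x) of a one-dimensional launch.
struct thread_position {
    std::size_t thread_in_block;
    std::size_t block_size;
    std::size_t block_in_grid;
    std::size_t blocks_in_grid;
};

// Each scope gets its own namespace, with the same function names in each:
// index() is this thread's position among its collaborators, and size()
// is the number of collaborators in that scope.
namespace warp {
inline std::size_t index(const thread_position& p) { return p.thread_in_block % warp_size; }
inline std::size_t size(const thread_position&) { return warp_size; }
} // namespace warp

namespace block {
inline std::size_t index(const thread_position& p) { return p.thread_in_block; }
inline std::size_t size(const thread_position& p) { return p.block_size; }
} // namespace block

namespace grid {
inline std::size_t index(const thread_position& p)
{
    return p.block_in_grid * p.block_size + p.thread_in_block;
}
inline std::size_t size(const thread_position& p) { return p.blocks_in_grid * p.block_size; }
} // namespace grid

} // namespace scopes_sketch
```

With this kind of layout, a topic file (say, one for thread/lane coordination) can hold the warp, block and grid variants side by side, and the caller picks the scope through the namespace rather than through the function name.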

eyalroz commented 5 years ago

So, shared memory code is now separate. So are operations on sequences of data elements in memory (e.g. reductions).

There's still some splitting-off to do. Traversal/iteration is a prime candidate.
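
For readers unfamiliar with the traversal/iteration patterns in question, here is a rough host-side sketch of what a warp-stride pattern does. The signature below is an assumption, not the real `at_warp_stride()` from the repository: the lane index is passed in explicitly (in a kernel it would come from something like `threadIdx.x % warpSize`), and the `traversal_sketch` namespace and `visits_per_position()` helper are hypothetical names used only to emulate a whole warp on the CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration only; signatures are not taken from cuda-kat.
namespace traversal_sketch {

// Calls f(pos) for pos = first, first + stride, first + 2 * stride, ...
// for as long as pos < length.
template <typename F>
void at_stride(std::size_t length, std::size_t first, std::size_t stride, F f)
{
    for (std::size_t pos = first; pos < length; pos += stride) {
        f(pos);
    }
}

namespace warp {

constexpr std::size_t size = 32; // warp width on current NVIDIA GPUs

// One lane's share of a warp-collaborative traversal: lane i handles
// positions i, i + 32, i + 64, ... so that, at each step, neighbouring
// lanes touch neighbouring positions.
template <typename F>
void at_warp_stride(std::size_t length, std::size_t lane, F f)
{
    assert(lane < size);
    at_stride(length, lane, size, f);
}

} // namespace warp

// Emulates all 32 lanes sequentially and counts how often each position
// in [0, length) is visited.
inline std::vector<int> visits_per_position(std::size_t length)
{
    std::vector<int> visits(length, 0);
    for (std::size_t lane = 0; lane < warp::size; ++lane) {
        warp::at_warp_stride(length, lane, [&](std::size_t pos) { ++visits[pos]; });
    }
    return visits;
}

} // namespace traversal_sketch
```

The property worth preserving when splitting this off into its own file is that the lanes together cover every position exactly once, whatever the length; block-stride and grid-stride variants would differ only in the starting index and the stride.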