eyalroz / cuda-kat

CUDA kernel author's tools
BSD 3-Clause "New" or "Revised" License

Split have_a_single_lane_compute into non-returning and returning variants #75

Open eyalroz opened 4 years ago

eyalroz commented 4 years ago

The have_a_single_lane_compute primitive currently returns a value. However, this value is only valid in the single computing lane, and the caller doesn't even know which lane that is. That renders the returned value useless.

We should therefore split this collaboration primitive into two variants:

  1. have_a_single_lane_execute() which returns void, and requires no warp-level synchronization; and
  2. have_a_single_lane_compute() which does return a value, but uses get_from_lane() to propagate the value to all lanes.
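
A rough sketch of what the two variants might look like, to make the proposal concrete. This is not the library's actual code: the signatures, the `designated_lane` parameter, the `lane_index()` helper and the `run_demo()` host function are all hypothetical, and the raw `__shfl_sync()` intrinsic stands in for `get_from_lane()`. It also assumes that all 32 lanes of the warp are active at the call site and that the block size is a multiple of the warp size.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr unsigned full_warp_mask = 0xFFFFFFFFu;

// Hypothetical helper: index of the calling thread within its warp
// (assumes a one-dimensional block).
__device__ inline unsigned lane_index() { return threadIdx.x % warpSize; }

// Variant 1: returns nothing, so no warp-level synchronization is needed.
template <typename F>
__device__ void have_a_single_lane_execute(F f, unsigned designated_lane = 0)
{
    if (lane_index() == designated_lane) { f(); }
}

// Variant 2: the designated lane computes a value, which is then propagated
// to all lanes. __shfl_sync() is a stand-in for get_from_lane() here; it
// requires every lane named in the mask to reach this call.
template <typename F>
__device__ auto have_a_single_lane_compute(F f, unsigned designated_lane = 0)
    -> decltype(f())
{
    decltype(f()) result {};
    if (lane_index() == designated_lane) { result = f(); }
    return __shfl_sync(full_warp_mask, result, designated_lane);
}

__global__ void demo_kernel(int* per_thread_out, int* warp_counter)
{
    // One atomic increment per warp; nothing is returned to the other lanes.
    have_a_single_lane_execute([warp_counter]() { atomicAdd(warp_counter, 1); });

    // One lane computes (100 + warp index); every lane receives the result.
    int value = have_a_single_lane_compute(
        []() { return 100 + static_cast<int>(threadIdx.x / warpSize); });
    per_thread_out[blockIdx.x * blockDim.x + threadIdx.x] = value;
}

// Hypothetical host-side check: launches one block of two warps and verifies
// that every lane of warp w received 100 + w and that the counter equals 2.
bool run_demo()
{
    constexpr int num_threads = 64;
    int* d_out = nullptr;
    int* d_counter = nullptr;
    if (cudaMalloc(&d_out, num_threads * sizeof(int)) != cudaSuccess) { return false; }
    if (cudaMalloc(&d_counter, sizeof(int)) != cudaSuccess) { cudaFree(d_out); return false; }
    cudaMemset(d_counter, 0, sizeof(int));

    demo_kernel<<<1, num_threads>>>(d_out, d_counter);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);

    std::vector<int> out(num_threads, 0);
    int counter = 0;
    ok = ok && (cudaMemcpy(out.data(), d_out, num_threads * sizeof(int),
                           cudaMemcpyDeviceToHost) == cudaSuccess);
    ok = ok && (cudaMemcpy(&counter, d_counter, sizeof(int),
                           cudaMemcpyDeviceToHost) == cudaSuccess);
    for (int i = 0; ok && i < num_threads; i++) {
        ok = (out[i] == 100 + i / 32);
    }
    ok = ok && (counter == 2);

    cudaFree(d_out);
    cudaFree(d_counter);
    return ok;
}
```

With this split, callers that only want a side effect (such as the atomic increment above) would not pay for a shuffle, while callers that need the result get a value that is valid in every lane.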