Split have_a_single_lane_compute into a non-returning and returning variants

eyalroz / cuda-kat

CUDA kernel author's tools

BSD 3-Clause "New" or "Revised" License


Split have_a_single_lane_compute into a non-returning and returning variants #75

Open eyalroz opened 4 years ago

eyalroz commented 4 years ago

The have_a_single_lane_compute primitive currently returns a value. However, this value is only valid in the single lane that performs the computation, and the caller doesn't even know which lane that is. This makes the returned value useless.

We should therefore split this collaboration primitive into two variants:

have_a_single_lane_execute() which returns void, and requires no warp-level synchronization; and
have_a_single_lane_compute() which does return a value, but uses get_from_lane() to propagate the value to all lanes.
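
To make the proposed split concrete, here is a minimal sketch of what the two variants could look like. It is written against raw CUDA intrinsics rather than cuda-kat's actual code: the signatures, the `designated_lane` parameter, the `lane_index()` helper, and the use of `__shfl_sync` in place of get_from_lane() are all assumptions for illustration. It also assumes a one-dimensional block, that all 32 lanes of the warp reach the call, and that the result type is one `__shfl_sync` supports.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-ins; these are NOT cuda-kat's actual declarations.
constexpr unsigned full_warp_mask = 0xFFFFFFFFu;

// Lane index within the warp; valid for one-dimensional blocks only.
__device__ inline unsigned lane_index() { return threadIdx.x % warpSize; }

// Non-returning variant: a single lane runs f(); no shuffle and no
// warp-level synchronization is involved.
template <typename F>
__device__ void have_a_single_lane_execute(F f, unsigned designated_lane = 0)
{
    if (lane_index() == designated_lane) { f(); }
}

// Returning variant: a single lane computes, then the result is broadcast
// so that every lane returns the same, valid value.
// Assumes all 32 lanes of the warp participate in the call.
template <typename F>
__device__ auto have_a_single_lane_compute(F f, unsigned designated_lane = 0)
    -> decltype(f())
{
    decltype(f()) result{};
    if (lane_index() == designated_lane) { result = f(); }
    // Plays the role the issue assigns to get_from_lane().
    return __shfl_sync(full_warp_mask, result, designated_lane);
}

// Functors are used instead of lambdas to avoid needing --extended-lambda.
struct increment_counter {
    int* counter;
    __device__ void operator()() const { atomicAdd(counter, 1); }
};

struct make_answer {
    __device__ int operator()() const { return 42; }
};

__global__ void demo(int* counter, int* per_lane_results)
{
    // Side effect only: exactly one lane of the warp increments the counter.
    have_a_single_lane_execute(increment_counter{counter});
    // Value needed everywhere: each lane stores the broadcast result.
    per_lane_results[threadIdx.x] = have_a_single_lane_compute(make_answer{});
}

int main()
{
    int* counter = nullptr;
    int* results = nullptr;
    cudaMallocManaged(&counter, sizeof(int));
    cudaMallocManaged(&results, 32 * sizeof(int));
    *counter = 0;

    demo<<<1, 32>>>(counter, results);  // a single full warp
    cudaDeviceSynchronize();

    std::printf("counter = %d, lane 31 sees %d\n", *counter, results[31]);

    cudaFree(counter);
    cudaFree(results);
    return 0;
}
```

In this sketch the non-returning variant costs only a predicated branch, while the returning variant pays for one shuffle. That is the trade-off the split is meant to expose: callers who only need a side effect don't pay for a broadcast, and callers who need the value get one that is valid in every lane.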