The have_a_single_lane_compute primitive currently returns a value. However, that value is only valid in the single lane which performed the computation, and the caller doesn't even know which lane that is. This makes the returned value useless.
We should therefore split this collaboration primitive into two variants:
have_a_single_lane_execute() which returns void, and requires no warp-level synchronization; and
have_a_single_lane_compute() which does return a value, but uses get_from_lane() to propagate the value to all lanes.
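
To make the proposal concrete, here is a rough sketch of how the two variants might look. It is not the library's actual implementation. The signatures, the leader-selection rule (lowest active lane), the three-argument form of get_from_lane(), and the helpers lane_index(), select_leader_lane(), demo_kernel and demo_all_lanes_agree() are all assumptions made for illustration. The sketch also assumes that the calling lanes are converged when they reach the primitive, and that the returned type is one which __shfl_sync supports.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: index of the calling thread within its warp.
__device__ inline unsigned lane_index() { return threadIdx.x % warpSize; }

// Hypothetical helper: choose the lowest-indexed active lane as the leader.
__device__ inline unsigned select_leader_lane(unsigned active_mask)
{
    return __ffs(active_mask) - 1;
}

// Assumed shape of get_from_lane(): a thin wrapper around __shfl_sync.
template <typename T>
__device__ inline T get_from_lane(unsigned mask, T value, unsigned source_lane)
{
    return __shfl_sync(mask, value, source_lane);
}

// Variant 1: returns void, and needs no warp-level synchronization.
template <typename F>
__device__ void have_a_single_lane_execute(F f)
{
    unsigned active = __activemask();
    if (lane_index() == select_leader_lane(active)) { f(); }
}

// Variant 2: the leader computes, and every participating lane gets the result.
template <typename F>
__device__ auto have_a_single_lane_compute(F f) -> decltype(f())
{
    using result_type = decltype(f());
    unsigned active = __activemask();
    unsigned leader = select_leader_lane(active);
    result_type result{};  // placeholder in the non-leader lanes
    if (lane_index() == leader) { result = f(); }
    return get_from_lane(active, result, leader);
}

__global__ void demo_kernel(int* per_lane_results, int* execution_count)
{
    // Only one lane performs the increment.
    have_a_single_lane_execute([=] { atomicAdd(execution_count, 1); });
    // One lane computes the value; all lanes receive it.
    int value = have_a_single_lane_compute([] { return 42; });
    per_lane_results[threadIdx.x] = value;
}

// Host-side check: launch a single full warp and inspect the results.
bool demo_all_lanes_agree()
{
    constexpr int num_lanes = 32;
    int* device_results = nullptr;
    int* device_count = nullptr;
    if (cudaMalloc(&device_results, num_lanes * sizeof(int)) != cudaSuccess) { return false; }
    if (cudaMalloc(&device_count, sizeof(int)) != cudaSuccess) { cudaFree(device_results); return false; }
    cudaMemset(device_results, 0, num_lanes * sizeof(int));
    cudaMemset(device_count, 0, sizeof(int));

    demo_kernel<<<1, num_lanes>>>(device_results, device_count);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);

    std::vector<int> results(num_lanes, 0);
    int count = 0;
    cudaMemcpy(results.data(), device_results, num_lanes * sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&count, device_count, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(device_results);
    cudaFree(device_count);

    ok = ok && (count == 1);                   // exactly one lane executed
    for (int r : results) { ok = ok && (r == 42); }  // every lane got the value
    return ok;
}
```

With this split, a caller that only needs a side effect pays nothing for the shuffle, while a caller that needs the result gets it in every participating lane without having to know which lane did the work.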