In #225 we switched to `BLOCK_SCAN_WARP_SCANS` to make the prefix-sum result monotonic. However, we found that `InclusiveSum` with the `BLOCK_SCAN_WARP_SCANS` algorithm can still return non-monotonic output in some cases.
In this PR, we fix the behavior in a different way: we apply the prefix sum to the pair `(value, greater_than_0)` instead of to `value` alone, and write `i` to the output only when `value[i] > u && value[i - 1] <= u` and `prob[i] > 0`. If multiple `i` satisfy this condition (because of floating-point numerical issues), we select the smallest `i`.
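The selection rule can be illustrated with a minimal sequential sketch on the CPU. This is not the kernel code from this PR: the function name `select_index`, the `-1` return value, and treating the prefix sum before index 0 as 0 are all assumptions made for illustration. The prefix sum is passed in explicitly so that a non-monotonic scan result can be simulated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for the selection rule, not the CUDA kernel.
// `prefix` is the inclusive prefix sum of `prob` as produced by some scan;
// it may be non-monotonic because of floating-point rounding.
// Returns the smallest i with prefix[i] > u, prefix[i - 1] <= u and
// prob[i] > 0, or -1 if no index qualifies (assumed convention).
int select_index(const std::vector<float>& prob,
                 const std::vector<float>& prefix, float u) {
  assert(prob.size() == prefix.size());
  for (std::size_t i = 0; i < prob.size(); ++i) {
    // Assumption: the prefix sum before index 0 is taken to be 0.
    const float prev = (i == 0) ? 0.0f : prefix[i - 1];
    const bool crosses = prefix[i] > u && prev <= u;
    if (crosses && prob[i] > 0.0f) {
      return static_cast<int>(i);  // first match is the smallest i
    }
  }
  return -1;
}
```

For example, with `prob = {0.2, 0.0, 0.5, 0.3}` and a simulated non-monotonic `prefix = {0.2, 0.19999, 0.7, 1.0}`, calling `select_index` with `u = 0.2` skips the zero-probability index 1 and returns index 2.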