Algorithm correctness - Githubissues

Hi!

I was analyzing how the Multi Radix Sort works and I don't fully understand how you make sure it works correctly in all circumstances. The part that concerns me is the scattering/reordering part. Here, if I get it right you use bit masks to count the number of occurrences of a particular binID within the workgroup, next you account this difference on the global level with atomicAdd. So if workgroups were scheduled one after another, "sequentially", your approach would be 100% correct, however the workgroup order is generally undefined and workgroups can be executed in parallel, so I wonder how your algorithm deals with the case when two workgroups are scheduled in parallel or out of order. For the sake of simplicity let's see the parallel case:

                      WG1                         parallel to                               WG2                             
binOffset = global_offsets[binID] = x         |||||||||||||||||||         binOffset = global_offsets[binID] = x
prefix = p1                                   |||||||||||||||||||         prefix = p2
count = c1                                    |||||||||||||||||||         count = c2
output[binOffset + p1] = input                |||||||||||||||||||         output[binOffset + p2] = input
atomicAdd(global_offsets[binID], c1)          |||||||||||||||||||         atomicAdd(global_offsets[binID], c2)

So it looks to me that p1 and p2 are local to the respective WG, so in the case of parallel execution of WG1 and WG2, [output[binOffset + p{1,2}] = input] can step on each other toes and write to the same output indices, just because the global level atomicAdd(global_offsets[binID], c{1,2} hasn't been reached yet.

Or do I miss something obvious?

MircoWerner / VkRadixSort

Algorithm correctness #6