Open natevm opened 11 months ago
Thank you for reporting this. I'll file a ticket internally and we'll get it fixed in the next release.
As I just realized this was reported on the old Parallel Sort sample, a fix to address this will be pushed with the next version of the FidelityFX SDK (which is how we are pushing out most updates to our older features now - https://github.com/GPUOpen-LibrariesAndSDKs/FidelityFX-SDK).
Also, in order to keep the GPU code as fast as possible, the fix will likely be done as a check on the NumKeys value at CPU time with an error code returned in the data setup stage.
Fwiw this sort implementation has been very helpful.
Small feature request, it would be great to also have a separate dedicated prefix sum/scan and a parallel device selection / compaction. Both of these are used internally by the radix sorter, but would also be helpful standalone.
I'll add it to the list of planned improvements to existing samples. Cheers!
Hello,
I've recently discovered that if the key/value buffers used in the sort are not a multiple of PARALLELSORT_THREADGROUP_SIZE * 4, then the buffers are read out of bounds and undefined behavior can occur.
See these lines below: https://github.com/GPUOpen-Effects/FidelityFX-ParallelSort/blob/0c539948c8d196ae338d91efbc8ca495f1ea0d1d/ffx-parallelsort/FFX_ParallelSort.h#L133-L137
No bounds checks are done to SrcBuffer here, causing GPU instability when these are read out of bounds.
Also an issue here:
https://github.com/GPUOpen-Effects/FidelityFX-ParallelSort/blob/0c539948c8d196ae338d91efbc8ca495f1ea0d1d/ffx-parallelsort/FFX_ParallelSort.h#L348-L360
Later on, the number of keys is checked, but by that point it's too late: https://github.com/GPUOpen-Effects/FidelityFX-ParallelSort/blob/0c539948c8d196ae338d91efbc8ca495f1ea0d1d/ffx-parallelsort/FFX_ParallelSort.h#L369-L371
I suspect the fix would be to just check the number of keys before pre-loading the key/value pairs.
Reproducing is simple enough, just run the sort on data that is less than PARALLELSORT_THREADGROUP_SIZE * 4 with GPU-assisted validation that checks out of bounds descriptor reads.