The fourth argument of shfl.up.sync is the membermask (doc), which is typically 0xFFFFFFFF, not the value to be added. Passing the value there is incorrect usage: it triggers illegal-instruction errors on Hopper and later architectures, even though Volta and Ampere raise no error.
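For illustration, here is a minimal sketch of a warp-level inclusive scan that keeps the member mask and the value to be added as separate operands. It uses the CUDA intrinsic `__shfl_up_sync` rather than the inline PTX found in moderngpu, so it is not the patched code itself. The names `kFullMask`, `warp_inclusive_scan`, `scan_kernel`, and `run_warp_scan_check` are hypothetical, and the sketch assumes a single full warp of 32 active threads.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumption: all 32 lanes of the warp are active, so the full mask is valid.
constexpr unsigned kFullMask = 0xFFFFFFFFu;

// Hypothetical helper: inclusive prefix sum across one warp.
__device__ int warp_inclusive_scan(int x) {
    const int lane = threadIdx.x & 31;
    for (int offset = 1; offset < 32; offset <<= 1) {
        // The first argument is the member mask; the value being shuffled
        // is passed separately and is never used as the mask.
        int y = __shfl_up_sync(kFullMask, x, offset);
        // Lanes below `offset` receive their own value back, so skip the add.
        if (lane >= offset) x += y;
    }
    return x;
}

__global__ void scan_kernel(const int* in, int* out) {
    out[threadIdx.x] = warp_inclusive_scan(in[threadIdx.x]);
}

// Hypothetical host check: scans 32 ones and verifies out[i] == i + 1.
bool run_warp_scan_check() {
    const int n = 32;
    int h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = 1;

    int *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

    scan_kernel<<<1, n>>>(d_in, d_out);
    // An illegal instruction would surface here as a non-success status.
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);
    if (ok) {
        ok = (cudaMemcpy(h_out, d_out, n * sizeof(int),
                         cudaMemcpyDeviceToHost) == cudaSuccess);
    }
    for (int i = 0; ok && i < n; ++i) ok = (h_out[i] == i + 1);

    cudaFree(d_in);
    cudaFree(d_out);
    return ok;
}

int main() {
    std::printf("%s\n", run_warp_scan_check() ? "ok" : "mismatch or CUDA error");
    return 0;
}
```

Checking the status returned by `cudaDeviceSynchronize` is what makes a bad shuffle operand visible on the host, which is why the check does not rely on the output values alone.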
The code from moderngpu was introduced through https://github.com/baidu-research/warp-ctc/pull/148, and there have been updates since that pull request (commits). I recommend scheduling an engineer to update the moderngpu code in case it contains other flaws that have not been caught.
nvcc >= 11.8
nvcc >= 12.0