An early `return` taken by only some of the threads in a workgroup before a barrier leads to UB.
While the old code happens to work on some devices, it fails on others (e.g. "smaller" GPUs).
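To illustrate the failure mode, here is a CPU-side sketch that models one workgroup as `LOCAL_SIZE` Python threads sharing a list that stands in for shared memory, with `threading.Barrier` standing in for the shader barrier. `LOCAL_SIZE` and `run_workgroup` are hypothetical names, and a timeout is used to model the hang. On a real GPU the outcome is undefined and need not be a hang. The fix shown keeps every thread on the path to the barrier and only guards the out-of-range memory accesses.

```python
import threading

LOCAL_SIZE = 4  # hypothetical workgroup size (threads per workgroup)


def run_workgroup(data, early_return, timeout=1.0):
    """Simulate one workgroup: load into shared memory, barrier, read a neighbour.

    Returns the output list, or None if the barrier could not complete
    (the CPU analogue of the UB/hang seen on a GPU).
    """
    n = len(data)
    if n > LOCAL_SIZE:
        raise ValueError("this sketch handles at most LOCAL_SIZE elements")
    shared = [0] * LOCAL_SIZE  # stands in for workgroup shared memory
    out = [None] * n
    barrier = threading.Barrier(LOCAL_SIZE, timeout=timeout)
    failed = threading.Event()

    def invocation(lid):
        if early_return and lid >= n:
            return  # BAD: this thread never reaches the barrier below
        # GOOD: out-of-range threads skip the work but still hit the barrier
        if lid < n:
            shared[lid] = data[lid]
        try:
            barrier.wait()
        except threading.BrokenBarrierError:
            failed.set()  # barrier timed out: not all threads arrived
            return
        if lid < n:
            out[lid] = shared[lid] + shared[(lid + 1) % n]

    threads = [threading.Thread(target=invocation, args=(i,))
               for i in range(LOCAL_SIZE)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return None if failed.is_set() else out


print(run_workgroup([1, 2, 3], early_return=False))  # all 4 threads reach the barrier
print(run_workgroup([1, 2, 3], early_return=True, timeout=0.2))  # thread 3 returns early
```

In shader terms, the change amounts to replacing an `if (idx >= n) return;` placed before `barrier()` with a guard around the loads and stores themselves, so that control flow reaching the barrier stays uniform across the workgroup.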
BTW, I think it would be better to set specialization constants when the graph is built,
so that the local workgroup size could be chosen appropriately.
But that would take a lot of work.