Open grimoire opened 1 year ago
Hi, I am a little confused by the ARF CUDA kernel:
https://github.com/ZhouYanzhao/ORN/blob/d6b38aa5e5c3ca7c6e3d0ed5770e581ee1daadcd/src/orn/lib/active_rotating_filters.cu#L19-L33
Suppose thread 0 and thread 1 end up with:
i0 == i1, j0 == j1, k0 == k1, index0 == index1
Then both threads write to output[i][k][j][index]. I don't think that is intended: different threads writing to the same memory address leads to unpredictable results.
Did I misunderstand the implementation?
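To make the concern concrete, here is a minimal host-side sketch of how I would check for colliding writes. It is not copied from the kernel: the decomposition of the thread index n into (i, j, l), the lookup table `indices`, and the function names are all my assumptions for illustration. If these assumptions match the kernel, a collision seems possible only when the table maps two different entries l to the same index for the same rotation k.

```python
from collections import Counter


def count_writes(n_output, n_input, n_entry, n_rotation, indices):
    """Count simulated writes to each output[i][k][j][index].

    Assumption (hypothetical, not taken from the kernel): thread n handles
    one weight element, decomposed as (i, j, l), and writes once per
    rotation k to index = indices[l][k].
    """
    writes = Counter()
    for n in range(n_output * n_input * n_entry):
        l = n % n_entry
        j = (n // n_entry) % n_input
        i = n // (n_entry * n_input)
        for k in range(n_rotation):
            writes[(i, k, j, indices[l][k])] += 1
    return writes


def has_collision(writes):
    """True if any output address is written more than once."""
    return any(count > 1 for count in writes.values())


# Hypothetical table where, for each rotation k, l -> index is a permutation.
perm = [[(l + k) % 4 for k in range(4)] for l in range(4)]
# Hypothetical table where every entry maps to index 0.
clash = [[0] * 4 for _ in range(4)]

print(has_collision(count_writes(2, 3, 4, 4, perm)))   # False
print(has_collision(count_writes(2, 3, 4, 4, clash)))  # True
```

With the permutation table every address is written exactly once, while the second table reproduces the situation I am worried about.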
PS: Is yzhou.work still available?