Cleaning the chunk queues for re-use with plain writes does not seem to work reliably on the newer architecture (CC 7.5); adding threadfences did not solve the issue.
The same issue does not seem to occur on the TITAN V or the 1080Ti.
The problem is solved on CC 7.5 by replacing the uint4 writes with either a stg_cg or an atomicExch.
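
A minimal sketch of the two replacement store forms, for illustration only. Assumptions not taken from the code base: the kernel names, the kEmptySlot sentinel, the buffer size, and the use of the CUDA intrinsic __stcg (CUDA 11 or newer) as a stand-in for the stg_cg helper. It only shows how the writes are issued and does not reproduce the original race.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical sentinel for an empty queue slot (assumption, not from the source).
constexpr unsigned int kEmptySlot = 0xFFFFFFFFu;

// Variant 1: 128-bit store with the .cg (cache at L2) hint.
// __stcg is assumed here to be equivalent to the stg_cg helper.
__global__ void cleanStcg(uint4* queue, int numVec) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numVec) {
        __stcg(&queue[i],
               make_uint4(kEmptySlot, kEmptySlot, kEmptySlot, kEmptySlot));
    }
}

// Variant 2: one atomicExch per 32-bit queue entry.
__global__ void cleanAtomic(unsigned int* queue, int numWords) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numWords) {
        atomicExch(&queue[i], kEmptySlot);
    }
}

// Host helper: cleans a zero-filled buffer with one variant and checks that
// every 32-bit entry now holds the sentinel.
bool cleanAndVerify(bool useAtomic) {
    const int numVec = 1024;           // arbitrary demo size
    const int numWords = numVec * 4;   // four 32-bit words per uint4
    const int threads = 256;

    uint4* dQueue = nullptr;
    if (cudaMalloc(&dQueue, numVec * sizeof(uint4)) != cudaSuccess) return false;
    cudaMemset(dQueue, 0, numVec * sizeof(uint4));

    if (useAtomic) {
        cleanAtomic<<<(numWords + threads - 1) / threads, threads>>>(
            reinterpret_cast<unsigned int*>(dQueue), numWords);
    } else {
        cleanStcg<<<(numVec + threads - 1) / threads, threads>>>(dQueue, numVec);
    }
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);

    std::vector<unsigned int> host(numWords);
    ok = ok && (cudaMemcpy(host.data(), dQueue, numWords * sizeof(unsigned int),
                           cudaMemcpyDeviceToHost) == cudaSuccess);
    cudaFree(dQueue);

    for (unsigned int word : host) ok = ok && (word == kEmptySlot);
    return ok;
}
```

Calling cleanAndVerify(false) exercises the cache-hinted store and cleanAndVerify(true) the atomicExch path; both are expected to return true on a working device.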