tijyojwad closed this 3 years ago.
@tijyojwad, as you pointed out, the highest compute version that disables yield instructions (the culprit for the regression) is `compute_60`. I just ran `benchmark_cudapoa` with `-arch=compute_35` and `-arch=compute_60`, and the performance remained the same, although this is by no means a thorough benchmark. I was wondering whether supporting only Pascal and newer architectures is a serious limitation. If not, it makes sense to switch to `compute_60`.
@r-mafi, yes, it's okay to update to `compute_60` only. For the next release we were thinking of dropping support for GPUs older than Pascal.
Because of the performance regression observed in cudapoa and cudaaligner, the highest compute version that gives the best numbers is `compute_60`. Update the nvcc flags for cudapoa and cudaaligner to compile for `compute_60` only. Accordingly, update the GW README to state that only Pascal and newer architectures are supported.
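As a rough illustration of why compiling for `compute_60` only means dropping pre-Pascal GPUs: PTX generated for a virtual architecture `compute_XY` can be JIT-compiled by the CUDA driver for devices of compute capability X.Y or higher, but not for older ones. The sketch below models that rule in Python, assuming the binary embeds only `compute_60` PTX and no SASS for older GPUs. The helper names (`parse_virtual_arch`, `can_run_ptx`, `MIN_VIRTUAL_ARCH`) are hypothetical and are not part of GW or the CUDA toolkit.

```python
import re
from typing import Tuple

# Hypothetical constant mirroring the proposed -arch=compute_60 build flag.
MIN_VIRTUAL_ARCH = "compute_60"


def parse_virtual_arch(arch: str) -> Tuple[int, int]:
    """Turn a virtual architecture name like 'compute_60' into (major, minor)."""
    # The last digit is the minor version; everything before it is the major.
    m = re.fullmatch(r"compute_(\d+)(\d)", arch)
    if m is None:
        raise ValueError(f"not a virtual architecture name: {arch!r}")
    return int(m.group(1)), int(m.group(2))


def can_run_ptx(device_cc: Tuple[int, int], arch: str = MIN_VIRTUAL_ARCH) -> bool:
    """True if PTX built for `arch` can be JIT-compiled for a device with
    compute capability `device_cc` (PTX is forward compatible only)."""
    # Tuples compare lexicographically: major first, then minor.
    return device_cc >= parse_virtual_arch(arch)


if __name__ == "__main__":
    gpus = [("Kepler", (3, 5)), ("Maxwell", (5, 2)), ("Pascal", (6, 0)), ("Volta", (7, 0))]
    for name, cc in gpus:
        print(name, can_run_ptx(cc))
```

Under these assumptions the Kepler and Maxwell entries report `False` and the Pascal and Volta entries report `True`, which matches the plan to support only Pascal and newer architectures.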