
LeelaChessZero / lc0

The rewritten engine, originally for tensorflow. Now all other backends have been ported here.

GNU General Public License v3.0


compile common cuda code for multiple targets #2015

Closed: borg323 closed this 5 months ago

borg323 commented 5 months ago

Use either `-arch=all-major` or, for CUDA versions < 11.5 that don't support it, the equivalent nvcc options ~~(but limit fp16 code to architectures that support it)~~. Limiting fp16 code to newer architectures added complexity for an insignificant build-time advantage, so that restriction was removed. This seems to fix the reported CUDA NaN issues.
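As a rough sketch of what "the equivalent nvcc options" could look like on toolkits older than 11.5, the hypothetical helper below (not taken from this PR) approximates `-arch=all-major`. It emits a `-gencode` pair for every supported `sm_X0` architecture, plus PTX for the newest one so the driver can JIT-compile for later GPUs. The selection rule follows our reading of nvcc's description of `all-major` (all supported major versions `sm_*0`, plus PTX for the highest). The architecture list is an assumed example for a CUDA 11.1 to 11.4 toolkit and should be checked against the toolkit you actually build with.

```python
# Hypothetical helper: approximate nvcc's -arch=all-major for CUDA < 11.5,
# where that option is not available. The architecture list used below is an
# assumption; it must match what the installed toolkit supports.


def all_major_targets(supported: list[int]) -> list[int]:
    """Keep only major-version architectures (sm_50, sm_60, ...).

    Compute capabilities are encoded as integers, e.g. 75 for sm_75.
    """
    targets = sorted(cc for cc in supported if cc % 10 == 0)
    if not targets:
        raise ValueError("no sm_X0 architecture in the supported list")
    return targets


def gencode_flags(supported: list[int]) -> list[str]:
    """Build -gencode flags: device code (SASS) for each major architecture,
    plus PTX for the newest one so newer GPUs can JIT-compile it."""
    targets = all_major_targets(supported)
    flags = [f"-gencode=arch=compute_{cc},code=sm_{cc}" for cc in targets]
    newest = targets[-1]
    flags.append(f"-gencode=arch=compute_{newest},code=compute_{newest}")
    return flags


if __name__ == "__main__":
    # Assumed (illustrative) architecture list for a CUDA 11.1-11.4 toolkit.
    cuda_11_x_archs = [35, 37, 50, 52, 53, 60, 61, 62, 70, 72, 75, 80, 86]
    for flag in gencode_flags(cuda_11_x_archs):
        print(flag)
```

With the assumed list this prints `sm_50`, `sm_60`, `sm_70` and `sm_80` targets plus `compute_80` PTX; those flags would take the place of `-arch=all-major` on the nvcc command line. Generating one binary per major architecture, rather than only for the newest GPUs, is what lets the same build run on a wide range of cards.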