move to cuda 11.8 with install script

Turns out the following analysis is not correct: the latest version seems to work after compiling both the common and the fp16-only cuda code with -arch=all-major.

Made this a draft, as #2015 has the bits to fix the NaN issues (a bit more refined) without the cuda version update. I will turn this into a plain cuda version update PR at a later stage.

The recent rc1 issues seem to go away if we compile the cuda fp16 code with -arch=all-major, so I added it to meson.build. It is added unconditionally, since the default is still to use -arch=native and the alternative is an attempt to do the equivalent of -arch=all-major for cuda versions that don't support it. ~~This requires at least cuda version 11.5, but as we tested with 11.8 I used this.~~

This also updates cuda to 11.8 for the appveyor cuda builds (cudnn unchanged). Given the huge size of the dlls, I added an install script based on the directml one.

While testing I also found a bug in the directml install script: probably some recent windows security change makes executables in the same directory unavailable when the script is run by double-clicking. I removed the lc0.exe check, so the script now installs the dlls directly into its own directory.
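For reference, a rough sketch of what "the equivalent of -arch=all-major" can look like for cuda versions without that flag: emit one -gencode entry for the earliest supported architecture and for each major one (sm_X0), plus PTX for the highest major one. This is only an illustration, not the logic in meson.build; the helper name `gencode_flags` and the architecture list passed to it are assumptions made up for the example.

```python
def gencode_flags(archs):
    """Build nvcc -gencode flags approximating -arch=all-major.

    `archs` is a list of compute capabilities as integers (e.g. 75 for sm_75).
    The list itself is an assumption supplied by the caller, not taken from lc0.
    """
    archs = sorted(set(archs))
    if not archs:
        return []
    # Keep the earliest supported architecture and every major one (sm_X0).
    selected = [a for a in archs if a == archs[0] or a % 10 == 0]
    # Machine code (SASS) for each selected architecture.
    flags = [f"-gencode=arch=compute_{a},code=sm_{a}" for a in selected]
    # PTX for the highest selected architecture, so newer GPUs can JIT-compile.
    newest = selected[-1]
    flags.append(f"-gencode=arch=compute_{newest},code=compute_{newest}")
    return flags


# Illustrative architecture list only; check what your nvcc actually supports.
print(" ".join(gencode_flags([35, 37, 50, 52, 60, 61, 70, 75, 80, 86, 89, 90])))
```

The minor architectures (sm_52, sm_61, sm_75, ...) are skipped on purpose, since they can run code built for the major version below them.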

move to cuda 11.8 with install script #2009