NVIDIA / TorchFort

An Online Deep Learning Interface for HPC programs on NVIDIA GPUs
https://nvidia.github.io/TorchFort/

generalize cmake to build for different cuda archs #7

Closed TomMelt closed 1 year ago

TomMelt commented 1 year ago

Problem

My GPU has a CUDA compute capability of 89 (RTX 4080). Currently, CMakeLists.txt is only set up to handle compute capabilities that end in 0.

Solution

I modified the string replacement to handle generic CUDA architectures.

Fixes #8

Notes

Also, CMakeLists.txt currently builds only for compute capabilities 70 and 80 by default. You will have to add 89 to TORCHFORT_CUDA_CC_LIST on the following line if you want to build for all three (i.e., TORCHFORT_CUDA_CC_LIST "70;80;89"): https://github.com/NVIDIA/TorchFort/blob/e06613d6feccc3d11c166f146abce7abdd85f1b3/CMakeLists.txt#L5
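For reference, here is a minimal sketch of that edit (assuming TORCHFORT_CUDA_CC_LIST is defined as a CMake cache variable; the exact docstring in the repo may differ):

```cmake
# Sketch: add cc89 to the default build list in CMakeLists.txt.
set(TORCHFORT_CUDA_CC_LIST "70;80;89" CACHE STRING "CUDA compute capabilities to build for")
```

If it is a cache variable, you can also override it at configure time without editing the file, e.g. `cmake -DTORCHFORT_CUDA_CC_LIST="70;80;89" <other options>`.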

If you do not add 89 to TORCHFORT_CUDA_CC_LIST, you will get the following error when you run the binary:

```
Accelerator Fatal Error: This file was compiled: -acc=gpu -gpu=cc70 -gpu=cc80 -acc=host or -acc=multicore
Rebuild this file with -gpu=cc89 to use NVIDIA Tesla GPU 0
```
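(Side note: on recent NVIDIA drivers you can check your GPU's compute capability with `nvidia-smi --query-gpu=name,compute_cap --format=csv`; on older drivers, the deviceQuery CUDA sample reports it as well.)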
romerojosh commented 1 year ago

@azrael417 I wrote the original string replacement line to convert our TORCHFORT_CUDA_CC_LIST variable to the format PyTorch expects in TORCH_CUDA_ARCH_LIST (e.g., 70;80 to 7.0 8.0). However, as pointed out here, my replacement code only works on compute capability values that end in 0, since those are what we've been using and testing.
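For context, a sketch of what a generalized conversion can look like in CMake (illustrative only, not necessarily the exact code in the PR; variable names other than TORCHFORT_CUDA_CC_LIST and TORCH_CUDA_ARCH_LIST are made up):

```cmake
# Convert a semicolon-separated compute capability list ("70;80;89")
# into the dotted, space-separated form PyTorch expects ("7.0 8.0 8.9").
set(TORCHFORT_CUDA_CC_LIST "70;80;89")
set(_arch_list "")
foreach(cc IN LISTS TORCHFORT_CUDA_CC_LIST)
  # Insert a dot before the last digit, so 70 -> 7.0 and 89 -> 8.9.
  string(REGEX REPLACE "([0-9]+)([0-9])$" "\\1.\\2" cc_dotted "${cc}")
  list(APPEND _arch_list "${cc_dotted}")
endforeach()
list(JOIN _arch_list " " TORCH_CUDA_ARCH_LIST)
message(STATUS "TORCH_CUDA_ARCH_LIST = ${TORCH_CUDA_ARCH_LIST}") # 7.0 8.0 8.9
```

The key point is matching any trailing digit rather than a literal 0, so arbitrary compute capabilities like 89 work.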

@TomMelt Thanks for the catch! The change looks good to me. I think we will leave the default build as cc70 and cc80 for now, but users are free to set TORCHFORT_CUDA_CC_LIST as needed for their systems.