It would be nice if AMREX_CUDA_ARCH could be set directly to an sm_xx architecture (for example sm_80), rather than having to add export CUDAFLAGS="-arch=sm_80", when the goal is to compile directly to SASS instead of having SASS compiled from PTX at runtime.
I was seeing slow first kernel launches and assumed the cause was PTX being compiled into SASS at runtime. However, it seems that setting export AMREX_CUDA_ARCH=8.0 already produces SASS for sm_80.
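
For anyone who wants to check this on their own build, here is a rough sketch of how it could be verified. The helper names arch_to_sm and check_sass are made up for illustration and are not part of AMReX; cuobjdump is the tool shipped with the CUDA toolkit, and the executable path is whatever your build produced.

```shell
# Hypothetical helper (not part of AMReX): map an AMREX_CUDA_ARCH-style
# value such as 8.0 to the matching sm_XX name by dropping the dot.
arch_to_sm() {
    printf 'sm_%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

# Hypothetical helper: list the SASS (ELF/cubin) and PTX images embedded
# in an executable, assuming cuobjdump from the CUDA toolkit is on PATH.
check_sass() {
    exe="$1"
    if command -v cuobjdump >/dev/null 2>&1 && [ -f "$exe" ]; then
        # Entries naming the target (e.g. sm_80) mean SASS was built ahead of time.
        cuobjdump --list-elf "$exe"
        # Entries here are PTX images, which the driver can JIT-compile if needed.
        cuobjdump --list-ptx "$exe"
    else
        echo "cuobjdump or '$exe' not available; skipping check" >&2
    fi
}
```

Usage would be something like check_sass ./my_executable after building with export AMREX_CUDA_ARCH=8.0, then looking for the name printed by arch_to_sm 8.0 in the ELF listing.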