UoB-HPC / BabelStream

STREAM, for lots of devices written in many programming models
Other
323 stars 110 forks source link

CXX_EXTRA_LINK_FLAGS are not extra #116

Closed lfmeadow closed 2 years ago

lfmeadow commented 2 years ago

CMakeLists.txt should add CXX_EXTRA_FLAGS to CXX_EXTRA_LINK_FLAGS rather than not changing CXX_EXTRA_LINK_FLAGS if it is already set. Otherwise important link-time settings get lost.

diff --git a/CMakeLists.txt b/CMakeLists.txt
index ad12dbc..a0e10a8 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -106,10 +106,8 @@ hint_flag(CXX_EXTRA_LIBRARIES "
 hint_flag(CXX_EXTRA_LINKER_FLAGS "
         Append to linker flags (i.e GCC's `-Wl` or equivalent)")

-# copy CXX_EXTRA_FLAGS <- CXX_EXTRA_LINK_FLAGS
-if ((DEFINED CXX_EXTRA_FLAGS) AND (NOT DEFINED CXX_EXTRA_LINK_FLAGS))
-    set(CXX_EXTRA_LINK_FLAGS ${CXX_EXTRA_FLAGS})
-endif ()
+# Honor user's CXX_EXTRA_LINK_FLAGS
+set(CXX_EXTRA_LINK_FLAGS ${CXX_EXTRA_FLAGS} ${CXX_EXTRA_LINK_FLAGS})

 # include our macros
 include(cmake/register_models.cmake)

TL;DR clang++ driver for sycl with CUDA backend defaults to running ptxas -O0 . I was losing my mind trying to figure out why I got better code out of the JIT than the AOT compiler. I needed to add -O2 to the link line but it didn't work without this patch.

FWIW here's my cmake line:

cmake -Bbuild -H. -DMODEL=sycl \
-DSYCL_COMPILER=DPCPP \
-DSYCL_COMPILER_DIR=/home/larry/sycl-with-cuda/llvm-build \
-DCXX_EXTRA_FLAGS="-fsycl-targets=nvptx64-nvidia-cuda \
-Xsycl-target-backend --cuda-gpu-arch=sm_80"
-DCXX_EXTRA_LINK_FLAGS=-O2 

Someone else will hopefully fix the clang driver but in the meantime this makes a huge difference in DOT performance (not much in the others).

tomdeakin commented 2 years ago

Thanks @lfmeadow , we've just pushed this change to main.

lfmeadow commented 2 years ago

Thanks!