CMakeLists.txt should add CXX_EXTRA_FLAGS to CXX_EXTRA_LINK_FLAGS rather than not changing CXX_EXTRA_LINK_FLAGS if it is already set.
Otherwise important link-time settings get lost.
diff --git a/CMakeLists.txt b/CMakeLists.txt
index ad12dbc..a0e10a8 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -106,10 +106,8 @@ hint_flag(CXX_EXTRA_LIBRARIES "
hint_flag(CXX_EXTRA_LINKER_FLAGS "
Append to linker flags (i.e GCC's `-Wl` or equivalent)")
-# copy CXX_EXTRA_FLAGS <- CXX_EXTRA_LINK_FLAGS
-if ((DEFINED CXX_EXTRA_FLAGS) AND (NOT DEFINED CXX_EXTRA_LINK_FLAGS))
- set(CXX_EXTRA_LINK_FLAGS ${CXX_EXTRA_FLAGS})
-endif ()
+# Honor user's CXX_EXTRA_LINK_FLAGS
+set(CXX_EXTRA_LINK_FLAGS ${CXX_EXTRA_FLAGS} ${CXX_EXTRA_LINK_FLAGS})
# include our macros
include(cmake/register_models.cmake)
TL;DR
clang++ driver for sycl with CUDA backend defaults to running ptxas -O0 . I was losing my mind trying to figure out why I got better code out of the JIT than the AOT compiler.
I needed to add -O2 to the link line but it didn't work without this patch.
CMakeLists.txt should add CXX_EXTRA_FLAGS to CXX_EXTRA_LINK_FLAGS rather than not changing CXX_EXTRA_LINK_FLAGS if it is already set. Otherwise important link-time settings get lost.
TL;DR clang++ driver for sycl with CUDA backend defaults to running ptxas -O0 . I was losing my mind trying to figure out why I got better code out of the JIT than the AOT compiler. I needed to add -O2 to the link line but it didn't work without this patch.
FWIW here's my cmake line:
Someone else will hopefully fix the clang driver but in the meantime this makes a huge difference in DOT performance (not much in the others).