Closed nR3D closed 11 months ago
What's the speedup of you implementation?
@ChiZhangatTUM
I don't currently have a speedup to report. This PR is just laying a foundation, but every LocalDynamics
(relevant to a given test-case) will need to be adapted to run this version of the particle iterators. I have already adapted two LocalDynamics
(TimeStepInitialization
and AdvectionTimeStepSizeForImplicitViscosity
) to test particle_for
and particle_reduce
respectively. However, in order to keep running the rest of the code without SYCL, I had to copy back-and-forth all the data between CPU and GPU at every iteration, which results in a slowdown compared to a pure CPU execution. Once the entire Dambreak test-case will be adapted for SYCL execution, all the iterations inside the outermost while-loop will run exclusively on GPU.
I am currently looking for a way to loosen the need to adapt every LocalDynamics
class, in order to get some insight on speedup much sooner.
@nR3D I have tried to compile your code with clang++ compiler and sycl-ls info as
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2023.15.3.0.20_160000] [opencl:cpu:1] Intel(R) OpenCL, 13th Gen Intel(R) Core(TM) i7-13700KF 3.0 [2023.15.3.0.20_160000] [ext_oneapi_cuda:gpu:0] NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3090 Ti 0.0 [CUDA 12.0]
While encounter the following error of In file included from /home/chi/MyCode/GitHub/sphinxsys-nr/src/for_2D_build/common/scalar_functions_supplementary.cpp:1: In file included from /home/chi/MyCode/GitHub/sphinxsys-nr/src/shared/common/scalar_functions.h:31: In file included from /home/chi/MyCode/GitHub/sphinxsys-nr/src/shared/common/base_data_type.h:41: In file included from /home/chi/MyPrefix/vcpkg/installed/x64-linux/include/eigen3/Eigen/Core:171: /home/chi/MyPrefix/vcpkg/installed/x64-linux/include/eigen3/Eigen/src/Core/MathFunctions.h:986:13: error: no member named 'isfinite' in the global namespace; did you mean 'std::isfinite'? return (::isfinite)(x); ^~ /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/cmath:562:3: note: 'std::isfinite' declared here isfinite(float __x)
Which seems related to eigen3, any clue to this error?
@ChiZhangatTUM
No, never encountered the same error. What flags are you passing to CMake?
/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/cmath:562:3: note: 'std::isfinite' declared here isfinite(float __x)
It seems to be including from GCC-11, while my installation is pointing to GCC-9, but I don't know if that's relevant to the issue. What's the output of clang++ -v
?
@ChiZhangatTUM
No, never encountered the same error. What flags are you passing to CMake?
/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/cmath:562:3: note: 'std::isfinite' declared here isfinite(float __x)
It seems to be including from GCC-11, while my installation is pointing to GCC-9, but I don't know if that's relevant to the issue. What's the output of
clang++ -v
?
Clang -v
Intel(R) oneAPI DPC++/C++ Compiler 2023.1.0 (2023.1.0.20230320) Target: x86_64-unknown-linux-gnu Thread model: posix InstalledDir: /opt/intel/oneapi/compiler/2023.1.0/linux/bin-llvm Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/11 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/12 Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/12 Candidate multilib: .;@m64 Selected multilib: .;@m64 Found CUDA installation: /usr/lib/cuda, version 11.5
I also tried icpx compiler, same error reported.
@ChiZhangatTUM No, never encountered the same error. What flags are you passing to CMake?
/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/cmath:562:3: note: 'std::isfinite' declared here isfinite(float __x)
It seems to be including from GCC-11, while my installation is pointing to GCC-9, but I don't know if that's relevant to the issue. What's the output of
clang++ -v
?Clang -v
Intel(R) oneAPI DPC++/C++ Compiler 2023.1.0 (2023.1.0.20230320) Target: x86_64-unknown-linux-gnu Thread model: posix InstalledDir: /opt/intel/oneapi/compiler/2023.1.0/linux/bin-llvm Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/11 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/12 Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/12 Candidate multilib: .;@m64 Selected multilib: .;@m64 Found CUDA installation: /usr/lib/cuda, version 11.5
I also tried icpx compiler, same error reported.
cxx flags
if(SPHINXSYS_USE_SYCL) message("-- SPHinXsys use SYCL") target_compile_options(sphinxsys_core INTERFACE $<$<BOOL:${SPHINXSYS_USE_SYCL}>:-fsycl -fsycl-targets=nvptx64-nvidia-cuda>) endif()
@ChiZhangatTUM No, never encountered the same error. What flags are you passing to CMake?
/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/cmath:562:3: note: 'std::isfinite' declared here isfinite(float __x)
It seems to be including from GCC-11, while my installation is pointing to GCC-9, but I don't know if that's relevant to the issue. What's the output of
clang++ -v
?Clang -v Intel(R) oneAPI DPC++/C++ Compiler 2023.1.0 (2023.1.0.20230320) Target: x86_64-unknown-linux-gnu Thread model: posix InstalledDir: /opt/intel/oneapi/compiler/2023.1.0/linux/bin-llvm Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/11 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/12 Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/12 Candidate multilib: .;@m64 Selected multilib: .;@m64 Found CUDA installation: /usr/lib/cuda, version 11.5 I also tried icpx compiler, same error reported.
cxx flags
if(SPHINXSYS_USE_SYCL) message("-- SPHinXsys use SYCL") target_compile_options(sphinxsys_core INTERFACE <{SPHINXSYS_USE_SYCL}:-fsycl -fsycl-targets=nvptx64-nvidia-cuda>) endif()
The following code where the error reported of eigen is
template
return (::isinf)(x);
using std::isinf;
return isinf EIGEN_NOT_A_MACRO (x);
return x>NumTraits<T>::highest() || x<NumTraits<T>::lowest();
}
@ChiZhangatTUM No, never encountered the same error. What flags are you passing to CMake?
/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/cmath:562:3: note: 'std::isfinite' declared here isfinite(float __x)
It seems to be including from GCC-11, while my installation is pointing to GCC-9, but I don't know if that's relevant to the issue. What's the output of
clang++ -v
?Clang -v Intel(R) oneAPI DPC++/C++ Compiler 2023.1.0 (2023.1.0.20230320) Target: x86_64-unknown-linux-gnu Thread model: posix InstalledDir: /opt/intel/oneapi/compiler/2023.1.0/linux/bin-llvm Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/11 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/12 Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/12 Candidate multilib: .;@m64 Selected multilib: .;@m64 Found CUDA installation: /usr/lib/cuda, version 11.5 I also tried icpx compiler, same error reported.
I just replicated the error on the same DPC++ version (2023.1), while the version I was using (built from git source) is not presenting the same problem
The following code where the error reported of eigen is template EIGEN_DEVICE_FUNC typename internal::enable_if<(!internal::is_integral::value)&&(!NumTraits::IsComplex),bool>::type isinf_impl(const T& x) { #if defined(EIGEN_GPU_COMPILE_PHASE) return (::isinf)(x); #elif EIGEN_USE_STD_FPCLASSIFY using std::isinf; return isinf EIGEN_NOT_A_MACRO (x); #else return x>NumTraits::highest() || x<NumTraits::lowest(); #endif }
Seems to be the EIGEN_GPU_COMPILE_PHASE
flag, strange enough that the DPC++ version solves the problem, I'll check if they added a bugfix lately. Eigen is probably using a global namespace because it's expecting to be compiled on nvcc or some other gpu compiler, but SYCL is not compiling that part for GPU, hence the error I suppose.
I solve this error by adding
_#include
I solve this error by adding _#include
before #include in base_data_type. while the compilation failed due to : undefined reference to `sycl::_V1::queue::submit_impl(std::function<void (sycl::_V1::handler&)>, sycl::_V1::detail::code_location const&)'
Yes, math.h
solves the problem, but I don't have the same reference error, on my side it's compiling and executing fine after adding that include. Yours seems to be a linkage problem to the sycl library.
if(SPHINXSYS_USE_SYCL) message("-- SPHinXsys use SYCL") target_compile_options(sphinxsys_core INTERFACE {SPHINXSYS_USE_SYCL}:-fsycl -fsycl-targets=nvptx64-nvidia-cuda>) endif()
Are you sure those flags are getting set properly?
I solve this error by adding _#include
before #include in base_data_type. while the compilation failed due to : undefined reference to `sycl::_V1::queue::submit_impl(std::function<void (sycl::_V1::handler&)>, sycl::_V1::detail::code_location const&)' Yes,
math.h
solves the problem, but I don't have the same reference error, on my side it's compiling and executing fine after adding that include. Yours seems to be a linkage problem to the sycl library.if(SPHINXSYS_USE_SYCL) message("-- SPHinXsys use SYCL") target_compile_options(sphinxsys_core INTERFACE {SPHINXSYS_USE_SYCL}:-fsycl -fsycl-targets=nvptx64-nvidia-cuda>) endif()
Are you sure those flags are getting set properly?
Seems that with target_compile_options, the flags can not be set properly. move after cmake command works fine. I will have a check later.
I solve this error by adding _#include
before #include in base_data_type. while the compilation failed due to : undefined reference to `sycl::_V1::queue::submit_impl(std::function<void (sycl::_V1::handler&)>, sycl::_V1::detail::code_location const&)' Yes,
math.h
solves the problem, but I don't have the same reference error, on my side it's compiling and executing fine after adding that include. Yours seems to be a linkage problem to the sycl library.if(SPHINXSYS_USE_SYCL) message("-- SPHinXsys use SYCL") target_compile_options(sphinxsys_core INTERFACE {SPHINXSYS_USE_SYCL}:-fsycl -fsycl-targets=nvptx64-nvidia-cuda>) endif()
Are you sure those flags are getting set properly?
Seems that with target_compile_options, the flags can not be set properly. move after cmake command works fine. I will have a check later.
Modified as follows works fine:
if(SPHINXSYS_USE_SYCL) message("-- SPHinXsys use SYCL") target_compile_options(sphinxsys_core INTERFACE $<$<BOOL:${SPHINXSYS_USE_SYCL}>:-fsycl -fsycl-targets=nvidia_gpu_sm_86>) target_link_options(sphinxsys_core INTERFACE -fsycl -fsycl-targets=nvidia_gpu_sm_86) endif()
I can add a new commit to include those targets inside CMakeLists, for example:
if(SPHINXSYS_SYCL_TARGETS)
message("-- SPHinXsys use SYCL")
target_compile_options(sphinxsys_core INTERFACE -fsycl -fsycl-targets=${SPHINXSYS_SYCL_TARGETS})
target_link_options(sphinxsys_core INTERFACE -fsycl -fsycl-targets=${SPHINXSYS_SYCL_TARGETS})
endif()
Is it better to have a single SPHINXSYS_SYCL_TARGETS
variable, or two separate SPHINXSYS_USE_SYCL
and SPHINXSYS_SYCL_TARGETS
?
I can add a new commit to include those targets inside CMakeLists, for example:
if(SPHINXSYS_SYCL_TARGETS) message("-- SPHinXsys use SYCL") target_compile_options(sphinxsys_core INTERFACE -fsycl -fsycl-targets=${SPHINXSYS_SYCL_TARGETS}) target_link_options(sphinxsys_core INTERFACE -fsycl -fsycl-targets=${SPHINXSYS_SYCL_TARGETS}) endif()
Is it better to have a single
SPHINXSYS_SYCL_TARGETS
variable, or two separateSPHINXSYS_USE_SYCL
andSPHINXSYS_SYCL_TARGETS
?
Two variables would be better. If SPHINXSYS_SYCL_TARGETS is not set, use default one, for example nvptx64-nvidia-cuda
I can add a new commit to include those targets inside CMakeLists, for example:
if(SPHINXSYS_SYCL_TARGETS) message("-- SPHinXsys use SYCL") target_compile_options(sphinxsys_core INTERFACE -fsycl -fsycl-targets=${SPHINXSYS_SYCL_TARGETS}) target_link_options(sphinxsys_core INTERFACE -fsycl -fsycl-targets=${SPHINXSYS_SYCL_TARGETS}) endif()
Is it better to have a single
SPHINXSYS_SYCL_TARGETS
variable, or two separateSPHINXSYS_USE_SYCL
andSPHINXSYS_SYCL_TARGETS
?Two variables would be better. If SPHINXSYS_SYCL_TARGETS is not set, use default one, for example nvptx64-nvidia-cuda
@FabienPean-Virtonomy Hi Fabien, I am a little confusing by adding link_option here, would you please review this commit?
@FabienPean-Virtonomy Great review Fabien, I would like to go through all the conservation you and Alberto have. Thanks
I merge this branch documentation a progress.
Problem addressed
In order to run particle iterators in parallel using SYCL some limitations need to be addressed:
local_dynamics_function
needs to be device copyable; calls to virtual methods are also not supported by SYCLlocal_dynamics_function
; which means that the object parameterthis
cannot be captured and used as reference, e.g.[&](size_t i){ this->update(i); }
sycl::buffer
s, passing the corresponding context handlercgh
Solution implemented so far
Each of these problems have been (partially) solved as follows:
sycl::accessor
s to the relevant data members of its corresponding base class (point 3. discusses how those accessors are initialized) or objects of other kernel classes. This ensures that every kernel class is device copyable and does not relay on virtual functions. Lastly, kernel classes need to implement those methods that substitute their base class methods. To avoid code duplication, a static method containing the function logic can be implemented (using a template to pass different types of arguments based on which class is calling it) and use this static method to implement both the kernel and base classesthis
inlocal_dynamics_function
whenever particle iterators are called with the ExecutionPolicy corresponding to SYCL; but at the same timethis
needs to be kept for every other ExecutionPolicy. To achieve this, without having to change every singleLocalDynamics
with a new interface or fork code execution with anif/else
, a classExecutionProxy
has been introduced. It decides at compile time (based on the passedExecutionPolicy
) which object to pass to thelocal_dynamics_function
(that will now have one more parameter in its signature). Proxy corresponding to the object is passed to the particle iterator. It will then use the passed proxy to get the right object (kernel/base class) and eventually initialize the kernel class accessors.DeviceVariable
for every data member to be used inside the device (which will manage their correspondingsycl::buffer
), and aDeviceProxy
, which implementsExecutionProxy
discussed in point 2. Particle iterators pass the sycl context to the givenDeviceProxy
, which propagates it to everyDeviceVariable
. ThoseDeviceVariable
s will return thesycl::accessor
s that will then be passed to the kernel class.This solution is obviously unnecessarily cumbersome and needs to be restructured, but it ensures a working execution on SYCL devices, without having to refactor any unrelated class inside the code-base, that may otherwise be conflicting with the changes needed for other possible solutions.
Run code
Only two
LocalDynamics
classes are implemented with the corresponding kernel classes as of now. And only Dambreak is considered, so theCMakeLists.txt
has been modified to only import that test case.In other to use SYCL the following CMake flags need to be passed
depending on where your DPC++ and CUDA installation is (aside from the other flags needed by SPHinXsys). Some other time I will add custom flags for SYCL targets and CUDA path, instead of using
CXX_FLAGS
.In addition, the path
~/dpcpp/build-release/lib
needs to be added to the environment variableLD_LIBRARY_PATH
, in case the providedCXX_COMPILER
is located in a custom location