berenger-eu / tbfmm

Task-based fast multipole method, parallelized using OpenMP and StarPU. With StarPU it supports multiple GPUs (CUDA).
Other
9 stars 3 forks source link

Parallel threads count #10

Open ArturSalamatin opened 1 year ago

ArturSalamatin commented 1 year ago

I have investigated a bit into the issue with the number of threads, https://github.com/berenger-eu/tbfmm/issues/7#issuecomment-1331288710

What I have found is that in the code https://github.com/berenger-eu/tbfmm/blob/5824e3ccad9abf821a12dcacb45b74defd65b73d/examples/testRotationKernel.cpp#L142 the method algorithm->GetNbThreads() is called. And its definition https://github.com/berenger-eu/tbfmm/blob/5824e3ccad9abf821a12dcacb45b74defd65b73d/src/algorithms/sequential/tbfalgorithm.hpp#L242-L244

is taken from TbfAlgorithm https://github.com/berenger-eu/tbfmm/blob/5824e3ccad9abf821a12dcacb45b74defd65b73d/src/algorithms/sequential/tbfalgorithm.hpp#L13-L14 which is a sequential one.

Further investigation reveals that a sequential algorithm is selected by TbfAlgorithmSelecter because TBF_USE_OPENMP variable is not defined in the current configuration https://github.com/berenger-eu/tbfmm/blob/5824e3ccad9abf821a12dcacb45b74defd65b73d/src/algorithms/tbfalgorithmselecter.hpp#L6-L15

While in the Reade.md you state that https://github.com/berenger-eu/tbfmm/blob/5824e3ccad9abf821a12dcacb45b74defd65b73d/README.md?plain=1#L1222-L1241 unfortunately, I have no idea how and where the variable is set.

berenger-eu commented 1 year ago

Indeed, I just pushed a change, which should correct this. You should see if when running cmake:

-- CMAKE_CXX_FLAGS =  -m64 -march=native -Wextra -Wnon-virtual-dtor -Wshadow -Wpointer-arith -Wcast-qual -Wconversion  -Wall -Wno-sign-conversion -pedantic -Woverloaded-virtual -Wpointer-arith -Wcast-qual -Wconversion -Wno-error -std=c++17  -funroll-loops -Wold-style-cast  -mavx2 -march=native -mtune=native -mavx -march=native -mtune=native -msse4 -msse4.2 -march=native -mtune=native -msse4 -msse4.1 -march=native -mtune=native -mssse3 -march=native -mtune=native -msse3 -march=native -mtune=native -Wold-style-cast -fopenmp -DTBF_USE_SPETABARU -DTBF_USE_INASTEMP -DTBF_USE_OPENMP -DTBF_USE_FFTW

Note the -DTBF_USE_OPENMP

ArturSalamatin commented 1 year ago

Indeed.. Now I see that the library is being compiled with openMP enabled... since there are compilation errors =]] Here is the output with typical errors, warnings, and messages:

[build] Starting build
[proc] Executing command: C:\CMake\bin\cmake.EXE --build d:/Lessons/MyPrograms/VisualStudio/C++/TBFMM/tbfmm/build --config Debug --target testRotationKernel -j 18 --
[build] Microsoft (R) Build Engine version 17.1.0+ae57d105c for .NET Framework
[build] Copyright (C) Microsoft Corporation. All rights reserved.
[build] 
[build]   Checking Build System
[build]   Building Custom Rule D:/Lessons/MyPrograms/VisualStudio/C++/TBFMM/tbfmm/CMakeLists.txt
[build]   testRotationKernel.cpp
[build] D:\Lessons\MyPrograms\VisualStudio\C++\TBFMM\tbfmm\examples\testRotationKernel.cpp : warning C4005: '_OPENMP': macro redefinition [D:\Lessons\MyPrograms\VisualStudio\C++\TBFMM\tbfmm\build\testRotationKernel.vcxproj]
[build] D:\Lessons\MyPrograms\VisualStudio\C++\TBFMM\tbfmm\examples\testRotationKernel.cpp : message : see previous definition of '_OPENMP' [D:\Lessons\MyPrograms\VisualStudio\C++\TBFMM\tbfmm\build\testRotationKernel.vcxproj]
[build] D:\Lessons\MyPrograms\VisualStudio\C++\TBFMM\tbfmm\src\algorithms/openmp/tbfopenmpalgorithm.hpp(337,1): warning C4267: 'initializing': conversion from 'size_t' to 'long', possible loss of data [D:\Lessons\MyPrograms\VisualStudio\C++\TBFMM\tbfmm\build\testRotationKernel.vcxproj]
[build] D:\Lessons\MyPrograms\VisualStudio\C++\TBFMM\tbfmm\src\algorithms/openmp/tbfopenmpalgorithm.hpp(109,9): error C3005: 'depend': unexpected token encountered on OpenMP 'task' directive [D:\Lessons\MyPrograms\VisualStudio\C++\TBFMM\tbfmm\build\testRotationKernel.vcxproj]
[D:\Lessons\MyPrograms\VisualStudio\C++\TBFMM\tbfmm\build\testRotationKernel.vcxproj]
[proc] The command: C:\CMake\bin\cmake.EXE --build d:/Lessons/MyPrograms/VisualStudio/C++/TBFMM/tbfmm/build --config Debug --target testRotationKernel -j 18 -- exited with code: 1 and signal: null
[build] Build finished with exit code 1
[main] Failed to prepare executable target with name "undefined"

My hypothesis is that MSVC does not support the pragma directive #pragma omp task depend

  1. Do you have any comments?
  2. What should I do to disable openMP and use a sequential algorithm?
ArturSalamatin commented 1 year ago

Seems that there is no task depend in OpenMP 3.0, https://www.openmp.org/wp-content/uploads/OpenMP3.1.pdf and even 3.1 https://www.openmp.org/wp-content/uploads/OpenMP3.1.pdf but it is introduced in OpenMP 4.0 (section 2.11.1 on p. 113) https://www.openmp.org/wp-content/uploads/OpenMP4.0.0.pdf

while on April 26th, 2022 Microsoft was just on its way to support the remaining features added in the OpenMP 3.1 standard as stated in their blogpost (see Our OpenMP plans section) https://devblogs.microsoft.com/cppblog/openmp-task-support-for-c-in-visual-studio/

  1. Is it possible to downgrade the pragma directive according to standard 3.1 or 3.0?
  2. Or should I just restrict myself to a sequential algorithm and try the spetabaru/inastemp libs?
berenger-eu commented 1 year ago

Wow, MS has been slow on this. I would suggest to use the sequential, this is enough to validate your kernel. Once it works, you could try spetabaru, now called specx (which is pure standard C++ but I never tried to compile it on windows...)