berenger-eu / tbfmm

Task-based fast multipole method, parallelized using OpenMP and StarPU. With StarPU it supports multiple GPUs (CUDA).

testRotationKernel segmentation fault #7

Open ArturSalamatin opened 1 year ago

ArturSalamatin commented 1 year ago

When running testRotationKernel in debug mode under VS Code, a segmentation fault occurs in the void M2M(...) method in FRotationKernel.hpp, with the following output:

Thread 15 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 16308.0x382c]
0x00007ff7d659726c in FRotationKernel<double, 12, TbfMortonSpaceIndex<3l, TbfSpacialConfiguration<double, 3l>, false> >::M2M<TbfCellsContainer<double, std::array<std::complex<double>, 91ull>, std::array<std::complex<double>, 91ull>, TbfMortonSpaceIndex<3l, TbfSpacialConfiguration<double, 3l>, false> >::CellHeader, std::vector<std::reference_wrapper<std::array<std::complex<double>, 91ull> const>, std::allocator<std::reference_wrapper<std::array<std::complex<double>, 91ull> const> > >, std::array<std::complex<double>, 91ull> > (this=0x2186d4f9018, inLevel=1, inLowerCell=std::vector of length 8, capacity 8 = {...}, inOutUpperCell=..., childrenPos=0x372d7ff5e0, inNbChildren=8) at ...TBFMM/tbfmm/src/kernels/rotationkernel/FRotationKernel.hpp:1029
1029                            w_lm_real += coef[index_l_minus_j] * source_w[index_jm].real();

ArturSalamatin commented 1 year ago

The library is compiled under Windows 10 with OpenMP as the parallelization backend. Further tests reveal that the error may be raised in any of the X2Y methods.

berenger-eu commented 1 year ago

Thanks a lot for the feedback. I will try to find a Windows PC to test it (I think I need a week) and get back to you.

ArturSalamatin commented 1 year ago

After compiling the code with MSVC, with OpenMP (as well as SPETABARU and Inastemp) disabled, I was able to localize an issue with std::array bounds in the code at https://github.com/berenger-eu/tbfmm/blob/202ecd0d4c299ef7e1f8619067b78e01bdf47b11/src/core/tbftree.hpp#L426

It seems that rhs is a std::vector of 1000 elements, which is correct. Its elements are of type std::array<RhsType, NbRhsValuesPerParticle>, where NbRhsValuesPerParticle = 4. So, every element is an array of 4 doubles. At idxPart = 0 the expression particleIndexes[idxPart] returns 278, which is greater than the largest valid index of that array, NbRhsValuesPerParticle - 1 = 3.

This triggers an error.
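To illustrate the kind of bounds violation described above, here is a minimal sketch. It does not reproduce the code in tbftree.hpp; the names `RhsArray`, `isValidValueIndex`, and `atThrowsFor` are hypothetical, and only the sizes (1000 particles, 4 values per particle, index 278) are taken from this report. It shows that 278 is a valid index into the outer std::vector but not into the inner std::array, and that the checked accessor `at()` reports the problem instead of reading out of bounds.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Value reported in this thread.
constexpr std::size_t NbRhsValuesPerParticle = 4;

// Hypothetical alias: one particle's right-hand-side values.
using RhsArray = std::array<double, NbRhsValuesPerParticle>;

// Hypothetical helper: true if idx can index the inner array.
bool isValidValueIndex(std::size_t idx) {
    return idx < NbRhsValuesPerParticle;
}

// Hypothetical helper: returns true if checked access throws.
// The particle index must be valid; only the inner index is probed.
bool atThrowsFor(const std::vector<RhsArray>& rhs,
                 std::size_t particle, std::size_t valueIdx) {
    assert(particle < rhs.size());
    try {
        (void)rhs[particle].at(valueIdx);  // std::array::at checks bounds
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

With a vector of 1000 such arrays, `atThrowsFor(rhs, 0, 278)` reports the violation, while `atThrowsFor(rhs, 278, 3)` does not, since 278 is a legitimate particle index. Unchecked `operator[]` with 278 on the inner array would be undefined behavior, which is consistent with a crash that only shows up in some builds.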

berenger-eu commented 1 year ago

Oh yes, thanks. There was already an error like this in the past; I thought I had fixed all of them, but apparently not. I will work on it now.

berenger-eu commented 1 year ago

I pushed a commit that should solve the problem.

ArturSalamatin commented 1 year ago

It looks like the testRotationKernel example executes properly now. Thank you!

My minor concern now is whether it is actually possible to enable OpenMP with MSVC.

My configure/run steps are as follows:

  1. Delete everything in the build folder (I do this because the files there are typically not rewritten after the CMake configuration changes; a known issue?)
  2. Call CMake: Configure
  3. Get the following output, which includes fatal error C1021: invalid preprocessor command 'warning' and [cmake] -- OpenMP enabled with -openmp:llvm -openmp:experimental
    [main] Configuring folder: tbfmm 
    [proc] Executing command: C:\CMake\bin\cmake.EXE --no-warn-unused-cli -DCMAKE_EXPORT_COMPILE_COMMANDS:BOOL=TRUE -Sd:/Lessons/MyPrograms/VisualStudio/C++/TBFMM/tbfmm -Bd:/Lessons/MyPrograms/VisualStudio/C++/TBFMM/tbfmm/build -G "Visual Studio 17 2022" -T host=x64 -A x64
    [cmake] Not searching for unused variables given on the command line.
    [cmake] -- Selecting Windows SDK version 10.0.19041.0 to target Windows 10.0.19044.
    [cmake] -- The CXX compiler identification is MSVC 19.31.31105.0
    [cmake] -- Detecting CXX compiler ABI info
    [cmake] -- Detecting CXX compiler ABI info - done
    [cmake] -- Check for working CXX compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.31.31103/bin/Hostx64/x64/cl.exe - skipped
    [cmake] -- Detecting CXX compile features
    [cmake] -- Detecting CXX compile features - done
    [cmake] -- Cannot compile C++17, output when compiling simple example is :
    [cmake] -- Change Dir: D:/Lessons/MyPrograms/VisualStudio/C++/TBFMM/tbfmm/build/CMakeFiles/CMakeTmp
    [cmake] 
    [cmake] Run Build Command(s):C:/Program Files/Microsoft Visual Studio/2022/Community/MSBuild/Current/Bin/amd64/MSBuild.exe cmTC_011f5.vcxproj /p:Configuration=Debug /p:Platform=x64 /p:VisualStudioVersion=17.0 /v:m && Microsoft (R) Build Engine version 17.1.0+ae57d105c for .NET Framework
    [cmake] Copyright (C) Microsoft Corporation. All rights reserved.
    [cmake] 
    [cmake]   Microsoft (R) C/C++ Optimizing Compiler Version 19.31.31105 for x64
    [cmake]   cppversion.cpp
    [cmake]   Copyright (C) Microsoft Corporation.  All rights reserved.
    [cmake]   cl /c /I"C:\dev\vcpkg\installed\x64-windows\include" /Zi /W3 /WX- /diagnostics:column /Od /Ob0 /D _MBCS /D WIN32 /D _WINDOWS /D "CMAKE_INTDIR=\"Debug\"" /Gm- /EHsc /RTC1 /MDd /GS /fp:precise /Zc:wchar_t /Zc:forScope /Zc:inline /GR /std:c++17 /Fo"cmTC_011f5.dir\Debug\\" /Fd"cmTC_011f5.dir\Debug\vc143.pdb" /external:W3 /Gd /TP /errorReport:queue D:\Lessons\MyPrograms\VisualStudio\C++\TBFMM\tbfmm\deps\CMakeModules\cppversion.cpp
    [cmake] D:\Lessons\MyPrograms\VisualStudio\C++\TBFMM\tbfmm\deps\CMakeModules\cppversion.cpp(9,1): fatal error C1021: invalid preprocessor command 'warning' [D:\Lessons\MyPrograms\VisualStudio\C++\TBFMM\tbfmm\build\CMakeFiles\CMakeTmp\cmTC_011f5.vcxproj]
    [cmake] 
    [cmake] 
    [cmake] -- Will continue anyway...
    [cmake] -- SPETABARU Cannot be found in D:/Lessons/MyPrograms/VisualStudio/C++/TBFMM/tbfmm/deps/spetabaru (please use git submodule init && git submodule update)
    [cmake] -- Inastemp Cannot be found in D:/Lessons/MyPrograms/VisualStudio/C++/TBFMM/tbfmm/deps/inastemp (please use git submodule init && git submodule update)
    [cmake] -- MSVC detected
    [cmake] -- OpenMP enabled with  -openmp:llvm -openmp:experimental
    [cmake] -- Consider FFTW_ROOT = 
    [cmake] -- Could NOT find FFTW (missing: FFTW_LIBRARIES FFTW_INCLUDES) 
    [cmake] -- FFTW Cannot be found, try by setting -DFFTW_ROOT=... or env FFTW_ROOT
    [cmake] -- Available compilation keys are: 
    [cmake] -- CMAKE_CXX_FLAGS = /DWIN32 /D_WINDOWS /W3 /GR /EHsc  -openmp:llvm -openmp:experimental
    [cmake] -- Add example exampleEmptyKernel
    [cmake] -- Add example testRandomParticles
    [cmake] -- Add example testRandomParticlesPeriodic
    [cmake] -- Add example testRandomParticlesTsm
    [cmake] -- Add example testRotationKernel
    [cmake] -- Examples -- testUnifKernel needs FFTW
    [cmake] -- Examples -- testUnifKernel cannot be compiled due to missing libs (D:/Lessons/MyPrograms/VisualStudio/C++/TBFMM/tbfmm/examples/testUnifKernel.cpp)
    [cmake] -- Configuring done
    [cmake] -- Generating done
    [cmake] -- Build files have been written to: D:/Lessons/MyPrograms/VisualStudio/C++/TBFMM/tbfmm/build
    [visual-studio] Patch Windows SDK path from C:\Program Files (x86)\Windows Kits\10\bin\x64 to C:\Program Files (x86)\Windows Kits\10\bin\10.0.19041.0\x64 for C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvarsall.bat
  4. Running the testRotationKernel target, I get the following output, apparently with a single thread, as stated by the line Number of threads 1
    
    TbfSpacialConfiguration @ 0000000665D3ECF0
    - Dim 3
    - treeHeight 4
    - box center std::class std::array<double,3> @ 0000000665D3D510 - Size 3 - Data { 0.5,0.5,0.5}     
    - box widths std::class std::array<double,3> @ 0000000665D3D518 - Size 3 - Data { 1,1,1}

    Particles info
    Build the tree in 0.0054059s
    Number of elements per group 13
    Algorithm name TbfAlgorithm
    Number of threads 1
    Execute in 1.14553s
    Direct execute in 0.0298925s
    Relative differences:

Is this the expected output for this example? Should it use more than a single thread with OpenMP enabled?

Thanks a lot for your efforts and prompt responses!
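Regarding the C1021 message in the configure log: it shows MSVC 19.31 rejecting a `#warning` directive in cppversion.cpp. The content of that file is not shown in this thread, so the following is only a sketch of how a C++17 check could be written to work on both MSVC and GCC/Clang. The macro `HYPO_CPP_VERSION` and the functions are hypothetical names. It relies on two documented MSVC behaviors: `__cplusplus` stays at 199711L unless /Zc:__cplusplus is passed, and `_MSVC_LANG` carries the language version actually selected.

```cpp
#include <cassert>

// Pick the macro that reflects the selected language standard.
#if defined(_MSVC_LANG)
#define HYPO_CPP_VERSION _MSVC_LANG   // MSVC: reliable without /Zc:__cplusplus
#else
#define HYPO_CPP_VERSION __cplusplus  // GCC/Clang
#endif

// Emit a diagnostic without using #warning on MSVC-compatible compilers.
#if HYPO_CPP_VERSION < 201703L
#if defined(_MSC_VER)
#pragma message("C++17 is not enabled")
#else
#warning "C++17 is not enabled"
#endif
#endif

// Hypothetical helpers exposing the detected value to regular code.
constexpr long cppVersion() { return HYPO_CPP_VERSION; }
constexpr bool hasCpp17() { return cppVersion() >= 201703L; }

// Sanity check: every conforming compiler reports at least C++98.
void checkVersionMacroSane() {
    assert(cppVersion() >= 199711L);
}
```

Since the configure step prints "Will continue anyway...", the failed check does not stop the build in the log above; the sketch only concerns the diagnostic itself.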

berenger-eu commented 1 year ago

Yes, that seems fine, although it is surprising that it does not use more threads. You can set the environment variable OMP_NUM_THREADS=4 to see whether that changes anything, but by default OpenMP should use all cores.
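One way to check whether OpenMP is really active in a given build is a small standalone probe like the sketch below. It is not part of TBFMM, and the function names are hypothetical. It uses only standard OpenMP runtime calls (`omp_get_max_threads`, `omp_get_num_threads`) and the `_OPENMP` macro, which a compiler defines only when OpenMP support is enabled, so the code also builds and reports 1 thread when OpenMP is off.

```cpp
#include <cassert>
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif

// Upper bound on the team size for the next parallel region,
// or 1 when the file is compiled without OpenMP support.
int maxOpenMpThreads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Number of threads that actually form a parallel region.
int observedThreads() {
    int count = 1;
#ifdef _OPENMP
    #pragma omp parallel
    {
        // Exactly one thread of the team records the team size.
        #pragma omp single
        count = omp_get_num_threads();
    }
#endif
    assert(count >= 1);
    return count;
}

// Print both values so they can be compared with the example's output.
void printThreadReport() {
#ifdef _OPENMP
    std::printf("OpenMP enabled (_OPENMP = %d)\n", _OPENMP);
#else
    std::printf("OpenMP disabled at compile time\n");
#endif
    std::printf("max threads: %d, observed threads: %d\n",
                maxOpenMpThreads(), observedThreads());
}
```

Calling `printThreadReport()` from `main` with the same compiler flags as the example, once with and once without OMP_NUM_THREADS set, should help distinguish a build where the OpenMP flags were not applied from a runtime that was limited to one thread.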