adamantine-sim / adamantine

Software to simulate heat transfer for additive manufacturing
https://adamantine-sim.github.io/adamantine/

Segfault using multithreading #130

Open Rombur opened 2 years ago

Rombur commented 2 years ago

We have a segfault in the code when using multithreading. I don't know what triggers it, but it seems to happen in cell_local_apply. Some input files trigger the segfault with a single MPI rank, while others fail only with multiple MPI ranks (./adamantine -i demo_316_short.info passes but mpirun -np 2 ./adamantine -i demo_316_short.info fails). A backtrace obtained with Clang's AddressSanitizer is included in this comment. Setting DEAL_II_NUM_THREADS=1 fixes the problem. Since we are using MPI for parallelization on the host, I propose that we always set DEAL_II_NUM_THREADS=1 in the Docker image and update the README. Multithreading in deal.II is done using TBB, but TBB will be removed in the future. Once TBB has been replaced, we can revisit this.
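A minimal sketch of the proposed workaround, assuming adamantine is launched from the build directory as in the commands above. The guard around the run is only there so the snippet is harmless where the binary is absent; in the Docker image the equivalent would presumably be an `ENV DEAL_II_NUM_THREADS=1` line, which is an assumption about how the image is built.

```shell
# Limit deal.II to a single thread per MPI rank (the workaround described above).
export DEAL_II_NUM_THREADS=1

# Re-run the failing two-rank case, but only if the binary exists here.
if [ -x ./adamantine ]; then
  mpirun -np 2 ./adamantine -i demo_316_short.info
fi
```

Exporting the variable, rather than prefixing a single command, keeps every later run in the same shell session single-threaded.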

AddressSanitizer:DEADLYSIGNAL
=================================================================
==1773==ERROR: AddressSanitizer: SEGV on unknown address 0x6248000108d8 (pc 0x0000014ddeec bp 0x7fdd74eb2830 sp 0x7fdd74eb2750 T89)
==1773==The signal is caused by a READ memory access.
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
    #0 0x14ddeec in adamantine::ScanPath::update_current_segment_info(double, dealii::Point<3, double>&, double&) const /home/dev/adamantine/build/../source/ScanPath.cc:159:62
    #1 0x14deccf in adamantine::ScanPath::value(double const&) const /home/dev/adamantine/build/../source/ScanPath.cc:184:3                        
    #2 0x1cf7317 in adamantine::GoldakHeatSource<3>::value(dealii::Point<3, double> const&, double, double) const /home/dev/adamantine/build/../source/GoldakHeatSource.cc:34:59
    #3 0x164e7f0 in adamantine::ThermalOperator<3, 3, dealii::MemorySpace::Host>::cell_local_apply(dealii::MatrixFree<3, double, dealii::VectorizedArray<double, 2ul> > const&, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>&, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host> const&, std::pair<unsigned int, unsigned int> const&) const /home/dev/adamantine/build/../source/ThermalOperator.cc:445:21
    #4 0x1acbb2f in dealii::internal::MFWorker<dealii::MatrixFree<3, double, dealii::VectorizedArray<double, 2ul> >, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, adamantine::ThermalOperator<3, 3, dealii::MemorySpace::Host>, true>::cell(std::pair<unsigned int, unsigned int> const&) /opt/dealii/include/deal.II/matrix_free/matrix_free.h:4649:13
    #5 0x7fde199139a5 in dealii::internal::MatrixFreeFunctions::color::CellWork::operator()(tbb::blocked_range<unsigned int> const&) const /home/dev/dealii/build/../source/matrix_free/task_info.cc:244:18
    #6 0x7fde19913899 in tbb::interface9::internal::start_for<tbb::blocked_range<unsigned int>, dealii::internal::MatrixFreeFunctions::color::CellWork, tbb::auto_partitioner const>::run_body(tbb::blocked_range<unsigned int>&) /home/dev/dealii/build/../bundled/tbb-2018_U2/include/tbb/parallel_for.h:116:13