Closed tjhei closed 6 years ago
I hit an assertion in sol_cx and other places with deal.II 9.0, might be related, but is slightly different:
--------------------------------------------------------
An error occurred in line <7264> of file </home/rengas/Software/dealii/include/deal.II/numerics/vector_tools.templates.h> in function
void dealii::VectorTools::internal::do_integrate_difference(const dealii::hp::MappingCollection<dim, spacedim>&, const DoFHandlerType&, const InVector&, const dealii::Function<spacedim>&, OutVector&, const dealii::hp::QCollection<dim>&, const dealii::VectorTools::NormType&, const dealii::Function<spacedim>*, double) [with int dim = 2; InVector = dealii::TrilinosWrappers::MPI::BlockVector; OutVector = dealii::Vector<float>; DoFHandlerType = dealii::DoFHandler<2, 2>; int spacedim = 2]
The violated condition was:
exact_solution.n_components==n_components
Additional information:
Dimension 1 not equal to 4.
Stacktrace:
-----------
#0 /home/rengas/Software/deal.II-dev/lib/libdeal_II.g.so.9.0.0-rc0:
#1 /home/rengas/Software/deal.II-dev/lib/libdeal_II.g.so.9.0.0-rc0: void dealii::VectorTools::integrate_difference<2, dealii::TrilinosWrappers::MPI::BlockVector, dealii::Vector<float>, 2>(dealii::Mapping<2, 2> const&, dealii::DoFHandler<2, 2> const&, dealii::TrilinosWrappers::MPI::BlockVector const&, dealii::Function<2, double> const&, dealii::Vector<float>&, dealii::Quadrature<2> const&, dealii::VectorTools::NormType const&, dealii::Function<2, double> const*, double)
#2 ./libsol_cx_2.so: aspect::InclusionBenchmark::SolCxPostprocessor<2>::execute(dealii::TableHandler&)
#3 ../aspect: aspect::Postprocess::Manager<2>::execute(dealii::TableHandler&)
#4 ../aspect: aspect::Simulator<2>::postprocess()
#5 ../aspect: aspect::Simulator<2>::run()
#6 ../aspect: void run_simulator<2>(std::string const&, bool, bool)
#7 ../aspect: main
--------------------------------------------------------
I have an idea where it is coming from and will fix it; let's see if that solves your issue as well.
The fix for my problem is in #2214, does your error still occur after that fix?
I can reproduce exactly your error message on Ubuntu 14.04 (clang 6 manually installed) and 18.04 (clang 6 installed from the repository). I cannot see what is happening, though. Is there a tester for deal.II with clang 6 and this particular setup? Then we could at least narrow down whether the problem is in aspect or deal.II.
This is the full callstack:
[cb45769cb27a:06347] *** Process received signal ***
[cb45769cb27a:06347] Signal: Floating point exception (8)
[cb45769cb27a:06347] Signal code: Invalid floating point operation (7)
[cb45769cb27a:06347] Failing at address: 0x135f71c
[cb45769cb27a:06347] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f0098b0a890]
[cb45769cb27a:06347] [ 1] ../aspect(_ZNK6aspect10Assemblers25StokesIncompressibleTermsILi2EE7executeERNS_8internal8Assembly7Scratch11ScratchBaseILi2EEERNS4_8CopyData12CopyDataBaseILi2EEE+0x3ec)[0x135f71c]
[cb45769cb27a:06347] [ 2] ../aspect(_ZN6aspect9SimulatorILi2EE28local_assemble_stokes_systemERKN6dealii18TriaActiveIteratorINS2_15DoFCellAccessorINS2_10DoFHandlerILi2ELi2EEELb0EEEEERNS_8internal8Assembly7Scratch12StokesSystemILi2EEERNSC_8CopyData12StokesSystemILi2EEE+0x396)[0x128c3c6]
[cb45769cb27a:06347] [ 3] ../aspect(_ZNSt5_BindIFMN6aspect9SimulatorILi2EEEFvRKN6dealii18TriaActiveIteratorINS3_15DoFCellAccessorINS3_10DoFHandlerILi2ELi2EEELb0EEEEERNS0_8internal8Assembly7Scratch12StokesSystemILi2EEERNSD_8CopyData12StokesSystemILi2EEEEPS2_St12_PlaceholderILi1EESP_ILi2EESP_ILi3EEEE6__callIvJRNS3_16FilteredIteratorIS9_EESH_SL_EJLm0ELm1ELm2ELm3EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE+0x95)[0x129e425]
[cb45769cb27a:06347] [ 4] ../aspect(_ZNSt5_BindIFMN6aspect9SimulatorILi2EEEFvRKN6dealii18TriaActiveIteratorINS3_15DoFCellAccessorINS3_10DoFHandlerILi2ELi2EEELb0EEEEERNS0_8internal8Assembly7Scratch12StokesSystemILi2EEERNSD_8CopyData12StokesSystemILi2EEEEPS2_St12_PlaceholderILi1EESP_ILi2EESP_ILi3EEEEclIJRNS3_16FilteredIteratorIS9_EESH_SL_EvEET0_DpOT_+0x51)[0x129dc21]
[cb45769cb27a:06347] [ 5] ../aspect(_ZN6dealii10WorkStream3runISt5_BindIFMN6aspect9SimulatorILi2EEEFvRKNS_18TriaActiveIteratorINS_15DoFCellAccessorINS_10DoFHandlerILi2ELi2EEELb0EEEEERNS3_8internal8Assembly7Scratch12StokesSystemILi2EEERNSF_8CopyData12StokesSystemILi2EEEEPS5_St12_PlaceholderILi1EESR_ILi2EESR_ILi3EEEES2_IFMS5_FvRKSM_ESQ_SS_EENS_16FilteredIteratorISB_EESI_SM_EEvRKT1_RKNS_8identityIS15_E4typeET_T0_RKT2_RKT3_jj+0x107)[0x128cea7]
[cb45769cb27a:06347] [ 6] ../aspect(_ZN6aspect9SimulatorILi2EE22assemble_stokes_systemEv+0x43f)[0x128cbef]
[cb45769cb27a:06347] [ 7] ../aspect(_ZN6aspect9SimulatorILi2EE25assemble_and_solve_stokesEbPd+0xaa)[0x13a80ea]
[cb45769cb27a:06347] [ 8] ../aspect(_ZN6aspect9SimulatorILi2EE34solve_no_advection_iterated_stokesEv+0x7a)[0x13a842a]
[cb45769cb27a:06347] [ 9] ../aspect(_ZN6aspect9SimulatorILi2EE14solve_timestepEv+0x14e)[0x12d7d5e]
[cb45769cb27a:06347] [10] ../aspect(_ZN6aspect9SimulatorILi2EE3runEv+0x300)[0x12d70d0]
[cb45769cb27a:06347] [11] ../aspect(_Z13run_simulatorILi2EEvRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEbb+0xca)[0x10d364a]
[cb45769cb27a:06347] [12] ../aspect(main+0x33a)[0x10d2e3a]
[cb45769cb27a:06347] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f0098728b97]
[cb45769cb27a:06347] [14] ../aspect(_start+0x2a)[0x10627aa]
[cb45769cb27a:06347] *** End of error message ***
Does it tell us anything that the exception is raised from libpthread?
Demangled, this looks as follows:
[cb45769cb27a:06347] *** Process received signal ***
[cb45769cb27a:06347] Signal: Floating point exception (8)
[cb45769cb27a:06347] Signal code: Invalid floating point operation (7)
[cb45769cb27a:06347] Failing at address: 0x135f71c
[cb45769cb27a:06347] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f0098b0a890]
[cb45769cb27a:06347] [ 1] ../aspect(aspect::Assemblers::StokesIncompressibleTerms<2>::execute(aspect::internal::Assembly::Scratch::ScratchBase<2>&, aspect::internal::Assembly::CopyData::CopyDataBase<2>&) const+0x3ec)[0x135f71c]
[cb45769cb27a:06347] [ 2] ../aspect(aspect::Simulator<2>::local_assemble_stokes_system(dealii::TriaActiveIterator<dealii::DoFCellAccessor<dealii::DoFHandler<2, 2>, false> > const&, aspect::internal::Assembly::Scratch::StokesSystem<2>&, aspect::internal::Assembly::CopyData::StokesSystem<2>&)+0x396)[0x128c3c6]
[cb45769cb27a:06347] [ 3] ../aspect(void std::_Bind<void (aspect::Simulator<2>::*(aspect::Simulator<2>*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>))(dealii::TriaActiveIterator<dealii::DoFCellAccessor<dealii::DoFHandler<2, 2>, false> > const&, aspect::internal::Assembly::Scratch::StokesSystem<2>&, aspect::internal::Assembly::CopyData::StokesSystem<2>&)>::__call<void, dealii::FilteredIterator<dealii::TriaActiveIterator<dealii::DoFCellAccessor<dealii::DoFHandler<2, 2>, false> > >&, aspect::internal::Assembly::Scratch::StokesSystem<2>&, aspect::internal::Assembly::CopyData::StokesSystem<2>&, 0ul, 1ul, 2ul, 3ul>(std::tuple<dealii::FilteredIterator<dealii::TriaActiveIterator<dealii::DoFCellAccessor<dealii::DoFHandler<2, 2>, false> > >&, aspect::internal::Assembly::Scratch::StokesSystem<2>&, aspect::internal::Assembly::CopyData::StokesSystem<2>&>&&, std::_Index_tuple<0ul, 1ul, 2ul, 3ul>)+0x95)[0x129e425]
[cb45769cb27a:06347] [ 4] ../aspect(void std::_Bind<void (aspect::Simulator<2>::*(aspect::Simulator<2>*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>))(dealii::TriaActiveIterator<dealii::DoFCellAccessor<dealii::DoFHandler<2, 2>, false> > const&, aspect::internal::Assembly::Scratch::StokesSystem<2>&, aspect::internal::Assembly::CopyData::StokesSystem<2>&)>::operator()<dealii::FilteredIterator<dealii::TriaActiveIterator<dealii::DoFCellAccessor<dealii::DoFHandler<2, 2>, false> > >&, aspect::internal::Assembly::Scratch::StokesSystem<2>&, aspect::internal::Assembly::CopyData::StokesSystem<2>&, void>(dealii::FilteredIterator<dealii::TriaActiveIterator<dealii::DoFCellAccessor<dealii::DoFHandler<2, 2>, false> > >&, aspect::internal::Assembly::Scratch::StokesSystem<2>&, aspect::internal::Assembly::CopyData::StokesSystem<2>&)+0x51)[0x129dc21]
[cb45769cb27a:06347] [ 5] ../aspect(void dealii::WorkStream::run<std::_Bind<void (aspect::Simulator<2>::*(aspect::Simulator<2>*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>))(dealii::TriaActiveIterator<dealii::DoFCellAccessor<dealii::DoFHandler<2, 2>, false> > const&, aspect::internal::Assembly::Scratch::StokesSystem<2>&, aspect::internal::Assembly::CopyData::StokesSystem<2>&)>, std::_Bind<void (aspect::Simulator<2>::*(aspect::Simulator<2>*, std::_Placeholder<1>))(aspect::internal::Assembly::CopyData::StokesSystem<2> const&)>, dealii::FilteredIterator<dealii::TriaActiveIterator<dealii::DoFCellAccessor<dealii::DoFHandler<2, 2>, false> > >, aspect::internal::Assembly::Scratch::StokesSystem<2>, aspect::internal::Assembly::CopyData::StokesSystem<2> >(dealii::FilteredIterator<dealii::TriaActiveIterator<dealii::DoFCellAccessor<dealii::DoFHandler<2, 2>, false> > > const&, dealii::identity<dealii::FilteredIterator<dealii::TriaActiveIterator<dealii::DoFCellAccessor<dealii::DoFHandler<2, 2>, false> > > >::type const&, std::_Bind<void (aspect::Simulator<2>::*(aspect::Simulator<2>*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>))(dealii::TriaActiveIterator<dealii::DoFCellAccessor<dealii::DoFHandler<2, 2>, false> > const&, aspect::internal::Assembly::Scratch::StokesSystem<2>&, aspect::internal::Assembly::CopyData::StokesSystem<2>&)>, std::_Bind<void (aspect::Simulator<2>::*(aspect::Simulator<2>*, std::_Placeholder<1>))(aspect::internal::Assembly::CopyData::StokesSystem<2> const&)>, aspect::internal::Assembly::Scratch::StokesSystem<2> const&, aspect::internal::Assembly::CopyData::StokesSystem<2> const&, unsigned int, unsigned int)+0x107)[0x128cea7]
[cb45769cb27a:06347] [ 6] ../aspect(aspect::Simulator<2>::assemble_stokes_system()+0x43f)[0x128cbef]
[cb45769cb27a:06347] [ 7] ../aspect(aspect::Simulator<2>::assemble_and_solve_stokes(bool, double*)+0xaa)[0x13a80ea]
[cb45769cb27a:06347] [ 8] ../aspect(aspect::Simulator<2>::solve_no_advection_iterated_stokes()+0x7a)[0x13a842a]
[cb45769cb27a:06347] [ 9] ../aspect(aspect::Simulator<2>::solve_timestep()+0x14e)[0x12d7d5e]
[cb45769cb27a:06347] [10] ../aspect(aspect::Simulator<2>::run()+0x300)[0x12d70d0]
[cb45769cb27a:06347] [11] ../aspect(void run_simulator<2>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool)+0xca)[0x10d364a]
[cb45769cb27a:06347] [12] ../aspect(main+0x33a)[0x10d2e3a]
I don't think that the problem is inside libpthread, but only that the pthread library has installed a signal handler that cleans up the thread before it aborts the program.
In other words, I assume that the problem happens inside the execute() function. Can you narrow things down by putting printf statements in there?
@bangerth: Timo posted the line that gdb shows, but that makes no sense, because there is no floating point operation there. So it must happen at some random point before. Would printf check for FPEs?
Some more information:
So does our test just incorrectly assume that FPEs would work on this system? Then #2225 would be the solution. Should we just go with that and make the release? I do not see a reason why clang6 should suddenly find errors that other compilers did not find before.
> I do not see a reason why clang6 should suddenly find errors that other compilers did not find before.
I assume clang is more aggressive in optimizing the code in debug mode. Without FP exceptions, it is of course legal to optimize something like
const double bdf2_factor = (use_bdf2_scheme)? ((2*time_step + old_time_step) /
(time_step + old_time_step)) : 1.0;
and always do the divide. I don't think we have a bug in our code.
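To make the transformation concrete, here is a hand-written model of it. This is only a sketch: the function names are made up for illustration, and the second function is my guess at the shape of the optimized code, not actual clang output. Both functions return the same value for every input, which is why the rewrite is legal when floating point traps are not part of the picture; the only difference is that the second one divides unconditionally, so with traps enabled a 0/0 there would raise SIGFPE even when use_bdf2_scheme is false.

```cpp
#include <cassert>

// The expression as it appears in the source: divide only for BDF2.
double bdf2_factor_as_written(const bool   use_bdf2_scheme,
                              const double time_step,
                              const double old_time_step)
{
  return use_bdf2_scheme ?
           (2 * time_step + old_time_step) / (time_step + old_time_step) :
           1.0;
}

// Hypothetical model of the optimized form: the division is hoisted
// out of the conditional and the result is selected afterwards.
double bdf2_factor_speculated(const bool   use_bdf2_scheme,
                              const double time_step,
                              const double old_time_step)
{
  const double quotient =
    (2 * time_step + old_time_step) / (time_step + old_time_step);
  return use_bdf2_scheme ? quotient : 1.0;
}

// Both versions agree on the returned value, even when the unused
// quotient is 0/0 (assuming floating point traps are NOT enabled).
void check_equivalence()
{
  assert(bdf2_factor_as_written(false, 0.0, 0.0) ==
         bdf2_factor_speculated(false, 0.0, 0.0));
  assert(bdf2_factor_as_written(true, 1.0, 1.0) ==
         bdf2_factor_speculated(true, 1.0, 1.0));
}
```

In the first time step both step sizes can be zero while use_bdf2_scheme is false, which is exactly the case where only the speculated form performs an invalid operation.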
> Should we just go with that and make the release?
Hardcoding a check like this for a specific compiler version is not ideal. I would prefer to extend the check. Give me a plane ride to see if I can figure this out. ;-)
Let's let @tjhei have his plane ride :-)
@gassmoeller -- no, printf doesn't fix the issue of course. I just meant this as a way to figure out in which line the problem happens: put some printfs throughout the function and see which ones get executed before the exception happens. printf is an expensive and non-inlined function, so the compiler will generally not move instructions across these calls. That means that if a particular printf shows its output, the offending instruction must indeed be in the lines that follow.
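A minimal sketch of what I mean by that, using a made-up stand-in function (assemble_like_function is hypothetical, not the real execute()). The checkpoint texts are arbitrary; the fflush calls are there so that buffered output is not lost if the process is killed by the signal.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for a long assembly routine; not ASPECT code.
double assemble_like_function(const bool   use_bdf2_scheme,
                              const double time_step,
                              const double old_time_step)
{
  // Sanity check on the inputs of this sketch.
  assert(time_step >= 0.0 && old_time_step >= 0.0);

  std::printf("checkpoint 1: entered function\n");
  std::fflush(stdout); // flush so the line survives an abort

  const double sum = time_step + old_time_step;

  std::printf("checkpoint 2: sum = %g\n", sum);
  std::fflush(stdout);

  const double factor =
    use_bdf2_scheme ? (2 * time_step + old_time_step) / sum : 1.0;

  std::printf("checkpoint 3: factor = %g\n", factor);
  std::fflush(stdout);

  return factor;
}
```

If the run dies after "checkpoint 2" is printed but before "checkpoint 3", the offending instruction sits between those two calls, and you can move the checkpoints closer together from there.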
So, my guess was correct: clang is optimizing around simple bool checks and eagerly evaluates expressions that can raise floating point exceptions, like the bdf2_factor above. I can work around this by moving it into a separate function, for example.
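For illustration, the separate-function workaround could look roughly like this. It is a sketch under assumptions: compute_bdf2_factor and the SKETCH_NOINLINE macro are names I made up, the noinline attribute is a GCC/clang extension, and nothing guarantees that a given compiler version will refrain from speculating the division inside the helper.

```cpp
#include <cassert>

// noinline is a GCC/clang extension; fall back to nothing elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_NOINLINE __attribute__((noinline))
#else
#  define SKETCH_NOINLINE
#endif

// Hypothetical helper: keep the division behind an explicit early
// return and out of the caller's body.
SKETCH_NOINLINE
double compute_bdf2_factor(const bool   use_bdf2_scheme,
                           const double time_step,
                           const double old_time_step)
{
  if (!use_bdf2_scheme)
    return 1.0;

  // Only reached for BDF2, where the denominator must be nonzero.
  assert(time_step + old_time_step != 0.0);
  return (2 * time_step + old_time_step) / (time_step + old_time_step);
}
```

The caller would then write const double bdf2_factor = compute_bdf2_factor(use_bdf2_scheme, time_step, old_time_step); in place of the ternary expression.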
Note that I am hitting similar problems in other functions...
I tried extending our FPE check to contain code similar to this, but I haven't succeeded in making it fail the check.
So what do we do? Try to disable these clang optimizations? Rewrite the functions to be safe? Blacklist all clang 6.0+ for FPEs?
That's clearly a compiler bug then. I vote to just disable FPEs for clang 6, as already implemented in #2225. This has the advantage that (i) we don't further obfuscate our source code, and (ii) we don't penalize everyone who is using a different compiler. The number of people who would be impacted by #2225 is likely quite small, and that's useful.
While not "fixed", let's close this with #2225 as the solution.
I am getting
when running solcx in the second nonlinear solve. I am not sure what is going on here.