STEllAR-GROUP / octotiger

Astrophysics program simulating the evolution of star systems based on the fast multipole method on adaptive Octrees
http://octotiger.stellar-group.org/
Boost Software License 1.0

Floating point exception for DWD using SVE on Fugaku #430

Closed diehlpk closed 1 year ago

diehlpk commented 2 years ago

Expected Behavior

The DWD run should finish

Actual Behavior

Instead, the run aborts with a floating point exception on the more_block branch.

Start execution the solver...
-----------------------------------------------
checking for refinement
regridding
Regridded tree in 0.954963 seconds
rebalancing 11849 nodes with 10368 leaves
Rebalanced tree in 0.979901 seconds
forming tree connections
7288 amr boundaries
Formed tree in 7.751028 seconds
solving gravity
regrid done in 23.843107 seconds
---------------------------------------
OMEGA = 9.681361e-01, output_dt = 4.000000e-02
0.000000e+00 4.000000e-02
dwd step...
[l01-0209c:00103] *** Process received signal ***
[l01-0209c:00103] Signal: Floating point exception (8)
[l01-0209c:00103] Signal code:  (14)
[l01-0209c:00103] Failing at address: 0x400000ddb3ac
[l01-0209c:00103] [ 0] linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0x4000000707a0]
[l01-0209c:00103] [ 1] /vol0003/mdt0/data/hp210311/u10393/OctoTigerBuildChain/build/octotiger/build/libhpx_hydrolib.so(_Z25cell_reconstruct_ppm_simdIN3sve12experimental14parallelism_v24simdIdNS2_8simd_abi7sve_abiEEENS2_9simd_maskIdS5_EEEvPdPKdbbSB_iiiii+0xb8c)[0x400000ddb3ac]
[l01-0209c:00103] [ 2] /vol0003/mdt0/data/hp210311/u10393/OctoTigerBuildChain/build/octotiger/build/libhpx_hydrolib.so(_Z35cell_reconstruct_inner_loop_p1_simdIN3sve12experimental14parallelism_v24simdIdNS2_8simd_abi7sve_abiEEENS2_9simd_maskIdS5_EEEvmiPKiSA_PdPKdSB_dSD_iiiii+0xa04)[0x400000ddc404]
[l01-0209c:00103] [ 3] /vol0003/mdt0/data/hp210311/u10393/OctoTigerBuildChain/build/octotiger/build/libhpx_hydrolib.so(_ZZNK6Kokkos4Impl11ParallelForIZ23reconstruct_no_amc_implIN3sve12experimental14parallelism_v24simdIdNS5_8simd_abi7sve_abiEEENS5_9simd_maskIdS8_EENS_12Experimental3HPXEN8recycler24aggregated_recycled_viewINS_4ViewIPdJNS_11LayoutRightENS_9HostSpaceENS_12MemoryTraitsILj1EEEEEE15Allocator_SliceIdSaIdEN3hpx6kokkos8executorISD_EEEdEENSF_INSG_IPiJSI_SJ_SL_EEESN_IiSaIiESS_EiEEEvRNSR_IT1_EERN19Aggregated_ExecutorIS11_E14Executor_SliceEdiiRKT3_S19_RT2_RKS1A_S1B_S1B_S1D_S1D_iiimmEUlRKNS0_13HPXTeamMemberEE_NS_10TeamPolicyIJSD_EEESD_E12execute_taskEvENKUliE_clEi+0x190)[0x400000dde0b0]
[l01-0209c:00103] [ 4] /vol0003/mdt0/data/hp210311/u10393/OctoTigerBuildChain/build/octotiger/build/libhpx_hydrolib.so(_ZN6Kokkos12parallel_forINS_10TeamPolicyIJNS_12Experimental3HPXEEEEZ23reconstruct_no_amc_implIN3sve12experimental14parallelism_v24simdIdNS8_8simd_abi7sve_abiEEENS8_9simd_maskIdSB_EES3_N8recycler24aggregated_recycled_viewINS_4ViewIPdJNS_11LayoutRightENS_9HostSpaceENS_12MemoryTraitsILj1EEEEEE15Allocator_SliceIdSaIdEN3hpx6kokkos8executorIS3_EEEdEENSG_INSH_IPiJSJ_SK_SM_EEESO_IiSaIiEST_EiEEEvRNSS_IT1_EERN19Aggregated_ExecutorIS12_E14Executor_SliceEdiiRKT3_S1A_RT2_RKS1B_S1C_S1C_S1E_S1E_iiimmEUlRKNS_4Impl13HPXTeamMemberEE_vEEvRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKT_RKT0_+0x3f4)[0x400000dde634]
[l01-0209c:00103] [ 5] /vol0003/mdt0/data/hp210311/u10393/OctoTigerBuildChain/build/octotiger/build/libhpx_hydrolib.so(_ZZ27launch_hydro_kokkos_kernelsIN3hpx6kokkos8executorIN6Kokkos12Experimental3HPXEEEE10timestep_tRK14hydro_computerILi3ELi8E7physicsILi3EEERKSt6vectorISE_IdSaIdEESaISG_EESK_dmRT_RSE_I13hydro_state_tISG_ESaISO_EEENKUlOSL_E_clINS0_6futureIN19Aggregated_ExecutorIS6_E14Executor_SliceEEEEEDaSS_+0x27c4)[0x400000df51c4]
[l01-0209c:00103] [ 6] /vol0003/mdt0/data/hp210311/u10393/OctoTigerBuildChain/build/octotiger/build/libhpx_hydrolib.so(_ZN3hpx4lcos6detail19invoke_continuationINS_6detail18annotated_functionIZ27launch_hydro_kokkos_kernelsINS_6kokkos8executorIN6Kokkos12Experimental3HPXEEEE10timestep_tRK14hydro_computerILi3ELi8E7physicsILi3EEERKSt6vectorISJ_IdSaIdEESaISL_EESP_dmRT_RSJ_I13hydro_state_tISL_ESaIST_EEEUlOSQ_E_EENS_6futureIN19Aggregated_ExecutorISB_E14Executor_SliceEEENS1_12continuationIS14_SZ_SC_EEEENSt9enable_ifIXntsrNS_6traits6detail16is_unique_futureINS_4util13invoke_resultISQ_JT0_EE4typeEvEE5valueEvE4typeESR_OS1D_RT1_+0x278)[0x400000df5898]
[l01-0209c:00103] [ 7] /vol0003/mdt0/data/hp210311/u10393/OctoTigerBuildChain/build/octotiger/build/libhpx_hydrolib.so(_ZN3hpx4util6detail15callable_vtableIFSt4pairINS_7threads21thread_schedule_stateENS4_9thread_idEENS4_20thread_restart_stateEEE7_invokeINS4_6detail23thread_function_nullaryIZNS_4lcos6detail12continuationINS_6futureIN19Aggregated_ExecutorINS_6kokkos8executorIN6Kokkos12Experimental3HPXEEEE14Executor_SliceEEENS_6detail18annotated_functionIZ27launch_hydro_kokkos_kernelsISO_E10timestep_tRK14hydro_computerILi3ELi8E7physicsILi3EEERKSt6vectorIS12_IdSaIdEESaIS14_EES18_dmRT_RS12_I13hydro_state_tIS14_ESaIS1C_EEEUlOS19_E_EESV_E5asyncINSF_19post_policy_spawnerEEEvONS_13intrusive_ptrINSF_16future_data_baseISQ_EEEES1G_RNS_10error_codeEEUlvE_EEEES7_PvOS8_+0x6c)[0x400000dc332c]
[l01-0209c:00103] [ 8] /vol0003/mdt0/data/hp210311/u10393/OctoTigerBuildChain/build/hpx/lib64/libhpx_core.so(_ZN3hpx7threads10coroutines6detail14coroutine_implclEv+0xf4)[0x400001996714]
[l01-0209c:00103] [ 9] /vol0003/mdt0/data/hp210311/u10393/OctoTigerBuildChain/build/hpx/lib64/libhpx_core.so(+0x115e38)[0x400001995e38]
[l01-0209c:00103] [10] /vol0003/mdt0/data/hp210311/u10393/OctoTigerBuildChain/build/boost/lib/libboost_context.so.1.79.0(make_fcontext+0x18)[0x400001e70b9c]
[l01-0209c:00103] *** End of error message ***
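For readers trying to interpret the report above, here is a minimal sketch of two small helpers: one maps the SIGFPE `si_code` to its symbolic name, the other demangles a frame symbol. The helper names `fpe_code_name` and `demangle` are hypothetical and not part of Octo-Tiger. The sketch assumes Linux with GCC or Clang (`<cxxabi.h>`). In the Linux siginfo definitions, code 14 is `FPE_FLTUNK`, an undiagnosed floating-point exception; older headers may not define it, hence the guard.

```cpp
// Sketch: helpers for reading a SIGFPE crash report.
// Assumes Linux with GCC or Clang; helper names are illustrative only.
#include <cassert>   // used by the checks in the usage example
#include <csignal>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Map a SIGFPE si_code ("Signal code" in the report) to its symbolic name.
std::string fpe_code_name(int code) {
    switch (code) {
        case FPE_INTDIV: return "FPE_INTDIV (integer divide by zero)";
        case FPE_INTOVF: return "FPE_INTOVF (integer overflow)";
        case FPE_FLTDIV: return "FPE_FLTDIV (floating-point divide by zero)";
        case FPE_FLTOVF: return "FPE_FLTOVF (floating-point overflow)";
        case FPE_FLTUND: return "FPE_FLTUND (floating-point underflow)";
        case FPE_FLTRES: return "FPE_FLTRES (floating-point inexact result)";
        case FPE_FLTINV: return "FPE_FLTINV (invalid floating-point operation)";
        case FPE_FLTSUB: return "FPE_FLTSUB (subscript out of range)";
#ifdef FPE_FLTUNK
        // Not available in older headers, so only handled when defined.
        case FPE_FLTUNK: return "FPE_FLTUNK (undiagnosed floating-point exception)";
#endif
        default: return "unknown si_code " + std::to_string(code);
    }
}

// Demangle an Itanium-ABI symbol name; returns the input unchanged on failure.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);  // __cxa_demangle allocates with malloc
    return result;
}
```

Passing the mangled name from frame 1 to `demangle` should show that the fault is inside `cell_reconstruct_ppm_simd`, instantiated with the `sve::experimental::parallelism_v2` simd and simd_mask types, i.e. in the SIMD PPM reconstruction of the hydro kernel.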

Steps to Reproduce the Problem


  1. Compile the Octo-Tiger more_block branch using the STD SIMD library with SVE

Specifications

... Please describe your environment

diehlpk commented 1 year ago

I can reproduce this on Perlmutter for both dwd and v1309.

diehlpk commented 1 year ago

OK, after compiling without-simd the bug is gone.
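The thread does not state the root cause, so the following is only an illustration of one general way a SIMD build can raise floating-point exceptions that a scalar build never does. It is not a confirmed diagnosis of this bug. In scalar code a branch can skip a division entirely. In blended SIMD-style code the division is often evaluated on every lane and a mask selects the result afterwards, so an inactive lane can still set an exception flag (and trap, if the application has enabled trapping). The sketch emulates lanes with a plain array; `kLanes` and both function names are hypothetical.

```cpp
// Sketch: a blended (SIMD-style) conditional can raise FE_DIVBYZERO
// where the equivalent scalar branch does not. Lanes are emulated with
// std::array; this is an illustration, not Octo-Tiger code.
#include <array>
#include <cassert>   // used by the checks in the usage example
#include <cfenv>
#include <cstddef>

constexpr std::size_t kLanes = 4;  // hypothetical vector width
using Lanes = std::array<double, kLanes>;

// Scalar form: the division is skipped when the denominator is zero.
// Returns true if FE_DIVBYZERO was raised while computing `out`.
bool scalar_raises_divbyzero(const Lanes& num, const Lanes& den, Lanes& out) {
    std::feclearexcept(FE_ALL_EXCEPT);
    for (std::size_t i = 0; i < kLanes; ++i) {
        volatile double d = den[i];  // volatile keeps the math at run time
        if (d != 0.0) {
            volatile double q = num[i] / d;
            out[i] = q;
        } else {
            out[i] = 0.0;
        }
    }
    return std::fetestexcept(FE_DIVBYZERO) != 0;
}

// SIMD-style form: every lane is divided, then a mask picks the result.
// Returns true if FE_DIVBYZERO was raised while computing `out`.
bool blended_raises_divbyzero(const Lanes& num, const Lanes& den, Lanes& out) {
    std::feclearexcept(FE_ALL_EXCEPT);
    for (std::size_t i = 0; i < kLanes; ++i) {
        volatile double d = den[i];
        volatile double q = num[i] / d;  // evaluated even when d == 0
        const bool keep = (d != 0.0);    // the lane mask
        out[i] = keep ? q : 0.0;         // blend: masked-off lanes get 0
    }
    return std::fetestexcept(FE_DIVBYZERO) != 0;
}
```

Both functions produce the same values, but only the blended one touches the exception flag when a denominator is zero. If a kernel behaves like this, typical remedies are to sanitize the inputs of inactive lanes before the operation or to use predicated operations where the SIMD library offers them.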

cc @srinivasyadav18

diehlpk commented 1 year ago

This bug does not appear with the latest version and can be closed.