Open · JiakunYan opened this issue 3 months ago
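Expected Behavior
Octo-Tiger completes.
Actual Behavior
Octo-Tiger (or HPX) occasionally complains that some performance counters are not found.
Steps to Reproduce the Problem
Run Octo-Tiger with the following counters enabled:
--hpx:print-counter=/octotiger*/compute/gpu*kokkos* --hpx:print-counter=/arithmetics/add@/octotiger*/compute/gpu/hydro_kokkos --hpx:print-counter=/arithmetics/add@/octotiger*/compute/gpu/hydro_kokkos_aggregated
Specifications
Since these counters are created by Octo-Tiger, I think this is an Octo-Tiger problem rather than an HPX problem.
I suspect a race between counter registration and counter usage. In the error below, locality 2 is still in state::startup. Its list of known counter types contains other Octo-Tiger counters (for example /octotiger/compute/gpu/p2p_kokkos) but not /octotiger/compute/gpu/multipole_kokkos.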
{stack-trace}: 11 frames:
0x7ffbe76bc29a : /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx.so.1(+0x4b629a) [0x7ffbe76bc29a] in /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx.so.1
0x7ffbe6e15d65 : std::__exception_ptr::exception_ptr hpx::detail::get_exception<hpx::exception>(hpx::exception const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) [0x95] in /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx_core.so
0x7ffbe6e15e35 : void hpx::detail::throw_exception<hpx::exception>(hpx::exception const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long) [0x55] in /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx_core.so
0x7ffbe6e0b854 : hpx::detail::throw_exception(hpx::error, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long) [0x84] in /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx_core.so
0x7ffbe77be0f8 : hpx::performance_counters::detail::create_counter_local(hpx::performance_counters::counter_info const&) [0x3f8] in /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx.so.1
0x7ffbe77f80fd : hpx::components::server::runtime_support::create_performance_counter(hpx::performance_counters::counter_info const&) [0xd] in /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx.so.1
0x7ffbe785c8fc : /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx.so.1(+0x6568fc) [0x7ffbe785c8fc] in /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx.so.1
0x7ffbe78113bd : /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx.so.1(+0x60b3bd) [0x7ffbe78113bd] in /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx.so.1
0x7ffbe6e03866 : hpx::threads::coroutines::detail::coroutine_impl::operator()() [0xd6] in /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx_core.so
0x7ffbe6e02a29 : /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx_core.so(+0x113a29) [0x7ffbe6e02a29] in /u/jiakuny/workspace/spack/opt/spack/linux-rhel8-zen3/gcc-11.2.1/hpx-master-ivytitn2twobtla6duwsdubshisph5z4/lib64/libhpx_core.so
{locality-id}: 2
{hostname}: [ (mpi:2) ]
{process-id}: 68905
{os-thread}: 2, locality#0/worker-thread#16
{thread-id}: 0000000008897540
{thread-description}: <unknown>
{state}: state::startup
{auxinfo}:
{file}: /u/jiakuny/workspace/hpx-lcw/libs/full/performance_counters/src/counters.cpp
{line}: 808
{function}: create_counter_local
{what}: no create function for performance counter found: /octotiger{locality#2/total}/compute/gpu/multipole_kokkos (counter type /octotiger/compute/gpu/multipole_kokkos is not defined, known counter types: /agas/count/allocate /agas/count/begin_migration
/agas/count/bind /agas/count/bind_gid /agas/count/cache/entries /agas/count/cache/erase_entry /agas/count/cache/evictions /agas/count/cache/get_entry /agas/count/cache/hits /agas/count/cache/insert_entry /agas/count/cache/insertions /agas/count/cache/misses /agas/count/cache/update_entry /agas/count/decrement_credit /agas/count/end_migration /agas/count/increment_credit /agas/count/iterate_names /agas/count/on_symbol_namespace_event /agas/count/resolve /agas/count/resolve_gid /agas/count/route /agas/count/unbind /agas/count/unbind_gid /agas/primary/count /agas/primary/time /agas/symbol/count /agas/symbol/time /agas/time/allocate /agas/time/begin_migration /agas/time/bind /agas/time/bind_gid /agas/time/cache/erase_entry /agas/time/cache/get_entry /agas/time/cache/insert_entry /agas/time/cache/update_entry /agas/time/decrement_credit /agas/time/end_migration /agas/time/increment_credit /agas/time/iterate_names /agas/time/on_symbol_namespace_event /agas/time/resolve /agas/time/resolve_gid /agas/time/route /agas/time/unbind /agas/time/unbind_gid /arithmetics/add /arithmetics/count /arithmetics/divide /arithmetics/max /arithmetics/mean /arithmetics/median /arithmetics/min /arithmetics/multiply /arithmetics/subtract /arithmetics/variance /octotiger/amr_bounds /octotiger/compute/cpu/hydro_kokkos /octotiger/compute/cpu/hydro_kokkos_aggregated /octotiger/compute/cpu/hydro_kokkos_aggregation_rate /octotiger/compute/cpu/hydro_legacy /octotiger/compute/cpu/p2p_kokkos /octotiger/compute/gpu/hydro_cuda /octotiger/compute/gpu/hydro_cuda_aggregated /octotiger/compute/gpu/hydro_cuda_aggregation_rate /octotiger/compute/gpu/hydro_kokkos /octotiger/compute/gpu/hydro_kokkos_aggregated /octotiger/compute/gpu/hydro_kokkos_aggregation_rate /octotiger/compute/gpu/p2p_cuda /octotiger/compute/gpu/p2p_kokkos /octotiger/subgrid_leaves /octotiger/subgrids /parcelport/count/mpi/cache-evictions /parcelport/count/mpi/cache-hits /parcelport/count/mpi/cache-insertions 
/parcelport/count/mpi/cache-misses /parcelport/count/mpi/cache-reclaims /parcelqueue/length/receive /parcelqueue/length/send /parcels/count/routed /runtime/count/action-invocation /runtime/count/component /runtime/count/remote-action-invocation /runtime/uptime /scheduler/utilization/instantaneous /statistics/average /statistics/max /statistics/median /statistics/min /statistics/rolling_average /statistics/rolling_max /statistics/rolling_min /statistics/rolling_stddev /statistics/stddev /threadqueue/length /threads/busy-loop-count/instantaneous /threads/count/cumulative /threads/count/cumulative-phases /threads/count/instantaneous/active /threads/count/instantaneous/all /threads/count/instantaneous/pending /threads/count/instantaneous/staged /threads/count/instantaneous/suspended /threads/count/instantaneous/terminated /threads/idle-loop-count/instantaneous /threads/time/overall : HPX(bad_parameter)): HPX(bad_parameter):
This is on NCSA Delta. I think @diehlpk also encountered this problem on Ookami.
@G-071 What should we do about that?