Open diehlpk opened 2 months ago
@dmarce1 I tried to compile the new branch and got the following error:
2 errors found in build log:
123 -- Octo-Tiger will use Kokkos Serial Execution Space for (Kokkos CPU) Hydro kernels!
124 INFO Building with fp_contract=off
125 -- Octo-Tiger max nf: 15
126 -- Octo-Tiger minimal allowed theta: 0.34
127 INFO Used Octo-Tiger commit: 02cf56d9bc2b4852022886f5cff6a39bb7438a07
128 -- Configuring done
>> 129 CMake Error at /users/diehlpk/spack/opt/spack/linux-sles15-neoverse_v2/gcc-12.3.0/hpx-1.9.1-4e54quutjtm4nz4y447r5kanti3odvn6/lib64/cmake/HPX/HPX_AddLibrary.cmake:235 (add_library):
130 Cannot find source file:
131
132 octotiger/verbose.hpp
133
134 Tried extensions .c .C .c++ .cc .cpp .cxx .cu .mpp .m .M .mm .h .hh .h++
135 .hm .hpp .hxx .in .txx .f .F .for .f77 .f90 .f95 .f03 .ispc
136 Call Stack (most recent call first):
137 CMakeLists.txt:349 (add_hpx_library)
138
139
>> 140 CMake Error at /users/diehlpk/spack/opt/spack/linux-sles15-neoverse_v2/gcc-12.3.0/hpx-1.9.1-4e54quutjtm4nz4y447r5kanti3odvn6/lib64/cmake/HPX/HPX_AddLibrary.cmake:235 (add_library):
141 No SOURCES given to target: octolib
142 Call Stack (most recent call first):
143 CMakeLists.txt:349 (add_hpx_library)
144
145
146 CMake Generate step failed. Build files cannot be regenerated correctly.
cc @G-071 and @JiakunYan
The code hangs here
New Omega = 9.687093e-01
t=21 END : DWD step (file: /users/diehlpk/compile/octotiger/src/node_server_actions_3.cpp, line: 396, function: execute_solver) (2.939855e+00 s elapsed)
TS 16:: t: 2.423595e+02, dt: 1.655967e-03, time_elapsed: 3.066656e+00, rotational_time: 2.347759e+02, x: 1.492980e+00, y: -4.062294e+00, z: -3.587781e-01, a: 3.727446e+00, ur: 2.053696e-06, ul: 2.037199e-06, vr: 6.826120e-01, vl: 6.800910e-01, dim: 0, ngrids: 8393, leafs: 7344, amr_boundaries: 5960
t=21 BEGIN: regrid (file: /users/diehlpk/compile/octotiger/src/node_server_actions_3.cpp, line: 458, function: execute_solver)
-----------------------------------------------
t=21 BEGIN: check for refinement (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 251, function: regrid)
t=21 END : check for refinement (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 251, function: regrid) (3.230400e-02 s elapsed)
t=21 BEGIN: regrid (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 259, function: regrid)
t=21 BEGIN: gather (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 261, function: regrid)
(rebalancing 8489 nodes with 7428 leaves)
t=21 END : gather (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 261, function: regrid) (4.129000e-03 s elapsed)
t=21 BEGIN: scatter (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 266, function: regrid)
t=21 END : scatter (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 266, function: regrid) (1.547000e-02 s elapsed)
t=21 BEGIN: form tree connections (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 271, function: regrid)
(6072 amr boundaries)
t=21 END : form tree connections (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 271, function: regrid) (1.114730e-01 s elapsed)
t=21 BEGIN: solve gravity (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 276, function: regrid)
t=21 BEGIN: (root node) computing FMM (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 329, function: solve_gravity)
t=22 END : (root node) computing FMM (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 329, function: solve_gravity) (3.010200e-02 s elapsed)
t=22 END : solve gravity (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 276, function: regrid) (7.064400e-02 s elapsed)
t=22 END : regrid (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 259, function: regrid) (2.020480e-01 s elapsed)
t=22 END : regrid (file: /users/diehlpk/compile/octotiger/src/node_server_actions_3.cpp, line: 458, function: execute_solver) (2.345370e-01 s elapsed)
t=22 END : main execution loop iteration (file: /users/diehlpk/compile/octotiger/src/node_server_actions_3.cpp, line: 363, function: execute_solver) (3.301262e+00 s elapsed)
t=22 BEGIN: main execution loop iteration (file: /users/diehlpk/compile/octotiger/src/node_server_actions_3.cpp, line: 363, function: execute_solver)
t=22 BEGIN: DWD step (file: /users/diehlpk/compile/octotiger/src/node_server_actions_3.cpp, line: 396, function: execute_solver)
Patrick-
This narrows it down a bit. It looks like the hang is between entry into the main loop and the point where the distributed part of the solver kicks in. I may need to add some more debugging output to figure out exactly where it is, though. If so, I'll push something today.
Thanks Dominic
Here is the new output
diagnostics...
New Omega = 9.687093e-01
t=25 END : DWD step (file: /users/diehlpk/compile/octotiger/src/node_server_actions_3.cpp, line: 396, function: execute_solver) (2.890631e+00 s elapsed)
TS 16:: t: 2.423595e+02, dt: 1.655967e-03, time_elapsed: 3.017211e+00, rotational_time: 2.347759e+02, x: 1.492980e+00, y: -4.062294e+00, z: -3.587781e-01, a: 3.727446e+00, ur: 2.053696e-06, ul: 2.037199e-06, vr: 6.826120e-01, vl: 6.800910e-01, dim: 0, ngrids: 8393, leafs: 7344, amr_boundaries: 5960
t=25 BEGIN: regrid (file: /users/diehlpk/compile/octotiger/src/node_server_actions_3.cpp, line: 458, function: execute_solver)
-----------------------------------------------
t=25 BEGIN: check for refinement (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 251, function: regrid)
t=25 END : check for refinement (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 251, function: regrid) (3.727400e-02 s elapsed)
t=25 BEGIN: regrid (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 259, function: regrid)
t=25 BEGIN: gather (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 261, function: regrid)
(rebalancing 8489 nodes with 7428 leaves)
t=25 END : gather (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 261, function: regrid) (1.043100e-02 s elapsed)
t=25 BEGIN: scatter (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 266, function: regrid)
t=25 END : scatter (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 266, function: regrid) (1.588200e-02 s elapsed)
t=25 BEGIN: form tree connections (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 271, function: regrid)
(6072 amr boundaries)
t=26 END : form tree connections (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 271, function: regrid) (1.025130e-01 s elapsed)
t=26 BEGIN: solve gravity (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 276, function: regrid)
t=26 BEGIN: (root node) computing FMM (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 329, function: solve_gravity)
t=26 END : (root node) computing FMM (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 329, function: solve_gravity) (4.937500e-02 s elapsed)
t=26 END : solve gravity (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 276, function: regrid) (7.951700e-02 s elapsed)
t=26 END : regrid (file: /users/diehlpk/compile/octotiger/src/node_server_actions_1.cpp, line: 259, function: regrid) (2.084010e-01 s elapsed)
t=26 END : regrid (file: /users/diehlpk/compile/octotiger/src/node_server_actions_3.cpp, line: 458, function: execute_solver) (2.457240e-01 s elapsed)
t=26 END : main execution loop iteration (file: /users/diehlpk/compile/octotiger/src/node_server_actions_3.cpp, line: 363, function: execute_solver) (3.262951e+00 s elapsed)
t=26 BEGIN: main execution loop iteration (file: /users/diehlpk/compile/octotiger/src/node_server_actions_3.cpp, line: 363, function: execute_solver)
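For reading traces like the one above, one option is to scan for BEGIN markers that never get a matching END; the last open section is where the run stopped producing output. The following is a minimal C++ sketch of that idea and is not part of Octo-Tiger: the names section_key and open_sections are hypothetical, and the line layout is assumed to be exactly the "BEGIN:" / "END :" format shown in these logs.

```cpp
#include <algorithm>
#include <iterator>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: returns "label (file: ..., line: N, function: f)"
// following `marker`, or nothing if the line does not have that shape.
std::optional<std::string> section_key(const std::string& line,
                                       const std::string& marker) {
    const auto m = line.find(marker);
    if (m == std::string::npos) return std::nullopt;
    const auto start = m + marker.size();
    // Labels may contain parentheses, e.g. "(root node) computing FMM",
    // so anchor on the "(file:" group rather than the first ')'.
    const auto file = line.find("(file:", start);
    if (file == std::string::npos) return std::nullopt;
    const auto close = line.find(')', file);
    if (close == std::string::npos) return std::nullopt;
    return line.substr(start, close - start + 1);
}

// Hypothetical helper: returns the sections that were opened but never
// closed, outermost first. END lines without a BEGIN are ignored.
std::vector<std::string> open_sections(const std::vector<std::string>& lines) {
    std::vector<std::string> open;
    for (const auto& line : lines) {
        if (auto begin_key = section_key(line, "BEGIN: ")) {
            open.push_back(*begin_key);
            continue;
        }
        if (auto end_key = section_key(line, "END : ")) {
            // Close the most recently opened section with the same key.
            auto it = std::find(open.rbegin(), open.rend(), *end_key);
            if (it != open.rend()) open.erase(std::next(it).base());
        }
    }
    return open;
}
```

Fed the lines of the trace above, this sketch would return a single open section: the "main execution loop iteration" entry at line 363 of node_server_actions_3.cpp.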
Is this running without SILO output enabled? I think the problem may be SILO related. If it is being run with SILO output, can you please re-run it with disable_output=on?
I think I have the bug narrowed down to diagnostics(); it is likely in this section of code in node_server_actions_2.cpp. I have added some more debug output, which will hopefully let us narrow it down further.
EDIT: My bet is that this is in all_hydro_bounds. If so, verbose debugging output may not narrow it down any further than which kind of boundary exchange is hanging. There are three kinds: a) the restrict step, which updates refined cells from their children; b) the decomp step, which exchanges ghost cells between grids on the same level; and c) the AMR step, which interpolates ghost cells at AMR boundaries.
diagnostics_t node_server::diagnostics(const diagnostics_t &diags) {
    if (is_refined) {
        auto rc = hpx::async(hpx::annotated_function([&]() {
            return child_diagnostics(diags);
        }, "diagnostics::return_child_diagnostics"));
        all_hydro_bounds();
        auto diags = GET(rc);
        return diags;
    } else {
        all_hydro_bounds();
        return local_diagnostics(diags);
    }
}
I just pushed a branch called verbose_debug. To enable the debugging output, set --verbose=1; to disable it, set --verbose=0. I've attached an example of the output. It prints a comment at the beginning and end of each function, along with the start time and the time elapsed during the function's execution. When a comment contains something like "(from root)", the code is within a function that executes for every node, and only the root node is emitting output.