Closed jrood-nrel closed 9 months ago
Hi @jrood-nrel,
Could you share some more details about the seg-faults you were seeing? I'm unable to trigger any misbehavior with the intel-2021.1.2 compiler here at Sandia, with either the unit or regression tests. What compiler did you use, what machine was it on, and which test(s) were seg-faulting? Thanks!
This was on Eagle. It's a case Ganesh is running.
MPT: #1 0x00002b53f0858c96 in mpi_sgi_system (
MPT: #2 MPI_SGI_stacktraceback (
MPT: header=header@entry=0x7ffefbf08990 "MPT ERROR: Rank 242(g:242) received signal SIGSEGV(11).\n\tProcess ID: 72481, Host: r3i2n11, Program: /lustre/filesystem/scratch/user/spack-manager/spack/opt/spack/linux-rhel7-skylake_avx512/intel-20.0.2/"...) at sig.c:340
MPT: #3 0x00002b53f0858e8f in first_arriver_handler (signo=signo@entry=11,
MPT: stack_trace_sem=stack_trace_sem@entry=0x2b53fedc0080) at sig.c:489
MPT: #4 0x00002b53f0859123 in slave_sig_handler (signo=11,
MPT: siginfo=<optimized out>, extra=<optimized out>) at sig.c:565
MPT: #5 <signal handler called>
MPT: #6 0x00002b53e38c2314 in sierra::nalu::max_extent(stk::mesh::FieldBase const&, unsigned int) ()
MPT: from /lustre/filesystem/scratch/user/spack-manager/spack/opt/spack/linux-rhel7-skylake_avx512/intel-20.0.2/nalu-wind-master-33lj2rbocawi6fvq2lqcqugtdhpzwbrf/lib/libnalu.so
MPT: #7 0x00002b53e37d439e in sierra::nalu::NodalGradAlgDriver<stk::mesh::Field<double, void, void, void, void, void, void, void> >::post_work() ()
MPT: from /lustre/filesystem/scratch/user/spack-manager/spack/opt/spack/linux-rhel7-skylake_avx512/intel-20.0.2/nalu-wind-master-33lj2rbocawi6fvq2lqcqugtdhpzwbrf/lib/libnalu.so
MPT: #8 0x00002b53e2670ba7 in sierra::nalu::WallDistEquationSystem::solve_and_update() ()
MPT: from /lustre/filesystem/scratch/user/spack-manager/spack/opt/spack/linux-rhel7-skylake_avx512/intel-20.0.2/nalu-wind-master-33lj2rbocawi6fvq2lqcqugtdhpzwbrf/lib/libnalu.so
MPT: #9 0x00002b53e223c980 in sierra::nalu::EquationSystems::initial_work() ()
MPT: from /lustre/filesystem/scratch/user/spack-manager/spack/opt/spack/linux-rhel7-skylake_avx512/intel-20.0.2/nalu-wind-master-33lj2rbocawi6fvq2lqcqugtdhpzwbrf/lib/libnalu.so
MPT: #10 0x00002b53e25ef833 in sierra::nalu::TimeIntegrator::prepare_for_time_integration() ()
MPT: from /lustre/filesystem/scratch/user/spack-manager/spack/opt/spack/linux-rhel7-skylake_avx512/intel-20.0.2/nalu-wind-master-33lj2rbocawi6fvq2lqcqugtdhpzwbrf/lib/libnalu.so
MPT: #11 0x00002b53e25efce5 in sierra::nalu::TimeIntegrator::integrate_realm() ()
MPT: from /lustre/filesystem/scratch/user/spack-manager/spack/opt/spack/linux-rhel7-skylake_avx512/intel-20.0.2/nalu-wind-master-33lj2rbocawi6fvq2lqcqugtdhpzwbrf/lib/libnalu.so
MPT: #12 0x0000000000415121 in main ()
Simulations:
- name: sim1
  optimizer: opt1
  time_integrator: ti_1
Time_Integrators:
- StandardTimeIntegrator:
    name: ti_1
    realms:
    - realm_1
    second_order_accuracy: true
    start_time: 0
    termination_step_count: 100
    time_step: 0.0002667
    time_step_count: 0
    time_stepping_type: fixed
realms:
- boundary_conditions:
  - target_name: wing
    wall_boundary_condition: bc_wing
    wall_user_data:
      turbulent_ke: 0.0
      use_wall_function: false
      velocity:
      - 0
      - 0
      - 0
  - target_name: wing-pp
    wall_boundary_condition: bc_wing_pp
    wall_user_data:
      turbulent_ke: 0.0
      use_wall_function: false
      velocity:
      - 0
      - 0
      - 0
  - inflow_boundary_condition: bc_inflow
    inflow_user_data:
      specific_dissipation_rate: 919.3455
      turbulent_ke: 0.0010422
      velocity:
      - 75.0
      - 0.0
      - 0.0
    target_name: inlet
  - open_boundary_condition: bc_open
    open_user_data:
      pressure: 0.0
      specific_dissipation_rate: 919.3455
      turbulent_ke: 0.0010422
      velocity:
      - 0.0
      - 0.0
      - 0.0
    target_name: outlet
  - periodic_boundary_condition: bc_front_back
    periodic_user_data:
      search_tolerance: 0.0001
    target_name:
    - front
    - back
  check_for_missing_bcs: true
  equation_systems:
    max_iterations: 4
    name: theEqSys
    solver_system_specification:
      velocity: solve_mom
      turbulent_ke: solve_scalar
      specific_dissipation_rate: solve_scalar
      pressure: solve_elliptic
      ndtw: solve_elliptic
    systems:
    - WallDistance:
        convergence_tolerance: 1.0e-08
        max_iterations: 1
        name: myNDTW
    - LowMachEOM:
        convergence_tolerance: 1.0e-08
        max_iterations: 1
        name: myLowMach
    - ShearStressTransport:
        convergence_tolerance: 1.0e-08
        max_iterations: 1
        name: mySST
  initial_conditions:
  - constant: ic_1
    target_name: fluid-hex
    value:
      pressure: 0
      specific_dissipation_rate: 919.3455
      turbulent_ke: 0.0010422
      velocity:
      - 75.0
      - 0.0
      - 0.0
  material_properties:
    specifications:
    - name: density
      type: constant
      value: 1.2
    - name: viscosity
      type: constant
      value: 9.0e-06
    target_name: fluid-hex
  mesh: mesh/ffa_w3_500_525_288_121_32.exo
  #automatic_decomposition_type: rcb
  #rebalance_mesh: yes
  #stk_rebalance_method: parmetis
  #use_edges: yes
  #check_jacobians: true
  name: realm_1
  output:
    output_data_base_name: results/ffa_w3_500_32_sst.e
    output_frequency: 100
    output_node_set: false
    output_variables:
    - velocity
    - density
    - pressure
    - pressure_force
    - viscous_force
    - tau_wall_vector
    - tau_wall
    - turbulent_ke
    - specific_dissipation_rate
    - minimum_distance_to_wall
    - sst_f_one_blending
    - turbulent_viscosity
    - element_courant
    - q_criterion
    - vorticity
    - assembled_area_force_moment
  restart:
    restart_data_base_name: restart/sst_ffa_w3_500_32.rst
    restart_frequency: 500
  solution_options:
    name: myOptions
    options:
    - hybrid_factor:
        specific_dissipation_rate: 1.0
        turbulent_ke: 1.0
        velocity: 1.0
    - alpha_upw:
        specific_dissipation_rate: 1.0
        turbulent_ke: 1.0
        velocity: 1.0
    - upw_factor:
        specific_dissipation_rate: 0.0
        turbulent_ke: 0.0
        velocity: 1.0
    - limiter:
        pressure: true
        velocity: true
    - noc_correction:
        pressure: true
    - projected_nodal_gradient:
        ndtw: element
        pressure: element
        specific_dissipation_rate: element
        turbulent_ke: element
        velocity: element
    - relaxation_factor:
        pressure: 0.3
        specific_dissipation_rate: 0.7
        turbulent_ke: 0.7
        velocity: 0.7
    - turbulence_model_constants:
        SDRWallFactor: 0.625
    projected_timescale_type: momentum_diag_inv
    turbulence_model: sst
  use_edges: true
linear_solvers:
- dump_hypre_matrix_stats: false
  hypre_cfg_file: hypre_file.yaml
  hypre_cfg_node: hypre_simple_precon
  kspace: 100
  max_iterations: 100
  method: hypre_gmres
  name: solve_mom
  output_level: 0
  preconditioner: boomerAMG
  recompute_preconditioner_frequency: 1
  reuse_linear_system: true
  segregated_solver: true
  simple_hypre_matrix_assemble: true
  tolerance: 1e-5
  type: hypre
  write_matrix_files: false
- dump_hypre_matrix_stats: false
  hypre_cfg_file: hypre_file.yaml
  hypre_cfg_node: hypre_simple_precon
  kspace: 100
  max_iterations: 100
  method: hypre_gmres
  name: solve_scalar
  preconditioner: boomerAMG
  recompute_preconditioner_frequency: 1
  reuse_linear_system: true
  simple_hypre_matrix_assemble: true
  tolerance: 1e-5
  type: hypre
  write_matrix_files: false
- dump_hypre_matrix_stats: false
  hypre_cfg_file: hypre_file.yaml
  hypre_cfg_node: hypre_elliptic
  kspace: 40
  max_iterations: 100
  method: hypre_gmres
  name: solve_elliptic
  preconditioner: boomerAMG
  recompute_preconditioner_frequency: 1
  reuse_linear_system: true
  simple_hypre_matrix_assemble: true
  tolerance: 1e-5
  type: hypre
  write_matrix_files: false
@jrood-nrel Thanks for the information! The stack trace was juuuust enough to figure it out. I messed up a Field name when configuring a ScalarNodalGradAlgDriver instance inside the WallDistEquationSystem, so it ended up referencing a null Field when retrieving some sizing information for updating either a periodic BC field or an overset mesh Field after the nodal grad calculation was complete. This mistake should be independent of compiler and we have 17 regression tests that exercise this equation system. I find it a bit troubling that my local GCC build ran just fine with precisely zero diffs.
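For anyone following along, here is a minimal sketch of the failure mode and one way to guard against it. It does not use the STK or Nalu-Wind API: `Field`, `FieldRegistry`, `require_field`, `component_count`, and the field name in the usage note are all hypothetical stand-ins. The idea is that a lookup by a misspelled name yields a null pointer, and dereferencing it to call a sizing function that takes a reference is undefined behavior, so one build may segfault while another appears to run cleanly. Checking the pointer where the driver is configured turns that into a deterministic error.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a mesh field; NOT stk::mesh::FieldBase.
struct Field {
  std::string name;
  unsigned components;  // values stored per node
};

// Hypothetical registry that maps field names to fields.
class FieldRegistry {
public:
  void declare(const std::string& name, unsigned components) {
    fields_[name] = Field{name, components};
  }

  // Returns nullptr when the name is unknown, e.g. after a typo.
  const Field* find(const std::string& name) const {
    auto it = fields_.find(name);
    return it == fields_.end() ? nullptr : &it->second;
  }

private:
  std::map<std::string, Field> fields_;
};

// Guarded lookup: throws at configuration time instead of returning a
// null pointer that would only be dereferenced much later.
const Field& require_field(const FieldRegistry& reg, const std::string& name) {
  const Field* f = reg.find(name);
  if (f == nullptr) {
    throw std::runtime_error("field not registered: " + name);
  }
  return *f;
}

// Sizing query that takes a reference, so the caller must already hold
// a valid field. Calling it as component_count(*reg.find("typo")) would
// dereference a null pointer, which is undefined behavior.
unsigned component_count(const Field& f) { return f.components; }
```

With a registry holding a field declared as `"wall_distance_gradient"` (a made-up name), `require_field(reg, "wall_distance_gradient")` returns the field, while a misspelled name throws `std::runtime_error` on every compiler instead of crashing on only some of them.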
Either way, I'll wait to reintroduce this simple_fields update until after @psakievich is finished with updating nalu-wind with his smart fields and field manager changes. This patch should be significantly smaller once that work is done.
Reverts Exawind/nalu-wind#1233
This PR was causing Nalu-Wind to segfault when using the Intel compiler.