NOAA-GFDL / FMS

GFDL's Flexible Modeling System

test_data_override sporadic failures in CI testing #1480

Open rem1776 opened 4 months ago

rem1776 commented 4 months ago

Describe the bug: Occasionally the first data_override test fails in CI but passes on subsequent runs:

expecting success of test_data_override2_mono.1 'test_data_override with monotonically increasing and decreasing data sets (r4)': 
    mpirun -n 6 ../test_data_override_ongrid_${KIND}

NOTE from PE     0: MPP_DOMAINS_SET_STACK_SIZE: stack size set to    32768.
NOTE from PE     0: MPP_DOMAINS_SET_STACK_SIZE: stack size set to 17280000.
 test_data_override_emc domain decomposition
whalo =    2, ehalo =    2, shalo =    2, nhalo =    2
  X-AXIS =  180 180

FATAL from PE     4: NetCDF: Unknown file format: netcdf_file_open:INPUT/grid_spec.nc

#0  0x7795ff7006dd in ???
#1  0x7795ffb4d34f in ???
#2  0x7795ffbace4f in ???
#3  0x7795ffbab916 in ???
#4  0x7795ffb5197f in ???
#5  0x7795ffc4cec9 in ???
#6  0x7795ffc4d212 in ???
#7  0x7795ffb35c24 in ???
#8  0x7795ffb33713 in ???
#9  0x7795ffdbd9c6 in ???
#10  0x7795ffdc33ad in ???
#11  0x4034be in ???
#12  0x4094fa in ???
#13  0x7795fdfd5eaf in ???
#14  0x7795fdfd5f5f in ???
#15  0x402324 in ???
#16  0xffffffffffffffff in ???
Abort(1) on node 4 (rank 4 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 4
error: last command exited with $?=1
  Y-AXIS =   60  60  60
not ok 1 - test_data_override with monotonically increasing and decreasing data sets (r4)
FAIL: test_data_override2_mono.sh 1 - test_data_override with monotonically increasing and decreasing data sets (r4)
#   
#       mpirun -n 6 ../test_data_override_ongrid_${KIND}
#       
FAIL: test_data_override2_mono.sh 1 - test_data_override with monotonically increasing and decreasing data sets (r4)
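The FATAL message above means the netCDF library did not recognize the contents of INPUT/grid_spec.nc when PE 4 opened it. One possible explanation, which is an unconfirmed guess and not something the log proves, is that the file was empty or only partly written at that moment. A minimal sketch for checking that on a failed run is to classify the file by its leading bytes. The function name `netcdf_magic` is hypothetical. It only inspects offset 0 and does not validate the rest of the file.

```shell
# Hypothetical helper (not part of FMS): classify a file by its first 4 bytes.
# Prints one of: classic, 64-bit-offset, cdf5, hdf5, unknown, missing.
# Returns non-zero for "unknown" and "missing".
netcdf_magic() {
    file=$1
    if [ ! -r "$file" ]; then
        echo "missing"
        return 1
    fi
    # Hex dump of the first 4 bytes with all whitespace removed, e.g. 43444601.
    magic=$(od -An -tx1 -N4 "$file" | tr -d '[:space:]')
    case "$magic" in
        43444601) echo "classic" ;;        # "CDF" + 0x01
        43444602) echo "64-bit-offset" ;;  # "CDF" + 0x02
        43444605) echo "cdf5" ;;           # "CDF" + 0x05
        89484446) echo "hdf5" ;;           # 0x89 "HDF", used by netCDF-4 files
        *) echo "unknown"; return 1 ;;     # includes empty or truncated files
    esac
}
```

Calling `netcdf_magic INPUT/grid_spec.nc` right after a failure would show whether the file at least starts with a recognizable signature. An `unknown` result on an empty file would be consistent with the guess above, but would not by itself identify which process wrote the file or when.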

To Reproduce: TBD. So far this has only been seen in CI testing.
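Since the failure is intermittent, one way to try reproducing it outside CI is to rerun the test until it fails and keep the log of the failing attempt. The sketch below is only an illustration: `run_until_fail` and the `LOG_DIR` variable are hypothetical names, and nothing here is part of the FMS test suite.

```shell
# Hypothetical helper: rerun a command until it fails or the attempt limit is hit.
# Usage: run_until_fail MAX_ATTEMPTS COMMAND [ARGS...]
# Returns 1 at the first failing attempt, 0 if every attempt passed.
run_until_fail() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        log="${LOG_DIR:-.}/attempt_${attempt}.log"
        # Capture stdout and stderr so the failing run can be inspected later.
        if ! "$@" > "$log" 2>&1; then
            echo "failed on attempt ${attempt}; see ${log}"
            return 1
        fi
        rm -f "$log"   # passing attempts are not kept
        attempt=$((attempt + 1))
    done
    echo "no failure in ${max_attempts} attempts"
    return 0
}
```

For example, `run_until_fail 50 mpirun -n 6 ../test_data_override_ongrid_r4` would repeat the command from the log, assuming `${KIND}` expands to `r4` for this test. Because the error concerns INPUT/grid_spec.nc, it may be necessary to rerun the whole test_data_override2_mono.sh script instead, so that any input setup it performs is repeated on every attempt.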

Expected behavior: The test should pass consistently.

System Environment: CI image (gcc 12+ mpich)