Closed mark-petersen closed 6 years ago
@mgduda can I merge this in? It corrects an error made last week in #1459, and is holding other things up for us.
@mark-petersen I'm testing now in a branch of atmosphere/develop that uses grouped halo exchanges, and I'll report back before lunch.
To understand why `exchangeGroup % sendList` was not associated, I reran my failing test (the ocean model in init mode) with an Intel debug build and with the stream debug writes enabled:
diff --git a/src/framework/mpas_stream_manager.F b/src/framework/mpas_stream_manager.F
-#define STREAM_DEBUG_WRITE(M) ! call mpas_log_write(M)
+#define STREAM_DEBUG_WRITE(M) call mpas_log_write( M )
During initialization, the model reads the files for all input streams, and after reading each variable it performs a halo exchange. The error occurs on the first file and first variable; the traceback is below. Sorry, I could not figure out why this differs from our forward mode configuration, which does not cause an error. I spent 30 minutes on it and decided to stop, because this bug fix prevents the problem and restores the behavior from before #1459.
wf106:init_step2$ mpirun -n 1 /usr/projects/climate/mpeterse/repos/MPAS/ocean_develop/ocean_model
Reported: 1 (out of 1) daemons - 1 (out of 1) procs
Note: MPAS has requested an MPI threading level of MPI_THREAD_MULTIPLE, but
this is not supported by the MPI implementation; a threading level of
MPI_THREAD_SINGLE will be used instead.
forrtl: severe (408): fort: (7): Attempt to use pointer COMMLISTPTR when it is not associated with a target
Image PC Routine Line Source
ocean_model 0000000003EAD220 Unknown Unknown Unknown
ocean_model 000000000376929F mpas_dmpar_mp_mpa 8466 mpas_dmpar.F
ocean_model 0000000003757330 mpas_dmpar_mp_mpa 7760 mpas_dmpar.F
ocean_model 0000000003748A48 mpas_dmpar_mp_mpa 7168 mpas_dmpar.F
ocean_model 00000000037495D4 mpas_dmpar_mp_mpa 7236 mpas_dmpar.F
ocean_model 000000000398F744 mpas_stream_manag 4642 mpas_stream_manager.F
ocean_model 0000000003987890 mpas_stream_manag 3939 mpas_stream_manager.F
ocean_model 0000000003981BD3 mpas_stream_manag 3494 mpas_stream_manager.F
ocean_model 00000000024229E2 ocn_init_mode_mp_ 121 mpas_ocn_init_mode.F
ocean_model 00000000028438CA ocn_core_mp_ocn_c 80 mpas_ocn_core.F
ocean_model 000000000041581D mpas_subdriver_mp 331 mpas_subdriver.F
ocean_model 000000000041066F MAIN__ 14 mpas.F
wf106:init_step2$ tail log.ocean.0000.out
-- Called MPAS_stream_mgr_read()
-- Handling read of stream input_init
-- Stream filename is: mesh.nc
Is field 'latCell' active in stream 'input_init? **
Is field 'lonCell' active in stream 'input_init? **
...
Seeking time of 0001-01-01_00:00:00
WARNING: File mesh.nc does not contain a seekable xtime variable. Forcing a read of the first time record.
-- Exchange halo for latCell
The error is coming from this stream:
<immutable_stream name="input_init"
filename_template="mesh.nc"
input_interval="initial_only"
type="input"/>
Prevent use of `commListPtr` when it is not associated. This fixes a bug that was introduced in #1459. The bug causes an error in MPAS-Ocean only in certain configurations, which is why previous testing did not catch it; it affects MPAS-Ocean init mode and some E3SM configurations. The fix in this merge only impacts grouped halo exchanges. In testing, the error occurred when `exchangeGroup % sendList` was not associated on the very first halo exchange after a file was read in, but only in certain cases.
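
For readers unfamiliar with the failure mode, here is a minimal standalone sketch of the general pattern: check `associated()` before dereferencing a list pointer. It is not the actual change in `mpas_dmpar.F`. The type names `comm_list_t` and `exchange_group_t`, their fields, and the helper `count_entries` are hypothetical stand-ins chosen for illustration; only the names `commListPtr`, `exchangeGroup`, and `sendList` come from the discussion above.

```fortran
program guard_unassociated_pointer
   implicit none

   ! Hypothetical stand-in for a communication list node (not the MPAS type).
   type :: comm_list_t
      integer :: procID = -1
      integer :: nList = 0
      type(comm_list_t), pointer :: next => null()
   end type comm_list_t

   ! Hypothetical stand-in for an exchange group holding a send list.
   type :: exchange_group_t
      type(comm_list_t), pointer :: sendList => null()
   end type exchange_group_t

   type(exchange_group_t) :: exchangeGroup
   type(comm_list_t), target :: firstEntry

   ! Case 1: sendList has never been pointed at a target.
   print *, 'entries (unassociated):', count_entries(exchangeGroup)

   ! Case 2: sendList points at a one-element list.
   firstEntry % procID = 0
   firstEntry % nList = 4
   exchangeGroup % sendList => firstEntry
   print *, 'entries (associated):', count_entries(exchangeGroup)

contains

   ! Walk the send list and count its nodes.
   integer function count_entries(group) result(n)
      type(exchange_group_t), intent(in) :: group
      type(comm_list_t), pointer :: commListPtr

      n = 0
      commListPtr => group % sendList
      ! associated() is tested before every dereference, so an
      ! unassociated head is treated as an empty list rather than
      ! triggering a runtime error such as the one in the traceback.
      do while (associated(commListPtr))
         n = n + 1
         commListPtr => commListPtr % next
      end do
   end function count_entries

end program guard_unassociated_pointer
```

In this sketch the first call returns 0 and the second returns 1. Without the `associated()` test, the first call would dereference a null pointer, which is what a debug build reports as "Attempt to use pointer ... when it is not associated with a target".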