E3SM-Project / polaris

Testing and analysis for OMEGA, MPAS-Ocean, MALI and MPAS-Seaice

`baroclinic_channel/10km/threads` failing on Perlmutter-cpu with intel-cray compiler #205

Open. altheaden opened this issue 4 months ago.

altheaden commented 4 months ago

Running the PR suite with intel-cray as the compiler causes the baroclinic_channel/10km/threads test case to fail. The log file is:

polaris calling: polaris.run.serial._run_task()
  in /global/u1/a/althea/code/polaris/update-to-0.4.0-alpha.1/polaris/run/serial.py

Running steps: init, 1thread, 2thread, validate
  * step: init

polaris calling: polaris.ocean.tasks.baroclinic_channel.init.Init.constrain_resources()
  inherited from: polaris.step.Step.constrain_resources()
  in /global/u1/a/althea/code/polaris/update-to-0.4.0-alpha.1/polaris/step.py

polaris calling: polaris.ocean.tasks.baroclinic_channel.init.Init.runtime_setup()
  inherited from: polaris.step.Step.runtime_setup()
  in /global/u1/a/althea/code/polaris/update-to-0.4.0-alpha.1/polaris/step.py

polaris calling: polaris.ocean.tasks.baroclinic_channel.init.Init.run()
  in /global/u1/a/althea/code/polaris/update-to-0.4.0-alpha.1/polaris/ocean/tasks/baroclinic_channel/init.py

cullCell
int32
Running: MpasCellCuller.x /tmp/tmp3_pp_5ql/ds_in.nc /tmp/tmp3_pp_5ql/ds_out.nc

************************************************************
MPAS_CELL_CULLER:
  C++ version
  Remove cells/edges/vertices from a NetCDF MPAS Mesh file. 

  Compiled on Jul  3 2024 at 20:58:35.
************************************************************

Reading input grid.
Read dimensions:
    nCells = 960
    nVertices = 1920
    nEdges = 2880
    vertexDegree = 3
    maxEdges = 6
    Spherical? = 0
    Periodic? = 1
    x_period = 160000
    y_period = 0
Marking cells for removal.
Removing 32 cells.
Marking vertices for removal.
Removing 32 vertices.
Marking edges for removal.
Removing 64 edges.
Writing grid dimensions
Writing grid attributes
Writing grid coordinates
Mapping and writing cell fields and culled_graph.info
Mapping and writing edge fields
Mapping and writing vertex fields
Outputting cell map

Running: MpasMeshConverter.x /tmp/tmp07muiqkb/mesh_in.nc /tmp/tmp07muiqkb/mesh_out.nc

************************************************************
MPAS_MESH_CONVERTER:
  C++ version
  Convert a NetCDF file describing Cell Locations, 
  Vertex Location, and Connectivity into a valid MPAS mesh.

  Compiled on Jul  3 2024 at 20:58:31.
************************************************************

Reading input grid.
Read dimensions:
    nCells = 928
    nVertices = 1888
    vertexDegree = 3
    Spherical? = 0
    Periodic? = 1
    x_period = 160000
    y_period = 0
Built 928 cells.
Built 1888 vertices.
Build prelimiary cell connectivity.
Order vertices on cell.
Build complete cell mask.
Build and order edges, dvEdge, and dcEdge.
Built 2816 edge indices...
Build and order vertex arrays,
Build and order cell arrays,
Build areaCell, areaTriangle, and kiteAreasOnVertex.
     Found 0 incomplete cells. Each is marked with an area of -1.
Build edgesOnEdge and weightsOnEdge.
Build angleEdge.
Building mesh qualities.
    Mesh contains: 0 obtuse triangles.
Writing grid dimensions
Writing grid attributes
Writing grid coordinates
Writing cell connectivity
Writing edge connectivity
Writing vertex connectivity
Writing cell parameters
Writing edge parameters
Writing vertex parameters
Writing mesh qualities
Reading and writing meshDensity
Write graph.info file

          execution:        SUCCESS
          runtime:          0:00:08
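
For context, the mesh generation in the init step above corresponds roughly to the following mpas_tools calls; the cull and convert wrappers are what launch MpasCellCuller.x and MpasMeshConverter.x in temp directories, as seen in the log. This is only a sketch: the nx/ny/dc values and the output file name here are placeholders, not the task's actual configuration.

```python
# Rough sketch (not the actual Init.run() code) of building, culling and
# converting a planar hex mesh with mpas_tools, which shells out to
# MpasCellCuller.x and MpasMeshConverter.x as shown in the log above.
from mpas_tools.planar_hex import make_planar_hex_mesh
from mpas_tools.mesh.conversion import cull, convert
from mpas_tools.io import write_netcdf

# periodic in x, non-periodic in y (consistent with x_period != 0, y_period == 0);
# nx, ny and dc below are placeholders, not the values used by this task
ds_mesh = make_planar_hex_mesh(nx=16, ny=60, dc=10e3,
                               nonperiodic_x=False, nonperiodic_y=True)

# remove the boundary cells flagged by cullCell, then rebuild a consistent
# MPAS mesh and write the graph.info file later used by gpmetis
ds_mesh = cull(ds_mesh)                                # runs MpasCellCuller.x
ds_mesh = convert(ds_mesh,                             # runs MpasMeshConverter.x
                  graphInfoFileName='graph.info')
write_netcdf(ds_mesh, 'base_mesh.nc')
```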
  * step: 1thread

polaris calling: polaris.ocean.tasks.baroclinic_channel.forward.Forward.constrain_resources()
  inherited from: polaris.ocean.model.ocean_model_step.OceanModelStep.constrain_resources()
  in /global/u1/a/althea/code/polaris/update-to-0.4.0-alpha.1/polaris/ocean/model/ocean_model_step.py

polaris calling: polaris.ocean.tasks.baroclinic_channel.forward.Forward.runtime_setup()
  inherited from: polaris.model_step.ModelStep.runtime_setup()
  in /global/u1/a/althea/code/polaris/update-to-0.4.0-alpha.1/polaris/model_step.py

Warning: replacing namelist options in namelist.ocean
config_pio_num_iotasks = 1
config_pio_stride = 4
config_dt = '00:05:00'
config_run_duration = '00:15:00'
config_btr_dt = '00:00:15'
Running: gpmetis graph.info 4
******************************************************************************
METIS 5.0 Copyright 1998-13, Regents of the University of Minnesota
 (HEAD: , Built on: Jul 18 2024, 12:00:39)
 size of idx_t: 64bits, real_t: 64bits, idx_t *: 64bits

Graph Information -----------------------------------------------------------
 Name: graph.info, #Vertices: 928, #Edges: 2752, #Parts: 4

Options ---------------------------------------------------------------------
 ptype=kway, objtype=cut, ctype=shem, rtype=greedy, iptype=metisrb
 dbglvl=0, ufactor=1.030, no2hop=NO, minconn=NO, contig=NO, nooutput=NO
 seed=-1, niter=10, ncuts=1

Direct k-way Partitioning ---------------------------------------------------
 - Edgecut: 104, communication volume: 104.

 - Balance:
     constraint #0:  1.009 out of 0.004

 - Most overweight partition:
     pid: 0, actual: 234, desired: 232, ratio: 1.01.

 - Subdomain connectivity: max: 2, min: 1, avg: 1.50

 - Each partition is contiguous.

Timing Information ----------------------------------------------------------
  I/O:                     0.000 sec
  Partitioning:            0.000 sec   (METIS time)
  Reporting:               0.000 sec

Memory Information ----------------------------------------------------------
  Max memory used:         0.307 MB
******************************************************************************
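
For reference, `gpmetis graph.info 4` writes the partition to `graph.info.part.4` (the standard METIS output name), with one line per cell giving the owning MPI rank. A tiny, hedged sketch of inspecting that file follows; the counts in the comment are illustrative, not taken from this run.

```python
# Sketch only: tally how many cells gpmetis assigned to each of the 4 ranks.
from collections import Counter

with open('graph.info.part.4') as f:
    owners = [int(line) for line in f]

print(Counter(owners))  # illustrative, e.g. Counter({0: 234, 1: 232, ...})
```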

Bypassing step's run() method and running with command line args

polaris calling: polaris.parallel.run_command()
  in /global/u1/a/althea/code/polaris/update-to-0.4.0-alpha.1/polaris/parallel.py

Running: srun -c 1 -N 1 -n 4 ./ocean_model -n namelist.ocean -s streams.ocean
PE 0: MPICH processor detected:
PE 0:   AMD Milan (25:1:1) (family:model:stepping)
MPI VERSION    : CRAY MPICH version 8.1.25.17 (ANL base 3.4a2)
MPI BUILD INFO : Sun Feb 26 16:01 2023 (git hash aecd99f) (CH4)
PE 0: MPICH environment settings =====================================
PE 0:   MPICH_ENV_DISPLAY                              = 1
PE 0:   MPICH_VERSION_DISPLAY                          = 1
PE 0:   MPICH_ABORT_ON_ERROR                           = 0
PE 0:   MPICH_CPUMASK_DISPLAY                          = 0
PE 0:   MPICH_STATS_DISPLAY                            = 0
PE 0:   MPICH_RANK_REORDER_METHOD                      = 1
PE 0:   MPICH_RANK_REORDER_DISPLAY                     = 0
PE 0:   MPICH_MEMCPY_MEM_CHECK                         = 0
PE 0:   MPICH_USE_SYSTEM_MEMCPY                        = 0
PE 0:   MPICH_OPTIMIZED_MEMCPY                         = 1
PE 0:   MPICH_ALLOC_MEM_PG_SZ                          = 4096
PE 0:   MPICH_ALLOC_MEM_POLICY                         = PREFERRED
PE 0:   MPICH_ALLOC_MEM_AFFINITY                       = SYS_DEFAULT
PE 0:   MPICH_MALLOC_FALLBACK                          = 0
PE 0:   MPICH_MEM_DEBUG_FNAME                          = 
PE 0:   MPICH_INTERNAL_MEM_AFFINITY                    = SYS_DEFAULT
PE 0:   MPICH_NO_BUFFER_ALIAS_CHECK                    = 0
PE 0:   MPICH_COLL_SYNC                                = MPI_Bcast
PE 0:   MPICH_SINGLE_HOST_ENABLED                        = 1
PE 0: MPICH/RMA environment settings =================================
PE 0:   MPICH_RMA_MAX_PENDING                          = 128
PE 0:   MPICH_RMA_SHM_ACCUMULATE                       = 0
PE 0: MPICH/Dynamic Process Management environment settings ==========
PE 0:   MPICH_DPM_DIR                                  = 
PE 0:   MPICH_LOCAL_SPAWN_SERVER                       = 0
PE 0:   MPICH_SPAWN_USE_RANKPOOL                       = 1
PE 0: MPICH/SMP environment settings =================================
PE 0:   MPICH_SMP_SINGLE_COPY_MODE                     = XPMEM
PE 0:   MPICH_SMP_SINGLE_COPY_SIZE                     = 8192
PE 0:   MPICH_SHM_PROGRESS_MAX_BATCH_SIZE              = 8
PE 0: MPICH/COLLECTIVE environment settings ==========================
PE 0:   MPICH_COLL_OPT_OFF                             = 0
PE 0:   MPICH_BCAST_ONLY_TREE                          = 1
PE 0:   MPICH_BCAST_INTERNODE_RADIX                    = 4
PE 0:   MPICH_BCAST_INTRANODE_RADIX                    = 4
PE 0:   MPICH_ALLTOALL_SHORT_MSG                       = 64-512
PE 0:   MPICH_ALLTOALL_SYNC_FREQ                       = 1-24
PE 0:   MPICH_ALLTOALLV_THROTTLE                       = 8
PE 0:   MPICH_ALLGATHER_VSHORT_MSG                     = 1024-4096
PE 0:   MPICH_ALLGATHERV_VSHORT_MSG                    = 1024-4096
PE 0:   MPICH_GATHERV_SHORT_MSG                        = 131072
PE 0:   MPICH_GATHERV_MIN_COMM_SIZE                    = 64
PE 0:   MPICH_GATHERV_MAX_TMP_SIZE                     = 536870912
PE 0:   MPICH_GATHERV_SYNC_FREQ                        = 16
PE 0:   MPICH_IGATHERV_MIN_COMM_SIZE                   = 1000
PE 0:   MPICH_IGATHERV_SYNC_FREQ                       = 100
PE 0:   MPICH_IGATHERV_RAND_COMMSIZE                   = 2048
PE 0:   MPICH_IGATHERV_RAND_RECVLIST                   = 0
PE 0:   MPICH_SCATTERV_SHORT_MSG                       = 2048-8192
PE 0:   MPICH_SCATTERV_MIN_COMM_SIZE                   = 64
PE 0:   MPICH_SCATTERV_MAX_TMP_SIZE                    = 536870912
PE 0:   MPICH_SCATTERV_SYNC_FREQ                       = 16
PE 0:   MPICH_SCATTERV_SYNCHRONOUS                     = 0
PE 0:   MPICH_ALLREDUCE_MAX_SMP_SIZE                   = 262144
PE 0:   MPICH_ALLREDUCE_BLK_SIZE                       = 716800
PE 0:   MPICH_GPU_ALLGATHER_VSHORT_MSG_ALGORITHM       = 1
PE 0:   MPICH_GPU_ALLREDUCE_USE_KERNEL                 = 0
PE 0:   MPICH_GPU_COLL_STAGING_BUF_SIZE                = 1048576
PE 0:   MPICH_GPU_ALLREDUCE_STAGING_THRESHOLD          = 256
PE 0:   MPICH_ALLREDUCE_NO_SMP                         = 0
PE 0:   MPICH_REDUCE_NO_SMP                            = 0
PE 0:   MPICH_REDUCE_SCATTER_COMMUTATIVE_LONG_MSG_SIZE = 524288
PE 0:   MPICH_REDUCE_SCATTER_MAX_COMMSIZE              = 1000
PE 0:   MPICH_SHARED_MEM_COLL_OPT                      = 1
PE 0:   MPICH_SHARED_MEM_COLL_NCELLS                   = 8
PE 0:   MPICH_SHARED_MEM_COLL_CELLSZ                   = 256
PE 0: MPICH MPIIO environment settings ===============================
PE 0:   MPICH_MPIIO_HINTS_DISPLAY                      = 0
PE 0:   MPICH_MPIIO_HINTS                              = NULL
PE 0:   MPICH_MPIIO_ABORT_ON_RW_ERROR                  = disable
PE 0:   MPICH_MPIIO_CB_ALIGN                           = 2
PE 0:   MPICH_MPIIO_DVS_MAXNODES                       = 1
PE 0:   MPICH_MPIIO_AGGREGATOR_PLACEMENT_DISPLAY       = 0
PE 0:   MPICH_MPIIO_AGGREGATOR_PLACEMENT_STRIDE        = -1
PE 0:   MPICH_MPIIO_MAX_NUM_IRECV                      = 50
PE 0:   MPICH_MPIIO_MAX_NUM_ISEND                      = 50
PE 0:   MPICH_MPIIO_MAX_SIZE_ISEND                     = 10485760
PE 0:   MPICH_MPIIO_OFI_STARTUP_CONNECT                = disable
PE 0:   MPICH_MPIIO_OFI_STARTUP_NODES_AGGREGATOR        = 2
PE 0: MPICH MPIIO statistics environment settings ====================
PE 0:   MPICH_MPIIO_STATS                              = 0
PE 0:   MPICH_MPIIO_TIMERS                             = 0
PE 0:   MPICH_MPIIO_WRITE_EXIT_BARRIER                 = 1
PE 0: MPICH Thread Safety settings ===================================
PE 0:   MPICH_ASYNC_PROGRESS                           = 0
PE 0:   MPICH_OPT_THREAD_SYNC                          = 1
PE 0:   rank 0 required = funneled, was provided = funneled

          execution:        SUCCESS
          runtime:          0:00:27
  * step: 2thread

polaris calling: polaris.ocean.tasks.baroclinic_channel.forward.Forward.constrain_resources()
  inherited from: polaris.ocean.model.ocean_model_step.OceanModelStep.constrain_resources()
  in /global/u1/a/althea/code/polaris/update-to-0.4.0-alpha.1/polaris/ocean/model/ocean_model_step.py

polaris calling: polaris.ocean.tasks.baroclinic_channel.forward.Forward.runtime_setup()
  inherited from: polaris.model_step.ModelStep.runtime_setup()
  in /global/u1/a/althea/code/polaris/update-to-0.4.0-alpha.1/polaris/model_step.py

Warning: replacing namelist options in namelist.ocean
config_pio_num_iotasks = 1
config_pio_stride = 4
config_dt = '00:05:00'
config_run_duration = '00:15:00'
config_btr_dt = '00:00:15'
Running: gpmetis graph.info 4
******************************************************************************
METIS 5.0 Copyright 1998-13, Regents of the University of Minnesota
 (HEAD: , Built on: Jul 18 2024, 12:00:39)
 size of idx_t: 64bits, real_t: 64bits, idx_t *: 64bits

Graph Information -----------------------------------------------------------
 Name: graph.info, #Vertices: 928, #Edges: 2752, #Parts: 4

Options ---------------------------------------------------------------------
 ptype=kway, objtype=cut, ctype=shem, rtype=greedy, iptype=metisrb
 dbglvl=0, ufactor=1.030, no2hop=NO, minconn=NO, contig=NO, nooutput=NO
 seed=-1, niter=10, ncuts=1

Direct k-way Partitioning ---------------------------------------------------
 - Edgecut: 104, communication volume: 104.

 - Balance:
     constraint #0:  1.009 out of 0.004

 - Most overweight partition:
     pid: 0, actual: 234, desired: 232, ratio: 1.01.

 - Subdomain connectivity: max: 2, min: 1, avg: 1.50

 - Each partition is contiguous.

Timing Information ----------------------------------------------------------
  I/O:                     0.000 sec
  Partitioning:            0.000 sec   (METIS time)
  Reporting:               0.000 sec

Memory Information ----------------------------------------------------------
  Max memory used:         0.307 MB
******************************************************************************

Bypassing step's run() method and running with command line args

polaris calling: polaris.parallel.run_command()
  in /global/u1/a/althea/code/polaris/update-to-0.4.0-alpha.1/polaris/parallel.py

Running with 2 OpenMP threads
Running: srun -c 2 -N 1 -n 4 ./ocean_model -n namelist.ocean -s streams.ocean
PE 0: MPICH processor detected:
PE 0:   AMD Milan (25:1:1) (family:model:stepping)
MPI VERSION    : CRAY MPICH version 8.1.25.17 (ANL base 3.4a2)
MPI BUILD INFO : Sun Feb 26 16:01 2023 (git hash aecd99f) (CH4)
PE 0: MPICH environment settings =====================================
PE 0:   MPICH_ENV_DISPLAY                              = 1
PE 0:   MPICH_VERSION_DISPLAY                          = 1
PE 0:   MPICH_ABORT_ON_ERROR                           = 0
PE 0:   MPICH_CPUMASK_DISPLAY                          = 0
PE 0:   MPICH_STATS_DISPLAY                            = 0
PE 0:   MPICH_RANK_REORDER_METHOD                      = 1
PE 0:   MPICH_RANK_REORDER_DISPLAY                     = 0
PE 0:   MPICH_MEMCPY_MEM_CHECK                         = 0
PE 0:   MPICH_USE_SYSTEM_MEMCPY                        = 0
PE 0:   MPICH_OPTIMIZED_MEMCPY                         = 1
PE 0:   MPICH_ALLOC_MEM_PG_SZ                          = 4096
PE 0:   MPICH_ALLOC_MEM_POLICY                         = PREFERRED
PE 0:   MPICH_ALLOC_MEM_AFFINITY                       = SYS_DEFAULT
PE 0:   MPICH_MALLOC_FALLBACK                          = 0
PE 0:   MPICH_MEM_DEBUG_FNAME                          = 
PE 0:   MPICH_INTERNAL_MEM_AFFINITY                    = SYS_DEFAULT
PE 0:   MPICH_NO_BUFFER_ALIAS_CHECK                    = 0
PE 0:   MPICH_COLL_SYNC                                = MPI_Bcast
PE 0:   MPICH_SINGLE_HOST_ENABLED                        = 1
PE 0: MPICH/RMA environment settings =================================
PE 0:   MPICH_RMA_MAX_PENDING                          = 128
PE 0:   MPICH_RMA_SHM_ACCUMULATE                       = 0
PE 0: MPICH/Dynamic Process Management environment settings ==========
PE 0:   MPICH_DPM_DIR                                  = 
PE 0:   MPICH_LOCAL_SPAWN_SERVER                       = 0
PE 0:   MPICH_SPAWN_USE_RANKPOOL                       = 1
PE 0: MPICH/SMP environment settings =================================
PE 0:   MPICH_SMP_SINGLE_COPY_MODE                     = XPMEM
PE 0:   MPICH_SMP_SINGLE_COPY_SIZE                     = 8192
PE 0:   MPICH_SHM_PROGRESS_MAX_BATCH_SIZE              = 8
PE 0: MPICH/COLLECTIVE environment settings ==========================
PE 0:   MPICH_COLL_OPT_OFF                             = 0
PE 0:   MPICH_BCAST_ONLY_TREE                          = 1
PE 0:   MPICH_BCAST_INTERNODE_RADIX                    = 4
PE 0:   MPICH_BCAST_INTRANODE_RADIX                    = 4
PE 0:   MPICH_ALLTOALL_SHORT_MSG                       = 64-512
PE 0:   MPICH_ALLTOALL_SYNC_FREQ                       = 1-24
PE 0:   MPICH_ALLTOALLV_THROTTLE                       = 8
PE 0:   MPICH_ALLGATHER_VSHORT_MSG                     = 1024-4096
PE 0:   MPICH_ALLGATHERV_VSHORT_MSG                    = 1024-4096
PE 0:   MPICH_GATHERV_SHORT_MSG                        = 131072
PE 0:   MPICH_GATHERV_MIN_COMM_SIZE                    = 64
PE 0:   MPICH_GATHERV_MAX_TMP_SIZE                     = 536870912
PE 0:   MPICH_GATHERV_SYNC_FREQ                        = 16
PE 0:   MPICH_IGATHERV_MIN_COMM_SIZE                   = 1000
PE 0:   MPICH_IGATHERV_SYNC_FREQ                       = 100
PE 0:   MPICH_IGATHERV_RAND_COMMSIZE                   = 2048
PE 0:   MPICH_IGATHERV_RAND_RECVLIST                   = 0
PE 0:   MPICH_SCATTERV_SHORT_MSG                       = 2048-8192
PE 0:   MPICH_SCATTERV_MIN_COMM_SIZE                   = 64
PE 0:   MPICH_SCATTERV_MAX_TMP_SIZE                    = 536870912
PE 0:   MPICH_SCATTERV_SYNC_FREQ                       = 16
PE 0:   MPICH_SCATTERV_SYNCHRONOUS                     = 0
PE 0:   MPICH_ALLREDUCE_MAX_SMP_SIZE                   = 262144
PE 0:   MPICH_ALLREDUCE_BLK_SIZE                       = 716800
PE 0:   MPICH_GPU_ALLGATHER_VSHORT_MSG_ALGORITHM       = 1
PE 0:   MPICH_GPU_ALLREDUCE_USE_KERNEL                 = 0
PE 0:   MPICH_GPU_COLL_STAGING_BUF_SIZE                = 1048576
PE 0:   MPICH_GPU_ALLREDUCE_STAGING_THRESHOLD          = 256
PE 0:   MPICH_ALLREDUCE_NO_SMP                         = 0
PE 0:   MPICH_REDUCE_NO_SMP                            = 0
PE 0:   MPICH_REDUCE_SCATTER_COMMUTATIVE_LONG_MSG_SIZE = 524288
PE 0:   MPICH_REDUCE_SCATTER_MAX_COMMSIZE              = 1000
PE 0:   MPICH_SHARED_MEM_COLL_OPT                      = 1
PE 0:   MPICH_SHARED_MEM_COLL_NCELLS                   = 8
PE 0:   MPICH_SHARED_MEM_COLL_CELLSZ                   = 256
PE 0: MPICH MPIIO environment settings ===============================
PE 0:   MPICH_MPIIO_HINTS_DISPLAY                      = 0
PE 0:   MPICH_MPIIO_HINTS                              = NULL
PE 0:   MPICH_MPIIO_ABORT_ON_RW_ERROR                  = disable
PE 0:   MPICH_MPIIO_CB_ALIGN                           = 2
PE 0:   MPICH_MPIIO_DVS_MAXNODES                       = 1
PE 0:   MPICH_MPIIO_AGGREGATOR_PLACEMENT_DISPLAY       = 0
PE 0:   MPICH_MPIIO_AGGREGATOR_PLACEMENT_STRIDE        = -1
PE 0:   MPICH_MPIIO_MAX_NUM_IRECV                      = 50
PE 0:   MPICH_MPIIO_MAX_NUM_ISEND                      = 50
PE 0:   MPICH_MPIIO_MAX_SIZE_ISEND                     = 10485760
PE 0:   MPICH_MPIIO_OFI_STARTUP_CONNECT                = disable
PE 0:   MPICH_MPIIO_OFI_STARTUP_NODES_AGGREGATOR        = 2
PE 0: MPICH MPIIO statistics environment settings ====================
PE 0:   MPICH_MPIIO_STATS                              = 0
PE 0:   MPICH_MPIIO_TIMERS                             = 0
PE 0:   MPICH_MPIIO_WRITE_EXIT_BARRIER                 = 1
PE 0: MPICH Thread Safety settings ===================================
PE 0:   MPICH_ASYNC_PROGRESS                           = 0
PE 0:   MPICH_OPT_THREAD_SYNC                          = 1
PE 0:   rank 0 required = funneled, was provided = funneled

          execution:        SUCCESS
          runtime:          0:00:08
  * step: validate

polaris calling: polaris.ocean.tasks.baroclinic_channel.validate.Validate.constrain_resources()
  inherited from: polaris.step.Step.constrain_resources()
  in /global/u1/a/althea/code/polaris/update-to-0.4.0-alpha.1/polaris/step.py

polaris calling: polaris.ocean.tasks.baroclinic_channel.validate.Validate.runtime_setup()
  inherited from: polaris.step.Step.runtime_setup()
  in /global/u1/a/althea/code/polaris/update-to-0.4.0-alpha.1/polaris/step.py

polaris calling: polaris.ocean.tasks.baroclinic_channel.validate.Validate.run()
  in /global/u1/a/althea/code/polaris/update-to-0.4.0-alpha.1/polaris/ocean/tasks/baroclinic_channel/validate.py

temperature          Time index: 0, 1, 2
0:  l1: 5.32907051820075e-15  l2: 3.07674029821370e-15  linf: 1.77635683940025e-15
1:  l1: 3.55271367880050e-15  l2: 2.51214793389404e-15  linf: 1.77635683940025e-15
2:  l1: 4.26325641456060e-14  l2: 1.06581410364015e-14  linf: 3.55271367880050e-15
  FAIL /pscratch/sd/a/althea/polaris-tests/update-to-0.4.0/pr-intel/ocean/planar/baroclinic_channel/10km/threads/1thread/output.nc
       /pscratch/sd/a/althea/polaris-tests/update-to-0.4.0/pr-intel/ocean/planar/baroclinic_channel/10km/threads/2thread/output.nc
salinity             Time index: 0, 1, 2
0:  l1: 6.39488462184090e-14  l2: 2.13162820728030e-14  linf: 7.10542735760100e-15
1:  l1: 2.27373675443232e-13  l2: 4.01943669423046e-14  linf: 7.10542735760100e-15
2:  l1: 6.60804744256893e-13  l2: 7.21122117759323e-14  linf: 1.42108547152020e-14
  FAIL /pscratch/sd/a/althea/polaris-tests/update-to-0.4.0/pr-intel/ocean/planar/baroclinic_channel/10km/threads/1thread/output.nc
       /pscratch/sd/a/althea/polaris-tests/update-to-0.4.0/pr-intel/ocean/planar/baroclinic_channel/10km/threads/2thread/output.nc
layerThickness       Time index: 0, 1, 2
1:  l1: 7.10542735760100e-15  l2: 7.10542735760100e-15  linf: 7.10542735760100e-15
2:  l1: 6.39488462184090e-14  l2: 3.25611587112166e-14  linf: 2.84217094304040e-14
  FAIL /pscratch/sd/a/althea/polaris-tests/update-to-0.4.0/pr-intel/ocean/planar/baroclinic_channel/10km/threads/1thread/output.nc
       /pscratch/sd/a/althea/polaris-tests/update-to-0.4.0/pr-intel/ocean/planar/baroclinic_channel/10km/threads/2thread/output.nc
normalVelocity       Time index: 0, 1, 2
0:  l1: 3.83984305292565e-16  l2: 1.46704076535723e-17  linf: 3.46944695195361e-18
1:  l1: 3.44834147772799e-15  l2: 7.13150071393156e-17  linf: 1.38777878078145e-17
2:  l1: 4.32360010547823e-14  l2: 3.06666031916816e-15  linf: 1.20736753927986e-15
  FAIL /pscratch/sd/a/althea/polaris-tests/update-to-0.4.0/pr-intel/ocean/planar/baroclinic_channel/10km/threads/1thread/output.nc
       /pscratch/sd/a/althea/polaris-tests/update-to-0.4.0/pr-intel/ocean/planar/baroclinic_channel/10km/threads/2thread/output.nc
          execution:        ERROR
Exception raised while running the steps of the task
Traceback (most recent call last):
  File "/global/u1/a/althea/code/polaris/update-to-0.4.0-alpha.1/polaris/run/serial.py", line 324, in _log_and_run_task
    baselines_passed = _run_task(task, available_resources)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/u1/a/althea/code/polaris/update-to-0.4.0-alpha.1/polaris/run/serial.py", line 403, in _run_task
    _run_step(task, step, task.new_step_log_file,
  File "/global/u1/a/althea/code/polaris/update-to-0.4.0-alpha.1/polaris/run/serial.py", line 502, in _run_step
    step.run()
  File "/global/u1/a/althea/code/polaris/update-to-0.4.0-alpha.1/polaris/ocean/tasks/baroclinic_channel/validate.py", line 51, in run
    raise ValueError(f'Validation failed comparing outputs between '
ValueError: Validation failed comparing outputs between 1thread and 2thread.
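
The tiny but nonzero norms above are why this fails: the threads task requires the 1thread and 2thread runs to match bit for bit, so any difference at all is an error. Below is a minimal, illustrative sketch of that kind of comparison; it is not the actual Validate.run() implementation, and the function name, norm definitions and zero tolerance are assumptions.

```python
import numpy as np
import xarray as xr


def compare_variables(file1, file2, variables):
    """Fail if any variable differs at any time index (bit-for-bit check)."""
    ds1 = xr.open_dataset(file1)
    ds2 = xr.open_dataset(file2)
    all_pass = True
    for var in variables:
        for tidx in range(ds1.sizes['Time']):
            diff = np.abs(ds1[var].isel(Time=tidx).values -
                          ds2[var].isel(Time=tidx).values)
            l1 = diff.sum()
            l2 = np.sqrt((diff ** 2).sum())
            linf = diff.max()
            if linf > 0.0:  # any difference at all counts as a failure
                print(f'{var} time {tidx}: l1={l1:g} l2={l2:g} linf={linf:g}')
                all_pass = False
    if not all_pass:
        raise ValueError(f'Validation failed comparing outputs between '
                         f'{file1} and {file2}')


# compare_variables('1thread/output.nc', '2thread/output.nc',
#                   ['temperature', 'salinity', 'layerThickness',
#                    'normalVelocity'])
```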
xylar commented 4 months ago

Thanks @altheaden. As we discussed, I'll run git bisect on this tomorrow. It looks like things worked back in https://github.com/E3SM-Project/polaris/pull/177 (at least I checked the box...).

xylar commented 4 months ago

git bisect was able to track it down to this PR: https://github.com/E3SM-Project/E3SM/pull/6035

I'll make an E3SM bug report about this.