MPAS-Dev / compass


PIO error messages in hurricane forward runs #482

Open xylar opened 1 year ago

xylar commented 1 year ago

Seeing the following error messages in MPAS-Ocean log files in the hurricane sandy/forward step:

ERROR: MPAS IO Error: Bad return value from PIO
xylar commented 1 year ago

@sbrus89, this may not be fatal (the model seems to run okay) but it's disconcerting and should probably be tracked down.

xylar commented 1 year ago

Example output is at:

/lcrc/group/e3sm/ac.xylar/compass_1.2/chrysalis/test_20221219/baseline/hurricane/ocean/hurricane/DEQU120at30cr10rr2/sandy/forward/log.ocean.0000.err
xylar commented 1 year ago

In the more detailed error log, I'm seeing:

PIO: ERROR: Defining variable  (ndims = 1) in file pointwiseStats.nc (ncid=24, iotype=PIO_IOTYPE_NETCDF) failed. NetCDF: Name contains illegal characters. NetCDF: Name contains illegal characters (error num=-59), (/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-scorpio-1.3.2-5oym53yysofow5m5ky7ko4lyjnmhviun/spack-src/src/clib/pio_nc.c:3091)
PIO: ERROR: Defining variable  (ndims = 1) in file pointwiseStats.nc (ncid=24, iotype=PIO_IOTYPE_NETCDF) failed. NetCDF: Name contains illegal characters. NetCDF: Name contains illegal characters (error num=-59), (/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-scorpio-1.3.2-5oym53yysofow5m5ky7ko4lyjnmhviun/spack-src/src/clib/pio_nc.c:3091)
PIO: ERROR: Defining variable  (ndims = 2) in file pointwiseStats.nc (ncid=24, iotype=PIO_IOTYPE_NETCDF) failed. NetCDF: Name contains illegal characters. NetCDF: Name contains illegal characters (error num=-59), (/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-scorpio-1.3.2-5oym53yysofow5m5ky7ko4lyjnmhviun/spack-src/src/clib/pio_nc.c:3091)

See

/lcrc/group/e3sm/ac.xylar/compass_1.2/chrysalis/test_20221219/baseline/hurricane/case_outputs/ocean_hurricane_DEQU120at30cr10rr2_sandy.log
xylar commented 1 year ago

I'm afraid I don't see what the illegal character(s) might be.
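
For what it's worth, here is a minimal sketch (not compass code) that screens candidate variable names against a conservative approximation of the netCDF naming rules behind error -59 (NC_EBADNAME). The field names in the call are the ones being reduced in the log, plus an empty string, since the PIO messages above appear to show a blank variable name between "variable" and "(ndims = ...)":

```python
import re

# Conservative approximation of the netCDF "classic" naming rules
# (error num=-59 is NC_EBADNAME): first character alphanumeric or '_',
# no '/' or control characters, and an empty name is always rejected.
_NC_NAME = re.compile(r'^[A-Za-z0-9_][A-Za-z0-9_@+.\-]*$')


def report_bad_names(names):
    """Print any candidate variable names that netCDF would likely reject."""
    for name in names:
        if not name or not _NC_NAME.match(name):
            print(f'suspect variable name: {name!r}')


# Field names from the log above, plus an empty string for illustration;
# a blank name would be flagged here.
report_bad_names(['latCell', 'lonCell', 'ssh', ''])
```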

sbrus89 commented 1 year ago

It seems like this only occurs on the initial write to the pointwiseStats.nc file:

 Constituent P1
   Frequency 0.725229500000000E-04
   Amplitude 0.468480000000000E-01
   LoveNumbers 0.706000000000000
   NodalAmplitude 1.00000000000000
   Astronomical argument 0.00000000000000
   NodalPhase 1.23913185109541
   Type 1 

  -- Reducing field latCell with nElements = 182
  -- Reducing field lonCell with nElements = 182
  -- Reducing field ssh with nElements = 182
ERROR: MPAS IO Error: Bad return value from PIO
ERROR: MPAS IO Error: Bad return value from PIO
ERROR: MPAS IO Error: Bad return value from PIO
 ... Updating 1d real field windSpeedU in stream 
 ... found 1d real named windSpeedU
 ... done updating field
 ... Updating 1d real field windSpeedV in stream 
 ... found 1d real named windSpeedV
 ... done updating field
 ... Updating 1d real field atmosPressure in stream 
 ... found 1d real named atmosPressure
 ... done updating field
 Doing timestep 2012-10-10_00:00:25
 Verifying that cells are not dry... 
 Minimum thickness is 2267.73554688385.
 Done verifying that cells are wet.
 Doing timestep 2012-10-10_00:00:50
 Verifying that cells are not dry... 
 Minimum thickness is 2267.73554688386.
 Done verifying that cells are wet.

Subsequent writes don't show these errors:

 Verifying that cells are not dry...
 Minimum thickness is 2267.73554331533.
 Done verifying that cells are wet.
  -- Reducing field latCell with nElements = 182
  -- Reducing field lonCell with nElements = 182
  -- Reducing field ssh with nElements = 182
 Doing timestep 2012-10-10_00:30:25
 Verifying that cells are not dry...
 Minimum thickness is 2267.73554304084.
 Done verifying that cells are wet.
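
As a quick cross-check (a hedged sketch, not part of the test case), one could scan the combined log to confirm the errors only accompany the first pointwiseStats write; the log filename below is just an example based on the path given earlier:

```python
# Count pointwiseStats writes (marked by the "Reducing field" lines) and
# report which write each PIO error follows.
log_path = 'ocean_hurricane_DEQU120at30cr10rr2_sandy.log'  # example path

write_count = 0
with open(log_path) as log:
    for line in log:
        if 'Reducing field ssh' in line:
            write_count += 1
        elif 'Bad return value from PIO' in line:
            print(f'PIO error after pointwiseStats write #{write_count}')
```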
xylar commented 1 year ago

In the analysis step, I see:

  * step: analysis

compass calling: compass.ocean.tests.hurricane.analysis.Analysis.run()
  in /gpfs/fs1/home/ac.xylar/mpas-work/compass/master/compass/ocean/tests/hurricane/analysis/__init__.py

      Failed
Exception raised while running the steps of the test case
Traceback (most recent call last):
  File "/gpfs/fs1/home/ac.xylar/mpas-work/compass/master/compass/run/serial.py", line 145, in run_tests
    _run_test(test_case)
  File "/gpfs/fs1/home/ac.xylar/mpas-work/compass/master/compass/run/serial.py", line 394, in _run_test
    _run_step(test_case, step, test_case.new_step_log_file)
  File "/gpfs/fs1/home/ac.xylar/mpas-work/compass/master/compass/run/serial.py", line 437, in _run_step
    step.run()
  File "/gpfs/fs1/home/ac.xylar/mpas-work/compass/master/compass/ocean/tests/hurricane/analysis/__init__.py", line 191, in run
    data[run] = self.read_pointstats(self.pointstats_file[run])
  File "/gpfs/fs1/home/ac.xylar/mpas-work/compass/master/compass/ocean/tests/hurricane/analysis/__init__.py", line 103, in read_pointstats
    pointstats_nc.variables['lonCellPointStats'][:])
KeyError: 'lonCellPointStats'

I assume this is related. What do you think?
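
For context, a minimal sketch of the kind of guard read_pointstats could use so the missing variables surface as a clear error rather than a KeyError; only lonCellPointStats appears in the traceback, so the other names below are assumptions:

```python
from netCDF4 import Dataset

# Assumed variable names: only 'lonCellPointStats' is confirmed by the
# traceback above; the others are illustrative.
EXPECTED = ['lonCellPointStats', 'latCellPointStats', 'sshPointStats']


def read_pointstats_checked(pointstats_file):
    """Read point-stats variables, failing clearly if they were never
    defined (as happens when the PIO define step fails)."""
    with Dataset(pointstats_file, 'r') as nc:
        missing = [name for name in EXPECTED if name not in nc.variables]
        if missing:
            raise IOError(f'{pointstats_file} is missing {missing}; the '
                          f'forward step likely failed to define them.')
        return {name: nc.variables[name][:] for name in EXPECTED}
```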

xylar commented 1 year ago

An ncdump on the file shows that it doesn't contain any of the expected point-stats variables:

$ ncdump -h pointwiseStats.nc 
netcdf pointwiseStats {
dimensions:
    nPoints = 182 ;
    StrLen = 64 ;
    Time = UNLIMITED ; // (1153 currently)
variables:
    int pointCellGlobalID(nPoints) ;
        pointCellGlobalID:long_name = "List of global cell IDs in point set." ;
    char xtime(Time, StrLen) ;
        xtime:long_name = "model time, with format \'YYYY-MM-DD_HH:MM:SS\'" ;

// global attributes:
        :model_name = "mpas" ;
...
xylar commented 1 year ago

The run timed out and I had to rerun. Any chance that has something to do with it?

xylar commented 1 year ago

Two potentially relevant things have changed recently. First, I switched the default MPI on Chrysalis to OpenMPI. Second, I built new Spack environments for compass 1.2.0.alpha3. I think the former is more likely than the latter to explain why the problem has just emerged, given that there aren't any obviously relevant changes to the test case. Nothing I can think of changed regarding SCORPIO in the latest Spack build, so I don't see why that would be relevant.

It would be easy to build and run with Intel-MPI instead (as I did in my previous tests of hurricane). I'll try that now but it's getting late.

xylar commented 1 year ago

@sbrus89, no luck with Intel-MPI, so that's not the reason. I'm pretty lost as to what could have changed here to cause this problem.