Open xylar opened 1 year ago
@sbrus89, this may not be fatal (the model seems to run okay) but it's disconcerting and should probably be tracked down.
Example output is at:
/lcrc/group/e3sm/ac.xylar/compass_1.2/chrysalis/test_20221219/baseline/hurricane/ocean/hurricane/DEQU120at30cr10rr2/sandy/forward/log.ocean.0000.err
In the more detailed error log, I'm seeing:
PIO: ERROR: Defining variable (ndims = 1) in file pointwiseStats.nc (ncid=24, iotype=PIO_IOTYPE_NETCDF) failed. NetCDF: Name contains illegal characters. NetCDF: Name contains illegal characters (error num=-59), (/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-scorpio-1.3.2-5oym53yysofow5m5ky7ko4lyjnmhviun/spack-src/src/clib/pio_nc.c:3091)
PIO: ERROR: Defining variable (ndims = 1) in file pointwiseStats.nc (ncid=24, iotype=PIO_IOTYPE_NETCDF) failed. NetCDF: Name contains illegal characters. NetCDF: Name contains illegal characters (error num=-59), (/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-scorpio-1.3.2-5oym53yysofow5m5ky7ko4lyjnmhviun/spack-src/src/clib/pio_nc.c:3091)
PIO: ERROR: Defining variable (ndims = 2) in file pointwiseStats.nc (ncid=24, iotype=PIO_IOTYPE_NETCDF) failed. NetCDF: Name contains illegal characters. NetCDF: Name contains illegal characters (error num=-59), (/lcrc/group/e3sm/ac.xylar/spack_temp/ac.xasay-davis/spack-stage/spack-stage-scorpio-1.3.2-5oym53yysofow5m5ky7ko4lyjnmhviun/spack-src/src/clib/pio_nc.c:3091)
See
/lcrc/group/e3sm/ac.xylar/compass_1.2/chrysalis/test_20221219/baseline/hurricane/case_outputs/ocean_hurricane_DEQU120at30cr10rr2_sandy.log
I'm afraid I don't see what illegal character(s) this might be.
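Error -59 is NetCDF's NC_EBADNAME, raised by the C library's internal name check. A rough Python approximation of those naming rules (a sketch, not the actual netcdf-c implementation) can help narrow down what a hidden illegal character might look like; one plausible, purely hypothetical culprit would be a stray NUL or other control byte picked up from Fortran string handling, which is invisible in log output:

```python
def netcdf_name_ok(name: str) -> bool:
    """Rough approximation of the NetCDF name check behind error -59
    (NC_EBADNAME): a name must be non-empty, must not contain '/',
    must begin with a letter, digit, or underscore, must not end in
    whitespace, and every character must be printable."""
    if not name or name != name.rstrip():
        return False
    if "/" in name:
        return False
    if not (name[0].isalnum() or name[0] == "_"):
        return False
    return all(c.isprintable() for c in name)

# A control character looks identical to the clean name in most logs:
print(netcdf_name_ok("lonCellPointStats"))      # True
print(netcdf_name_ok("lonCellPointStats\x00"))  # False
```

Printing each candidate name's `repr()` (or its bytes) on the Fortran/C boundary would make such a character visible.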
It seems like this only occurs on the initial write to the pointwiseStats.nc file:
Constituent P1
Frequency 0.725229500000000E-04
Amplitude 0.468480000000000E-01
LoveNumbers 0.706000000000000
NodalAmplitude 1.00000000000000
Astronomical argument 0.00000000000000
NodalPhase 1.23913185109541
Type 1
-- Reducing field latCell with nElements = 182
-- Reducing field lonCell with nElements = 182
-- Reducing field ssh with nElements = 182
ERROR: MPAS IO Error: Bad return value from PIO
ERROR: MPAS IO Error: Bad return value from PIO
ERROR: MPAS IO Error: Bad return value from PIO
... Updating 1d real field windSpeedU in stream
... found 1d real named windSpeedU
... done updating field
... Updating 1d real field windSpeedV in stream
... found 1d real named windSpeedV
... done updating field
... Updating 1d real field atmosPressure in stream
... found 1d real named atmosPressure
... done updating field
Doing timestep 2012-10-10_00:00:25
Verifying that cells are not dry...
Minimum thickness is 2267.73554688385.
Done verifying that cells are wet.
Doing timestep 2012-10-10_00:00:50
Verifying that cells are not dry...
Minimum thickness is 2267.73554688386.
Done verifying that cells are wet.
Subsequent writes don't have this:
Verifying that cells are not dry...
Minimum thickness is 2267.73554331533.
Done verifying that cells are wet.
-- Reducing field latCell with nElements = 182
-- Reducing field lonCell with nElements = 182
-- Reducing field ssh with nElements = 182
Doing timestep 2012-10-10_00:30:25
Verifying that cells are not dry...
Minimum thickness is 2267.73554304084.
Done verifying that cells are wet.
In the analysis step, I see:
* step: analysis
compass calling: compass.ocean.tests.hurricane.analysis.Analysis.run()
in /gpfs/fs1/home/ac.xylar/mpas-work/compass/master/compass/ocean/tests/hurricane/analysis/__init__.py
Failed
Exception raised while running the steps of the test case
Traceback (most recent call last):
File "/gpfs/fs1/home/ac.xylar/mpas-work/compass/master/compass/run/serial.py", line 145, in run_tests
_run_test(test_case)
File "/gpfs/fs1/home/ac.xylar/mpas-work/compass/master/compass/run/serial.py", line 394, in _run_test
_run_step(test_case, step, test_case.new_step_log_file)
File "/gpfs/fs1/home/ac.xylar/mpas-work/compass/master/compass/run/serial.py", line 437, in _run_step
step.run()
File "/gpfs/fs1/home/ac.xylar/mpas-work/compass/master/compass/ocean/tests/hurricane/analysis/__init__.py", line 191, in run
data[run] = self.read_pointstats(self.pointstats_file[run])
File "/gpfs/fs1/home/ac.xylar/mpas-work/compass/master/compass/ocean/tests/hurricane/analysis/__init__.py", line 103, in read_pointstats
pointstats_nc.variables['lonCellPointStats'][:])
KeyError: 'lonCellPointStats'
I assume this is related. What do you think?
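Independently of the root cause, the analysis step could fail with a clearer message than a bare KeyError. A hedged sketch of guarding the read (the helper name and error wording are my own; `latCellPointStats` is assumed by analogy with `lonCellPointStats` from the traceback, and `variables` stands in for `pointstats_nc.variables`):

```python
def read_pointstats_safe(variables,
                         names=("lonCellPointStats", "latCellPointStats")):
    """Check that the expected variables exist before indexing, so a
    failed define phase in the model run surfaces as an explanatory
    error instead of a KeyError deep in the analysis step."""
    missing = [name for name in names if name not in variables]
    if missing:
        raise RuntimeError(
            f"pointwiseStats file is missing variables {missing}; "
            "the model's pointwise stats define phase likely failed "
            "(see 'MPAS IO Error: Bad return value from PIO' in the logs)")
    return {name: variables[name][:] for name in names}
```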
An ncdump on the file shows that it doesn't have any useful variables in it:
$ ncdump -h pointwiseStats.nc
netcdf pointwiseStats {
dimensions:
nPoints = 182 ;
StrLen = 64 ;
Time = UNLIMITED ; // (1153 currently)
variables:
int pointCellGlobalID(nPoints) ;
pointCellGlobalID:long_name = "List of global cell IDs in point set." ;
char xtime(Time, StrLen) ;
xtime:long_name = "model time, with format \'YYYY-MM-DD_HH:MM:SS\'" ;
// global attributes:
:model_name = "mpas" ;
...
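Cross-checking the fields the stream tried to reduce against what ncdump shows confirms that every reduced field failed to be defined (a trivial sketch; the `PointStats` suffix convention is inferred from the analysis traceback above):

```python
# Fields the pointwise stats stream reduces, per the model log above
requested = {"latCell", "lonCell", "ssh"}
# Variables actually present in pointwiseStats.nc, per the ncdump above
present = {"pointCellGlobalID", "xtime"}

# Strip the output-variable suffix and compare: all three are missing
missing = requested - {v.removesuffix("PointStats") for v in present}
print(sorted(missing))  # ['latCell', 'lonCell', 'ssh']
```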
The run timed out and I had to rerun. Any chance that has something to do with it?
Two potentially relevant things have changed recently. First, I switched the default MPI on Chrysalis to OpenMPI. Second, I built new Spack environments for compass 1.2.0.alpha3. I think the former is more likely than the latter to be the reason the problem has just emerged, even though there aren't any obviously relevant changes to the test case. Nothing changed regarding SCORPIO in the latest Spack build that I can think of, so I don't see why that would be relevant.
It would be easy to build and run with Intel-MPI instead (as I did in my previous tests of hurricane). I'll try that now, but it's getting late.
@sbrus89, no luck with Intel-MPI, so that's not the reason. I'm pretty lost as to what could have changed here to cause this problem.
For reference, these error messages are seen in the MPAS-Ocean log files in the hurricane sandy/forward step.