E3SM-Project / E3SM

Energy Exascale Earth System Model source code. NOTE: use "maint" branches for your work. Head of master is not validated.
https://docs.e3sm.org/E3SM

Zero values written in MPAS monthly timeseries files with WC on chrysalis, certain PE layouts, and PIO settings #4174

Open ndkeen opened 3 years ago

ndkeen commented 3 years ago

Our v1 highres control production run (as well as the transient run) that we moved from Theta to Chrysalis was found to have zeros in the mpaso.hist.am.timeSeriesStatsMonthly.*.nc files. These runs use the maint-1.0 repo, and I've verified the issue is the same with a repo checkout from January 2021 as well as one from March 18th, 2021. The files written on Theta do not show the issue. After a fair amount of testing, documented here:

https://acme-climate.atlassian.net/wiki/spaces/SIM/pages/1025639219/Control+Run+HighRes+MIP+theta.20190910.branch+noCNT.A+WCYCL1950S+CMIP6+HR.ne120+oRRS18v3+ICG

I'm now able to repeat the issue with the script below.

#! /bin/csh                                                                                                                           

#set case = /lcrc/group/e3sm/ac.ndkeen/scratch/chrys/maint10-mar18/v1hires.ne120np4_oRRS18to6v3_ICG.A_WCYCL1950S_CMIP6_HR.n109a.prod-unc06g
set case = /lcrc/group/e3sm/your_location

create_newcase --case $case --res ne120np4_oRRS18to6v3_ICG --compset A_WCYCL1950S_CMIP6_HR --machine chrysalis --compiler intel --project e3sm

cd $case

xmlchange PIO_VERSION=2
xmlchange PIO_BUFFER_SIZE_LIMIT=134217728
xmlchange PIO_REARR_COMM_MAX_PEND_REQ_COMP2IO=64

xmlchange RUN_STARTDATE=0055-12-01

xmlchange --id CAM_CONFIG_OPTS --append --val=-cosp

xmlchange --id STOP_OPTION --val nmonths
xmlchange --id STOP_N --val 1
xmlchange --id REST_OPTION --val nmonths
xmlchange --id REST_N --val 1
xmlchange --id BUDGETS --val TRUE
xmlchange --id HIST_OPTION --val nyears
xmlchange --id HIST_N --val 1

# use 109 nodes, 64x1                                                                                                                 
xmlchange MAX_MPITASKS_PER_NODE=64
xmlchange MAX_TASKS_PER_NODE=128

xmlchange ATM_NTASKS=5440
xmlchange LND_NTASKS=4672
xmlchange ICE_NTASKS=5120
xmlchange OCN_NTASKS=1536
xmlchange CPL_NTASKS=5440
xmlchange ROF_NTASKS=768

xmlchange ATM_ROOTPE=0
xmlchange LND_ROOTPE=0
xmlchange ICE_ROOTPE=0
xmlchange OCN_ROOTPE=5440
xmlchange CPL_ROOTPE=0
xmlchange ROF_ROOTPE=4672

xmlchange NTHRDS=1

# these aren't important, but just try to be the same as run_e3sm
xmlchange ESP_NTASKS=1
xmlchange ESP_NTHRDS=1
xmlchange GLC_NTASKS=32
xmlchange GLC_NTHRDS=1
xmlchange WAV_NTASKS=32
xmlchange WAV_NTHRDS=1

xmlchange USER_REQUESTED_QUEUE=compute

cp -p /lcrc/group/e3sm/ac.ndkeen/20180410.A_WCYCL1950_HR.ne120_oRRS18v3_ICG.theta-archive-rest-0136-10-01.unc06/maltrud.streams.seaice SourceMods/src.mpascice/streams.seaice

cat <<EOF >> user_nl_cam                                                                                                              
use_hetfrz_classnuc = .false.                                                                                                         
nhtfrq = 0,-24,-6,-6,-3,-1                                                                                                            
mfilt = 1,30,120,120,240,720                                                                                                          
avgflag_pertape = 'A','A','I','A','I','A'                                                                                             
fincl1 = 'IEFLX','extinct_sw_inp','extinct_lw_bnd7','extinct_lw_inp'                                                                  
fincl2 = 'FLUT','PRECT','U200','V200','U850','V850','Z500','Z200','OMEGA500','UBOT','VBOT','TREFHT','TREFHTMN:M','TREFHTMX:X','QREFHT','TS','PS','TMQ','TUQ','TVQ'                                                                                                         
fincl3 = 'PSL','T200','T500','Z300','Z500','U850','V850','UBOT','VBOT','TREFHT','FLUT','TMQ','TUQ','TVQ'                              
fincl4 = 'FLUT','U200','U850','PRECT','PRECC','OMEGA500','PRECSC','PRECSL'                                                            
fincl5 = 'PRECT:A','PRECC:A'                                                                                                          

cosp_lite = .true.                                                                                                                    
fexcl1 = 'CFAD_SR532_CAL'                                                                                                             
EOF                                                                                                                                   

cat <<EOF >> user_nl_mpaso
config_am_timeseriesstatsdaily_write_on_startup = .true.
EOF

cat <<EOF >>user_nl_mpascice                                                                                                          
config_reuse_halo_exch = true                                                                                                         
config_am_timeseriesstatsdaily_enable = true                                                                                          
EOF                                                                                                                                   

pwd

case.setup >& csout.txt

echo " building"
case.build >& buildout.txt

ls -l bld/*.exe*

./preview_run >& prout.txt
cat prout.txt

echo " submitting"
case.submit -a="-t 3:40:00" >& submitout.txt

The output for the job I ran with this script is here:

/lcrc/group/e3sm/ac.ndkeen/scratch/chrys/maint10-mar18/v1hires.ne120np4_oRRS18to6v3_ICG.A_WCYCL1950S_CMIP6_HR.n109a.prod-unc06g

To check whether an MPAS timeseries file has zeros, I've found the following command useful (this output shows zeros):

chrlogin2% ncdump -v timeMonthly_avg_activeTracers_temperature run/mpaso.hist.am.timeSeriesStatsMonthly.*.nc | grep -m1 -A3 "data:" | grep ","
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
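To scan every monthly file at once, the one-liner above can be wrapped in a small helper. This is my own sketch, not an E3SM tool: check_zeros is a hypothetical name, and it only inspects the first data line ncdump prints, exactly like the grep above.

```shell
# Sketch of a bulk zero-check (my own helper, not part of E3SM).
# Reads ncdump-style output on stdin and prints ALL_ZERO if every value
# on the first data line is zero, OK otherwise.
check_zeros() {
  grep -m1 -A3 "data:" | grep "," | head -1 | \
    awk '{ allzero = 1
           for (i = 1; i <= NF; i++) { v = $i; gsub(/[,;]/, "", v); if (v + 0 != 0) allzero = 0 }
           print (allzero ? "ALL_ZERO" : "OK") }'
}

# Usage (assumes ncdump is on PATH and the same variable as above):
# for f in run/mpaso.hist.am.timeSeriesStatsMonthly.*.nc; do
#   echo -n "$f: "
#   ncdump -v timeMonthly_avg_activeTracers_temperature "$f" | check_zeros
# done
```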

It may be a combination of the PE layout and the PIO settings:

xmlchange PIO_BUFFER_SIZE_LIMIT=134217728
xmlchange PIO_REARR_COMM_MAX_PEND_REQ_COMP2IO=64

I've also verified that a 58-node layout exhibits the same behavior (zeros in the MPAS file). Here is a similar script to the one above, except it uses 58 nodes and is slower: /lcrc/group/e3sm/ac.ndkeen/wacmy/maint10-mar18/cime/scripts/prod-unc06.n058a.csh

And when I comment out the 2 PIO settings, the same script works: /lcrc/group/e3sm/ac.ndkeen/wacmy/maint10-mar18/cime/scripts/prod-unc06.n109.piodef.csh It's possible these settings are involved, or, if the root cause is something like memory corruption, those 2 PIO settings may just be perturbing memory enough to cause different behavior.

I'm trying to narrow down the issue further, but since reproducing it requires a full month of high-res simulation, each job takes several hours.

amametjanov commented 3 years ago

pnetcdf on Theta + latest maint-1.0: cray-parallel-netcdf/1.12.0.1

> /opt/cray/pe/parallel-netcdf/1.12.0.1/bin/pnetcdf-config --all

This PnetCDF 1.12.0 was built with the following features:

  --has-c++                   -> yes
  --has-fortran               -> yes
  --netcdf4                   -> disabled
  --adios                     -> disabled
  --relax-coord-bound         -> enabled
  --in-place-swap             -> auto
  --erange-fill               -> enabled
  --subfiling                 -> enabled
  --large-single-req          -> disabled
  --null-byte-header-padding  -> disabled
  --burst-buffering           -> enabled
  --profiling                 -> disabled
  --thread-safe               -> disabled
  --debug                     -> disabled

This PnetCDF 1.12.0 was built using the following compilers and flags:

  --cc            -> cc 
  --cxx           -> CC 
  --f77           -> ftn 
  --fc            -> ftn 
  --cppflags      -> 
  --cflags        -> 
  --cxxflags      -> 
  --fflags        -> 
  --fcflags       -> 
  --ldflags       -> 
  --libs          -> 

This PnetCDF 1.12.0 has been installed under the following directories:

  --prefix        -> /opt/cray/pe/parallel-netcdf/1.12.0.1/INTEL/19.1
  --includedir    -> /opt/cray/pe/parallel-netcdf/1.12.0.1/include
  --libdir        -> /opt/cray/pe/parallel-netcdf/1.12.0.1/INTEL/19.1/lib

Additional information:

  --version       -> PnetCDF 1.12.0
  --release-date  -> September 30, 2019
  --config-date   -> Tue May 19 00:29:21 CDT 2020

pnetcdf on Chrysalis + latest maint-1.0: parallel-netcdf/1.11.0-b74wv4m

$ /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/parallel-netcdf-1.11.0-b74wv4m/bin/pnetcdf-config --all

This PnetCDF 1.11.0 was built with the following features:

  --has-c++                   -> yes
  --has-fortran               -> yes
  --netcdf4                   -> disabled
  --relax-coord-bound         -> enabled
  --in-place-swap             -> auto
  --erange-fill               -> enabled
  --subfiling                 -> disabled
  --large-single-req          -> disabled
  --null-byte-header-padding  -> disabled
  --burst-buffering           -> disabled
  --profiling                 -> disabled
  --thread-safe               -> disabled
  --debug                     -> disabled

This PnetCDF 1.11.0 was built using the following compilers and flags:

  --cc            -> /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/intel-mpi-2019.9.304-tkzvizk/compilers_and_libraries_2020.4.304/linux/mpi/intel64/bin/mpiicc
  --cxx           -> /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/intel-mpi-2019.9.304-tkzvizk/compilers_and_libraries_2020.4.304/linux/mpi/intel64/bin/mpiicpc
  --f77           -> /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/intel-mpi-2019.9.304-tkzvizk/compilers_and_libraries_2020.4.304/linux/mpi/intel64/bin/mpiifort
  --fc            -> /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/intel-mpi-2019.9.304-tkzvizk/compilers_and_libraries_2020.4.304/linux/mpi/intel64/bin/mpiifort
  --cppflags      -> 
  --cflags        -> -fPIC
  --cxxflags      -> -fPIC
  --fflags        -> -fPIC
  --fcflags       -> -fPIC
  --ldflags       -> 
  --libs          -> 

This PnetCDF 1.11.0 has been installed under the following directories:

  --prefix        -> /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/parallel-netcdf-1.11.0-b74wv4m
  --includedir    -> /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/parallel-netcdf-1.11.0-b74wv4m/include
  --libdir        -> /gpfs/fs1/soft/chrysalis/spack/opt/spack/linux-centos8-x86_64/intel-20.0.4/parallel-netcdf-1.11.0-b74wv4m/lib

Additional information:

  --version       -> PnetCDF 1.11.0
  --release-date  -> 19 Dec 2018
  --config-date   -> Tue Jan  5 05:57:29 CST 2021

There is a more recent version: module load parallel-netcdf/1.12.1-kstkfoc. PnetCDF can also be configured with:

  --enable-large-single-req
                          Enable large (> 2 GiB) single request in individual
                          MPI-IO calls. Note some MPI-IO libraries may not
                          support this. [default: disabled]
  --enable-subfiling      Enable subfiling support. [default: disabled]
  --disable-erange-fill   Disable use of fill value when out-of-range type
                          conversion causes NC_ERANGE error. [default:
                          enabled]
jayeshkrishna commented 3 years ago

@ndk I would also recommend trying pnetcdf 1.12.1 (parallel-netcdf/1.12.1-kstkfoc) to see if it works for the case above.

ndkeen commented 3 years ago

I tried using parallel-netcdf-1.12.1-kstkfoc instead of the default. It did not fix the issue -- still zeros in the file.

casedir: /lcrc/group/e3sm/ac.ndkeen/scratch/chrys/maint10-mar24/v1hires.ne120np4_oRRS18to6v3_ICG.A_WCYCL1950S_CMIP6_HR.n058a.prod-unc06.n058a.pnet12

jayeshkrishna commented 3 years ago

Thanks @ndk, I am able to recreate the issue using the run script (slightly modified - modifications not relevant to the issue) above. I am trying out some experiments and will keep the issue updated.

@ndk, meanwhile, is this the smallest PE layout with which you could recreate the issue?

ndkeen commented 3 years ago

@jayeshkrishna yes, I noted in the original comment that there was also an example of this behavior with a 58-node layout. Since this issue is dependent on PE layout, here are the layouts that have failed and worked:

These all show the same issue:

#209 nodes 64x1
MAX_MPITASKS_PER_NODE=64
MAX_TASKS_PER_NODE=128
NTASKS_ATM=10816
ROOTPE_ATM=0
NTASKS_LND=1600
ROOTPE_LND=8192
NTASKS_ICE=9600
ROOTPE_ICE=0
NTASKS_OCN=2560
ROOTPE_OCN=10816
NTASKS_CPL=10816
ROOTPE_CPL=0
NTASKS_ROF=1024
ROOTPE_ROF=9792
NTHREADS=1

#109 nodes 64x1
MAX_MPITASKS_PER_NODE=64
MAX_TASKS_PER_NODE=128
NTASKS_ATM=5440
ROOTPE_ATM=0
NTASKS_LND=4672
ROOTPE_LND=0
NTASKS_ICE=5120
ROOTPE_ICE=0
NTASKS_OCN=1536
ROOTPE_OCN=5440
NTASKS_CPL=5440
ROOTPE_CPL=0
NTASKS_ROF=768
ROOTPE_ROF=4672
NTHREADS=1

#58 nodes 64x1
MAX_MPITASKS_PER_NODE=64
MAX_TASKS_PER_NODE=128
NTASKS_ATM=2752
ROOTPE_ATM=0
NTASKS_LND=1984
ROOTPE_LND=0
NTASKS_ICE=2560
ROOTPE_ICE=0
NTASKS_OCN=960
ROOTPE_OCN=2752
NTASKS_CPL=2752
ROOTPE_CPL=0
NTASKS_ROF=256
ROOTPE_ROF=0
NTHREADS=1

Whereas the following do not. These are all stacked layouts (which my experiments show perform very well). Note that I've tried many different layouts, but only in 5-day speed tests; these are the only ones I've run for at least a month (which unfortunately may be the only way to see the zeros-in-MPAS-file error):

#64 nodes stacked 64x1
MAX_MPITASKS_PER_NODE=64
MAX_TASKS_PER_NODE=128
NTASKS_ATM=4096
ROOTPE_ATM=0
NTASKS_LND=4096
ROOTPE_LND=0
NTASKS_ICE=4096
ROOTPE_ICE=0
NTASKS_OCN=4096
ROOTPE_OCN=0
NTASKS_CPL=4096
ROOTPE_CPL=0
NTASKS_ROF=4096
ROOTPE_ROF=0
NTHREADS=1

#128 nodes stacked 32x2
MAX_MPITASKS_PER_NODE=32
MAX_TASKS_PER_NODE=64
NTASKS_ATM=4096
ROOTPE_ATM=0
NTASKS_LND=4096
ROOTPE_LND=0
NTASKS_ICE=4096
ROOTPE_ICE=0
NTASKS_OCN=4096
ROOTPE_OCN=0
NTASKS_CPL=4096
ROOTPE_CPL=0
NTASKS_ROF=4096
ROOTPE_ROF=0
NTHREADS=2
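As a cross-check on the layouts above (a sketch of my own, not a CIME utility), the node count a layout implies can be computed as ceil(max(ROOTPE + NTASKS) / MAX_MPITASKS_PER_NODE) over the components:

```shell
# Sketch (hypothetical helper, not part of CIME): derive the node count a
# PE layout implies from each component's ROOTPE and NTASKS.
layout_nodes() {
  # args: tasks per node, then one ROOTPE:NTASKS pair per component
  local per_node=$1 max=0 pair root ntasks end
  shift
  for pair in "$@"; do
    root=${pair%%:*}
    ntasks=${pair##*:}
    end=$(( root + ntasks ))
    [ "$end" -gt "$max" ] && max=$end
  done
  echo $(( (max + per_node - 1) / per_node ))   # round up
}

# The failing 109-node layout: OCN (ROOTPE 5440 + NTASKS 1536) extends furthest.
layout_nodes 64 0:5440 0:4672 0:5120 5440:1536 0:5440 4672:768   # prints 109
```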
jayeshkrishna commented 3 years ago

Thanks @ndkeen

jayeshkrishna commented 3 years ago

@ndkeen : Can you try the latest master of Scorpio and see if the issue persists?

I tried the case with the version of Scorpio on maint-1.0 (maint-1.0 has v1.0.1) and saw the zero values in the output (timeMonthly_avg_activeTracers_temperature and several other variables in mpaso.hist.am.timeSeriesStatsMonthly.*.nc had all zero values). However, the issue is not reproducible with the latest Scorpio master (and most likely v1.2.1 on E3SM master) + maint-1.0 (v1.0.0-266-g092ea1aa3 + scorpio-v1.2.1-11-g4a44ffc4). I tried both the 109- and 58-node cases above (the ones that failed for you) with maint-1.0 + the latest Scorpio master and did not see any apparent issues (no zero values) in the MPAS monthly output. The latest master of Scorpio (and Scorpio v1.2.1 on E3SM master) has several fixes for ultra-high-resolution simulations that might be related to this case.

To try the latest master of Scorpio,

> cd <E3SM_MAINT-1.0_SOURCE_DIR>
> cd externals/scorpio
> git fetch origin
> git checkout master

The successful cases on Chrysalis,

The 109 node case (maint-1.0 + Scorpio master : v1.0.0-266-g092ea1aa3 + scorpio-v1.2.1-11-g4a44ffc4): /lcrc/group/e3sm/jayesh/scratch/chrys/v1hires.ne120np4_oRRS18to6v3_ICG.A_WCYCL1950S_CMIP6_HR.n109a.prod-unc06g-nodbg-spio-master

The 58 node case (maint-1.0 + Scorpio master : v1.0.0-266-g092ea1aa3 + scorpio-v1.2.1-11-g4a44ffc4): /lcrc/group/e3sm/jayesh/scratch/chrys/v1hires.ne120np4_oRRS18to6v3_ICG.A_WCYCL1950S_CMIP6_HR.n109a.prod-unc06g-nodbg-spio-master-58nodesPE

xylar commented 3 years ago

I should have mentioned this earlier. For testing of standalone MPAS-Ocean, we have a test case, used for debugging CF-compliant output, where we configure timeSeriesStatsDaily to have the same output as timeSeriesStatsMonthly.

This approach would likely make your debugging easier here, too. Set the necessary namelist options to turn on timeSeriesStatsDaily and make an identical stream in streams.ocean with the same output but with Monthly --> Daily. I'm not an expert in how to alter the namelists and streams files in E3SM, so I'm hoping you can figure that part out.
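In E3SM terms this suggestion would amount to something like the following additions, written in the same cat <<EOF style as the run script above. This is a sketch only: the exact MPAS-Ocean namelist option names are assumptions patterned on the options already used in this thread, and the matching Daily copy of the Monthly stream still has to be added to streams.ocean by hand.

```shell
# Hedged sketch: enable the daily time-series-stats analysis member in
# MPAS-Ocean so a 1-2 day run can stand in for a full month. Option names
# are assumed, not verified against the MPAS registry.
cat <<EOF >> user_nl_mpaso
config_am_timeseriesstatsdaily_enable = .true.
config_am_timeseriesstatsdaily_compute_on_startup = .false.
config_am_timeseriesstatsdaily_write_on_startup = .false.
EOF
```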

vanroekel commented 3 years ago

thanks for the great idea @xylar! @ndkeen if you want to try this for your testing to allow for shorter tests (we could run 1 or 2 days instead of a month), I can help you set this up in E3SM. Let me know.

ndkeen commented 3 years ago

Jayesh: OK, that's good news that the latest Scorpio seems not to have the issue. We would need to discuss whether replacing this code in the middle of a simulation campaign is the right thing to do.

Xylar/Luke: Yes, that could help for future testing. We might need to decide what to do with this simulation.

I just made a PR to bring in a bugfix for the ROF restart names (which apparently only causes problems in certain situations and is fine otherwise... ?). With this fix and a change to the PE layout (which is performing better), it looks like the simulation is OK.