Closed by aekiss 3 years ago
For example the point (474, 2613) is land but unmasked, so you could check its value for every field in every restart file in /scratch/v14/pas548/restarts/KEEP/restart356/ice/ or /scratch/x77/aek156/access-om2/archive/01deg_jra55v140_iaf_cycle2_pio_test2/restart356/ice/iced.1986-04-01-00000.nc and use this as the _FillValue.
CF conventions allow for _FillValue and missing_value. If missing_value is set to something that is non-zero, does that help?
http://cfconventions.org/cf-conventions/cf-conventions.html#missing-data
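That check could be sketched as follows (an illustration only: values_at_point is a hypothetical helper, the fields are assumed to be 2-D numpy arrays already loaded from the restart file, e.g. with xarray, and the (j, i) index order is an assumption):

```python
import numpy as np

def values_at_point(fields, j, i):
    """Value of each field at a known land-but-unmasked point (j, i).

    fields: dict mapping variable names to 2-D numpy arrays. In practice
    these would be read from the restart file (e.g. with xarray); that
    loading step is not shown here.
    """
    return {name: arr[j, i] for name, arr in fields.items()}
```

If every field agrees on a single value at that point, that value is the de facto fill.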
Thanks @nichannah, I'm closing this issue now.
We decided in the 14 Oct TWG meeting that this issue with restarts is not significant enough to warrant fixing, and that a fix changing _FillValue to 0 would cause more trouble than it was worth, since genuine data could be misinterpreted as fill.
We just need to remember to fill in the cpu-masked cells with zero values if restarting with a changed cpu layout.
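That zero-filling step could look something like this (a sketch: zero_masked_cells is a hypothetical helper operating on numpy arrays standing in for the restart fields; reading and rewriting the restart netCDF file is not shown):

```python
import numpy as np

def zero_masked_cells(field, cpu_mask):
    """Zero the cells that the old CPU layout never computed.

    field: 2-D restart field; cpu_mask: boolean array, True where a CPU
    actually produced values. Both names are hypothetical.
    """
    out = field.copy()        # leave the input field untouched
    out[~cpu_mask] = 0.0      # cpu-masked cells become genuine zeros
    return out
```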
I've done a test run at 0.1deg with PIO (using commit 7c74942) to compare to one without PIO (using commit 26e6159).
This restart issue means I can't compare the restart files, but I've confirmed (using xarray's identical method) that the outputs are identical, including for a second run based on PIO-generated restarts, so I'm confident that the model state is unaffected by these differences in the restart files. Test script is here: https://github.com/aekiss/notebooks/blob/72986342795e6fef167ad5d9df76a01b1ad7fefa/check_pio.ipynb
Sorry @nic, I'm reopening again - I've hit a bug using PIO in a 1deg configuration.
For the 1deg config I'm using one core per chunk, laid out the same way as slenderX1 (not sure if this is the best choice?):
history_chunksize_x = 15
history_chunksize_y = 300
I have repeated identical runs
/home/156/aek156/payu/testing/all-configs/v2.0.0rc9/1deg_jra55_ryf_v2.0.0rc9
/home/156/aek156/payu/testing/all-configs/v2.0.0rc9/1deg_jra55_ryf_v2.0.0rc9xx
and got differing output in these files and variables:
/scratch/x77/aek156/1deg_jra55_ryf_v2.0.0rc9-CHUCKABLE/output000/ice/OUTPUT/iceh.1900-01.nc
/scratch/x77/aek156/1deg_jra55_ryf_v2.0.0rc9xx-CHUCKABLE/output000/ice/OUTPUT/iceh.1900-01.nc
fsurfn_ai_m
vicen_m
/scratch/x77/aek156/1deg_jra55_ryf_v2.0.0rc9-CHUCKABLE/output000/ice/OUTPUT/iceh.1900-02.nc
/scratch/x77/aek156/1deg_jra55_ryf_v2.0.0rc9xx-CHUCKABLE/output000/ice/OUTPUT/iceh.1900-02.nc
fmelttn_ai_m
vicen_m
/scratch/x77/aek156/1deg_jra55_ryf_v2.0.0rc9-CHUCKABLE/output001/ice/OUTPUT/iceh.1900-04.nc
/scratch/x77/aek156/1deg_jra55_ryf_v2.0.0rc9xx-CHUCKABLE/output001/ice/OUTPUT/iceh.1900-04.nc
aicen_m
flatn_ai_m
fmelttn_ai_m
/scratch/x77/aek156/1deg_jra55_ryf_v2.0.0rc9-CHUCKABLE/output001/ice/OUTPUT/iceh.1900-05.nc
/scratch/x77/aek156/1deg_jra55_ryf_v2.0.0rc9xx-CHUCKABLE/output001/ice/OUTPUT/iceh.1900-05.nc
flatn_ai_m
fsurfn_ai_m
vicen_m
/scratch/x77/aek156/1deg_jra55_ryf_v2.0.0rc9-CHUCKABLE/output002/ice/OUTPUT/iceh.1900-07.nc
/scratch/x77/aek156/1deg_jra55_ryf_v2.0.0rc9xx-CHUCKABLE/output002/ice/OUTPUT/iceh.1900-07.nc
fcondtopn_ai_m
Note that this issue only appears in multi-category variables (e.g. 'fcondtopn_ai_m' (time: 1, nc: 5, nj: 300, ni: 360)) and is unpredictable - most multi-category variables are ok most of the time, and there are no variables that are always affected.
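The per-file variable lists above can be produced by a value-wise comparison along these lines (a sketch: differing_variables is a hypothetical helper taking dicts of numpy arrays, which in a real check would be loaded from the two iceh files, e.g. with xarray, whose Dataset.identical would additionally compare metadata):

```python
import numpy as np

def differing_variables(run_a, run_b):
    """Names of variables whose values differ between two runs.

    run_a / run_b: dicts mapping variable names to numpy arrays.
    """
    common = sorted(set(run_a) & set(run_b))
    return [name for name in common
            if not np.array_equal(run_a[name], run_b[name])]
```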
For example here's category 0 of fmelttn_ai_m in /scratch/x77/aek156/1deg_jra55_ryf_v2.0.0rc9xx-CHUCKABLE/output000/ice/OUTPUT/iceh.1900-02.nc:
There are bad points just north of the Equator over a limited longitude range in the Indonesian archipelago. They are extremely large, presumably uninitialised values. The values in the longitudes between them are very small but nonzero (they should be zero). The land mask is also messed up.
The problem occurs in different places in other fields.
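Those bad points could be flagged automatically with a heuristic like the following (a sketch: suspect_points is a hypothetical helper, and both magnitude thresholds are assumptions, not values from the model):

```python
import numpy as np

def suspect_points(field, huge=1e30, tiny=1e-30):
    """Flag likely-uninitialised values in a 2-D field.

    Heuristic: values of absurd magnitude are probably uninitialised
    memory, and very small nonzero values where exact zeros are
    expected are also suspicious. Returns two boolean masks.
    """
    bad = np.abs(field) > huge
    small_nonzero = (field != 0) & (np.abs(field) < tiny)
    return bad, small_nonzero
```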
I've only seen this problem in category 0, but I haven't checked thoroughly. E.g. here's category 1 of the same field and file:
I didn't see this issue with the 0.1deg config. Maybe I need better choices for history_chunksize_x and history_chunksize_y? (NB I found I could get segfaults if I wasn't careful with these values...)
Oops, apologies @nichannah - this was just because I was calling mpirun with the wrong options at 1 deg. When I use

mpirun: --mca io ompio --mca io_ompio_num_aggregators 1

in config.yaml it works as expected.
The OpenMPI docs say ompio is the default for versions > 2.x. Is that incorrect?
On Gadi it appears that romio is used by default. Also we need to specify the number of MPI aggregators explicitly to avoid the heuristic/algorithm that usually sets this. This algorithm appears to get confused with the combination of (chunksize != tile size) and deflation on. The confusion leads to a divide-by-zero. I haven't spent the time to really understand this bug/problem, so you could say that --mca io_ompio_num_aggregators 1 is a work-around.
Thanks for the explanation @nichannah
@nichannah FYI: PIO seems to slow down CICE at 1 deg.
see 3-month runs in /home/156/aek156/payu/testing/all-configs/v2.0.0rc9
Fraction of MOM runtime in oasis_recv, Max CICE I/O time (s):
- 1 deg, no PIO (1deg_jra55_ryf_v2.0.0rc9_nopio): 0.04, 10.6
- 1 deg, PIO with 24 chunks (15x300) (1deg_jra55_ryf_v2.0.0rc9_pio): 0.062, 15.3
- 1 deg, PIO with 1 chunk (360x300) (1deg_jra55_ryf_v2.0.0rc9_pio_1chunk): 0.096, 24.4

but it is improved at 0.25deg:
- 0.25 deg, no PIO (025deg_jra55_ryf_v2.0.0rc9): 0.078, 54
- 0.25 deg, PIO with 100 chunks (144x108) (025deg_jra55_ryf_v2.0.0rc9_pio2): 0.04, 25
The CICE cores are spread between nodes on Gadi at 1 deg, with 1+216+24 cores for yatm/mom/cice, so that might be part of the problem: https://github.com/COSIMA/access-om2/issues/212 and https://github.com/COSIMA/access-om2/issues/202
I've also tried 1 deg (/home/156/aek156/payu/testing/all-configs/v2.0.0rc10/1deg_jra55_iaf_v2.0.0rc10) and 0.25 deg (025deg_jra55_iaf_v2.0.0rc10) configs with 4 chunks (90x300 at 1 deg; 720x540 at 0.25 deg) and get 0.085 for the fraction of MOM runtime in oasis_recv in both cases.
1 deg with 4 chunks is almost as fast as the 24-chunk case (though slower than without PIO) but should be faster to read in most circumstances than 24 chunks. However I'm thinking a 180x150 4-chunk layout is probably a better match to hemisphere-based access patterns so I might try that too. This run was for 5 years, rather than 3mo as in the previous and next posts so I haven't included Max CICE I/O time. It's a bit faster in a 3mo test - see next post.
0.25 deg with 4 chunks is now somewhat slower than without PIO but I'm reluctant to use too many chunks in case it slows down reading. Note that this run was for 2 years, rather than 3mo as in the previous and next posts.
Also I should have mentioned that these 1 deg and 0.25 deg tests all had identical ice outputs, but they differ from the ice outputs in the production 0.1deg runs I reported here so they aren't directly comparable to those.
Some more tests of differing history_chunksize_x x history_chunksize_y with 3mo runs at 1 deg in /home/156/aek156/payu/testing/all-configs/v2.0.0rc:
Fraction of MOM runtime in oasis_recv, Max CICE I/O time (s):
- 1 deg, PIO with 4 chunks (90x300) (1deg_jra55_iaf_v2.0.0rc10_3mo): 0.067, 16.8
- 1 deg, PIO with 4 chunks (180x150) (1deg_jra55_iaf_v2.0.0rc10_3mo_180x150): 0.072, 18.1
The first of these is slightly faster (presumably because it is consistent with the 15x300 core layout) but the difference is small and so I will use 180x150 for the new 1deg configs as this is better suited to typical access patterns of reading one hemisphere or the other.
The fraction of MOM runtime in oasis_recv values with 90x300 is smaller in the 3 mo case compared to 5yr: 0.067 rather than 0.085 (see prev post). So for 3mo runs the 4-chunk cases (0.067, 0.072) are nearly as fast as the 24-chunk case (0.062) and considerably faster than 1 chunk (0.096) - see post before last.
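For reference, the chunk counts quoted above follow directly from the grid and chunk sizes; on the 1 deg grid (ni: 360, nj: 300, per the variable shape quoted earlier) a small hypothetical helper reproduces them:

```python
import math

def n_chunks(ni, nj, chunk_x, chunk_y):
    # number of history chunks needed to cover an ni x nj grid
    # with chunks of size chunk_x x chunk_y
    return math.ceil(ni / chunk_x) * math.ceil(nj / chunk_y)
```

E.g. n_chunks(360, 300, 15, 300) gives the 24-chunk case, while both n_chunks(360, 300, 90, 300) and n_chunks(360, 300, 180, 150) give 4 chunks, matching the cases above.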
For future reference: the processor masking in the ice restarts can be fixed with https://github.com/COSIMA/topogtools/blob/master/fix_ice_restarts.py, allowing a change in processor layout during a run.
This issue has been mentioned on ACCESS Hive Community Forum. There might be relevant details there:
https://forum.access-hive.org.au/t/payu-generated-symlinks-dont-work-with-parallelio-library/1617/3
It may be worth trying to compile with parallel IO using PIO (setenv IO_TYPE pio). We currently compile CICE with serial IO (setenv IO_TYPE netcdf in bld/build.sh), so one CPU does all the IO and we end up with an Amdahl's law situation that limits the scalability with large core counts.

At 0.1 deg CICE is IO-bound when doing daily outputs (see Timer 12 in ice_diag.d), and the time spent in CICE IO accounts for almost all the time MOM waits for CICE (oasis_recv in access-om2.out), so the whole coupled model is waiting on one CPU. With daily CICE output at 0.1deg this is ~19% of the model runtime (it's only ~2% without daily CICE output). Lowering the compression level to 1 (https://github.com/COSIMA/cice5/issues/33) has helped (MOM wait was 23% with level 5), and omitting static field output (https://github.com/COSIMA/cice5/issues/32) would also help.

Also I understand that PIO doesn't support compression - is that correct?
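To make the Amdahl's law point concrete: if a fixed fraction of the runtime is serial IO, the achievable speedup is bounded no matter how many cores are added. A minimal sketch (amdahl_speedup is a hypothetical helper; the ~19% figure is the daily-output case quoted above):

```python
def amdahl_speedup(serial_fraction, n_cores):
    # Amdahl's law: speedup on n_cores when serial_fraction of the
    # work (here, single-CPU CICE IO) cannot be parallelised
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

# With ~19% of runtime serial, speedup can never exceed 1/0.19 ~ 5.3x,
# however many cores are used.
```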
@russfiedler had these comments on Slack:
Slack discussion: https://arccss.slack.com/archives/C9Q7Y1400/p1557272377089800