apcraig closed this pull request 4 months ago.
I'm still testing, refining the test suite, and updating documentation, but this should represent the code changes I'm proposing. Things are running well. The max_blocks=-1 setting now computes the maximum number of blocks required on each task and sets the internal max_blocks variable to that value. That means the model uses exactly the amount of memory required, and max_blocks can vary per task. Users can still manually set max_blocks in namelist as before.
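For illustration, here is a minimal sketch of the per-task computation (the routine and variable names are hypothetical, not the actual CICE code):

```fortran
! Hypothetical sketch: if max_blocks = -1 in namelist, count the
! blocks the decomposition assigns to this task and use that count.
subroutine compute_max_blocks(nblocks_tot, block_owner, my_task, max_blocks)
   integer, intent(in)    :: nblocks_tot              ! total blocks in the decomposition
   integer, intent(in)    :: block_owner(nblocks_tot) ! owning task for each block
   integer, intent(in)    :: my_task                  ! rank of this MPI task
   integer, intent(inout) :: max_blocks               ! -1 on input means "compute it here"
   integer :: n, nlocal

   if (max_blocks == -1) then
      nlocal = 0
      do n = 1, nblocks_tot
         if (block_owner(n) == my_task) nlocal = nlocal + 1
      enddo
      max_blocks = nlocal   ! exactly the memory needed; may differ per task
   endif
end subroutine compute_max_blocks
```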
Testing results look good. https://github.com/CICE-Consortium/Test-Results/wiki/cice_by_hash_forks#7402dc7f04f98d840890f29f8f02a59f956a8fc2.
This is ready for review and merge.
Could someone do a review on this PR? I'd love to get this merged. Then I can start comprehensively testing in preparation for a release. Thanks!
There is a lot here, so I might have missed something. I'm not going to get a chance to test this out until later (after the workshop). I will approve, but just know I might find stuff later once I have tested.
@anton-seaice do you have time to look at this? It's probably after hours there...
I have updated the PR based on feedback from @anton-seaice and am running a set of tests just to make sure nothing is broken. I'll report results when the testing is done. Thanks @anton-seaice for the comments.
I reran a portion of the test suite with the latest code changes and I think everything is OK. I'll merge once GitHub Actions passes and @anton-seaice is happy with the current implementation. Please let me know if anything else needs to be fixed. Thanks!
There are just a couple of lines in ice_domain_size that are not totally consistent now:
max_blocks , & ! max number of blocks per processor
This could be updated.
!*** The model will inform the user of the correct
!*** values for the parameter below. A value higher than
!*** necessary will not cause the code to fail, but will
!*** allocate more memory than is necessary. A value that
!*** is too low will cause the code to exit.
!*** A good initial guess is found using
!*** max_blocks = (nx_global/block_size_x)*(ny_global/block_size_y)/
!*** num_procs
This can probably be removed because it's covered in the docs?
Good catch, fixed these.
I think you still need to push the commit
PR checklist
[X] Please document the changes in detail, including why the changes are made. This will become part of the PR commit log.
Update support for max_blocks=-1. This update computes the number of blocks required on each MPI task and then sets that value as max_blocks when max_blocks=-1 in namelist. This is done in ice_distribution and is a function of the decomposition, among other things. Refactor the decomposition computation to defer usage of max_blocks and to eliminate the blockIndex array. Update some indentation formatting in ice_distribution.F90.
Modify cice.setup and cice_decomp.csh to set max_blocks=-1 unless it's explicitly defined by the cice.setup -p setting.
Fix a bug in ice_gather_scatter related to zeroing out the halo with the field_loc_noupdate setting. The halo blocks were being zeroed out extra times, which caused no problems as long as max_blocks had the same value on all MPI tasks. With the new implementation of max_blocks=-1, max_blocks can differ between MPI tasks, and this generated an error. The implementation is fixed so each block on each task is now zeroed out exactly once.
Update diagnostics related to max_blocks. Write out the min and max of the max_blocks values across MPI tasks (a sketch of this diagnostic follows the list).
Add extra allocation/deallocation checks in ice_distribution.F90 and add a function, ice_memusage_allocErr, to ice_memusage.F90 that checks the alloc/dealloc return code, writes an error message, and aborts. This function could be used in other parts of the code as well (a sketch follows the list).
Fix a bug in the io_binary restart output where each task was writing some output when it should have just been the master task.
Update test cases
Update documentation
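A minimal sketch of the min/max diagnostic described above, assuming plain MPI calls rather than CICE's communication wrappers:

```fortran
! Hypothetical sketch: reduce the per-task max_blocks values to the
! root task and print the range.
subroutine report_max_blocks_range(max_blocks)
   use mpi
   integer, intent(in) :: max_blocks
   integer :: ierr, my_task, mb_min, mb_max

   call MPI_Comm_rank(MPI_COMM_WORLD, my_task, ierr)
   call MPI_Reduce(max_blocks, mb_min, 1, MPI_INTEGER, MPI_MIN, 0, MPI_COMM_WORLD, ierr)
   call MPI_Reduce(max_blocks, mb_max, 1, MPI_INTEGER, MPI_MAX, 0, MPI_COMM_WORLD, ierr)
   if (my_task == 0) write(*,'(a,2i8)') ' max_blocks min/max across tasks:', mb_min, mb_max
end subroutine report_max_blocks_range
```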
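And a minimal sketch of the allocation-status checker, in the spirit of ice_memusage_allocErr (the name, signature, and abort mechanism here are assumptions, not the actual implementation):

```fortran
! Hypothetical sketch: check the stat= code from allocate/deallocate,
! report failures, and abort.
logical function alloc_failed(istat, msg)
   integer,          intent(in) :: istat  ! stat= value from allocate/deallocate
   character(len=*), intent(in) :: msg    ! context string for the error message

   alloc_failed = (istat /= 0)
   if (alloc_failed) then
      write(*,*) 'ERROR: alloc/dealloc failed in ', trim(msg), ', stat = ', istat
      stop 1   ! the real code would call the model abort routine
   endif
end function alloc_failed
```

A caller would then write something like `allocate(work(nx), stat=istat)` followed by `if (alloc_failed(istat, 'my_sub: work')) return`.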