apcraig closed this pull request 4 months ago.
I'm still testing, refining the test suite, and updating documentation, but this should represent the code changes I'm proposing. Things are running well. The max_blocks=-1 setting now computes the maximum number of blocks required on each task and sets the internal max_blocks variable to that value. That means the model uses exactly the amount of memory required, and max_blocks can vary per task. Users can still manually set max_blocks in namelist as before.
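For illustration, here is a minimal sketch of the per-task computation (the routine and variable names are hypothetical, not the actual CICE code):

```fortran
! Hypothetical sketch: if max_blocks = -1 in namelist, count the
! blocks the decomposition assigns to this task and use that count.
subroutine compute_max_blocks(nblocks_tot, block_owner, my_task, max_blocks)
   integer, intent(in)    :: nblocks_tot              ! total blocks in the decomposition
   integer, intent(in)    :: block_owner(nblocks_tot) ! owning task for each block
   integer, intent(in)    :: my_task                  ! rank of this MPI task
   integer, intent(inout) :: max_blocks               ! -1 on input means "compute it here"
   integer :: n, nlocal

   if (max_blocks == -1) then
      nlocal = 0
      do n = 1, nblocks_tot
         if (block_owner(n) == my_task) nlocal = nlocal + 1
      enddo
      max_blocks = nlocal   ! exactly the memory needed; may differ per task
   endif
end subroutine compute_max_blocks
```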
Testing results look good. https://github.com/CICE-Consortium/Test-Results/wiki/cice_by_hash_forks#7402dc7f04f98d840890f29f8f02a59f956a8fc2.
This is ready for review and merge.
Could someone do a review on this PR? I'd love to get this merged. Then I can start comprehensively testing in preparation for a release. Thanks!
There is a lot here, so I might have missed something. I'm not going to get a chance to test this out until later (after the workshop). I will approve, but just know I might find stuff later once I have tested.
@anton-seaice do you have time to look at this? It's probably after hours there...
I have updated the PR based on feedback from @anton-seaice and am running a set of tests just to make sure nothing is broken. I'll report results when the testing is done. Thanks @anton-seaice for the comments.
I reran a portion of the test suite with the latest code changes and I think everything is OK. I'll merge once GitHub Actions passes and @anton-seaice is happy with the current implementation. Please let me know if anything else needs to be fixed. Thanks!
There are just a couple of lines in ice_domain_size that are not totally consistent now:
max_blocks , & ! max number of blocks per processor
This could be updated.
!*** The model will inform the user of the correct
!*** values for the parameter below. A value higher than
!*** necessary will not cause the code to fail, but will
!*** allocate more memory than is necessary. A value that
!*** is too low will cause the code to exit.
!*** A good initial guess is found using
!*** max_blocks = (nx_global/block_size_x)*(ny_global/block_size_y)/
!*** num_procs
This can probably be removed because it's covered in the docs?
Good catch, fixed these.
I think you still need to push the commit
PR checklist
[X] Please document the changes in detail, including why the changes are made. This will become part of the PR commit log.
Update support for max_blocks=-1. This update computes the number of blocks required on each MPI task and then sets that value as max_blocks when max_blocks=-1 in namelist. This is done in ice_distribution and is a function of the decomposition, among other things. Refactor the decomposition computation to defer usage of max_blocks and to eliminate the blockIndex array. Update some indentation formatting in ice_distribution.F90.
Modify cice.setup and cice_decomp.csh to set max_blocks=-1 unless it's explicitly defined by the cice.setup -p setting.
Fix a bug in ice_gather_scatter related to zeroing out the halo with the field_loc_noupdate setting. The halo blocks were being zeroed out extra times, which caused no problems as long as max_blocks had the same value on all MPI tasks. With the new implementation of max_blocks=-1, max_blocks can differ between MPI tasks, and this generated an error. The implementation is fixed so each block on each task is now zeroed out exactly once.
Update diagnostics related to max_blocks. Write out the min and max of the max_blocks values across MPI tasks (a sketch of this diagnostic follows the list).
Add extra allocation/deallocation checks in ice_distribution.F90 and add a function, ice_memusage_allocErr, to ice_memusage.F90 that checks the alloc/dealloc return code, writes an error message, and aborts. This function could be used in other parts of the code as well (a sketch follows the list).
Fix a bug in the io_binary restart output where each task was writing some output when it should have just been the master task.
Update test cases
Update documentation
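A minimal sketch of the min/max diagnostic described above, assuming plain MPI calls rather than CICE's communication wrappers:

```fortran
! Hypothetical sketch: reduce the per-task max_blocks values to the
! root task and print the range.
subroutine report_max_blocks_range(max_blocks)
   use mpi
   integer, intent(in) :: max_blocks
   integer :: ierr, my_task, mb_min, mb_max

   call MPI_Comm_rank(MPI_COMM_WORLD, my_task, ierr)
   call MPI_Reduce(max_blocks, mb_min, 1, MPI_INTEGER, MPI_MIN, 0, MPI_COMM_WORLD, ierr)
   call MPI_Reduce(max_blocks, mb_max, 1, MPI_INTEGER, MPI_MAX, 0, MPI_COMM_WORLD, ierr)
   if (my_task == 0) write(*,'(a,2i8)') ' max_blocks min/max across tasks:', mb_min, mb_max
end subroutine report_max_blocks_range
```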
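And a minimal sketch of the allocation-status checker, in the spirit of ice_memusage_allocErr (the name, signature, and abort mechanism here are assumptions, not the actual implementation):

```fortran
! Hypothetical sketch: check the stat= code from allocate/deallocate,
! report failures, and abort.
logical function alloc_failed(istat, msg)
   integer,          intent(in) :: istat  ! stat= value from allocate/deallocate
   character(len=*), intent(in) :: msg    ! context string for the error message

   alloc_failed = (istat /= 0)
   if (alloc_failed) then
      write(*,*) 'ERROR: alloc/dealloc failed in ', trim(msg), ', stat = ', istat
      stop 1   ! the real code would call the model abort routine
   endif
end function alloc_failed
```

A caller would then write something like `allocate(work(nx), stat=istat)` followed by `if (alloc_failed(istat, 'my_sub: work')) return`.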