CICE-Consortium / CICE

Development repository for the CICE sea-ice model
Other
57 stars 131 forks source link

Autoset nprocs and improve max_blocks estimate #949

Closed anton-seaice closed 4 months ago

anton-seaice commented 5 months ago

PR checklist

This change allows nprocs to be set to -1 in 'ice_in' and then the number of processors will be automatically detected.

This change improves the automatic calculation of max_blocks to give a better (but still not foolproof) estimate of max_blocks if it is not set in ice_in.

apcraig commented 5 months ago

This looks pretty good. A couple thoughts.

The modifications I'm working on handle the nprocs thing in much the same way, but also refactor the max_blocks calculation so each proc (mpi task) has a locally defined max_blocks that matches the decomposition exactly. I'm still developing and testing to make sure it'll work properly.

I'm happy to merge this as a first step (if the CESMCOUPLED thing is fixed). I could then update the implementation if I make additional progress. Thoughts?

anton-seaice commented 5 months ago

I'm happy to merge this as a first step (if the CESMCOUPLED thing is fixed). I could then update the implementation if I make additional progress. Thoughts?

Lets go with this. I think I made the related changes to do this.

I guess we should add some tests?

apcraig commented 5 months ago

I will run a full test suite once comments are addressed and the PR is no longer draft. Thanks!

anton-seaice commented 5 months ago

I will run a full test suite once comments are addressed and the PR is no longer draft. Thanks!

My question here is whether we should add some test cases for nprocs=-1. It may be a bit of work though, because most of the scripts are written around setting this explicitly.

apcraig commented 5 months ago

I will run a full test suite once comments are addressed and the PR is no longer draft. Thanks!

My question here is whether we should add some test cases for nprocs=-1. It may be a bit of work though, because most of the scripts are written around setting this explicitly.

Please run a test manually with nprocs=-1 to make sure it works. I think the question is maybe whether we should have nprocs=-1 be the default setup for all tests. You could try that and see if everything runs and is bit-for-bit. I think you'd need to set nprocs=-1 in configuration/scripts/ice_in and then remove "nprocs = ${task}" in cice.setup. Is that the direction we want to go?

anton-seaice commented 4 months ago

Our intermittent failure is back !

https://github.com/CICE-Consortium/CICE/actions/runs/9024916341/job/24799676376 failed https://github.com/ACCESS-NRI/CICE/actions/runs/9024916168/job/24799675998 passed

But they are the same commit

apcraig commented 4 months ago

I restarted the github action failure and it passed. Ugh....