ACCESS-NRI / accessdev-Trac-archive

Archive accessdev Trac contents as issues
Apache License 2.0

MOM decomposition #312

Open penguian opened 7 years ago

penguian commented 7 years ago

keyword_MOM keyword_reproducibility | by rb4844

The current ACCESS-CM2 configuration assigns 8 x 12 CPUs to MOM. Changing this to 12 x 8 runs successfully, but the surface temperature results change.

Issue migrated from trac:312 at 2024-01-31 18:28:31 +1100

penguian commented 7 years ago

rb4844 changed component from ACCESS model to ACCESS-CM2

penguian commented 7 years ago

Compiling MOM with fp-precise does not fix the issue.

penguian commented 7 years ago

Priority changed from major to minor

penguian commented 7 years ago

I believe this is expected behaviour, as collective MPI operations are not bit-reproducible across different processor decompositions. Floating-point addition is not associative, and different decompositions combine the contributions from individual ranks in different orders, so the rounding differs.

From Nic's comment it sounded like there is a compile-time option for using reproducible MPI operations.
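
To illustrate the point about summation order, here is a minimal Python sketch. It is not MOM or MPI code: the function names `tiled_sum` and `exact_sum` and the tiny 2 x 2 field are hypothetical, chosen only so the effect is visible by hand. Each "tile" stands in for one rank that sums its own points before the partial sums are combined, and `math.fsum` stands in for an order-independent reduction.

```python
import math


def tiled_sum(field, tiles_y, tiles_x):
    """Sum a 2D field tile by tile, mimicking per-rank partial sums.

    Hypothetical helper: each tile plays the role of one rank's local
    domain, and the partial sums are combined in tile order.
    """
    ny, nx = len(field), len(field[0])
    if ny % tiles_y or nx % tiles_x:
        raise ValueError("grid must divide evenly into tiles")
    by, bx = ny // tiles_y, nx // tiles_x
    total = 0.0
    for ty in range(tiles_y):
        for tx in range(tiles_x):
            partial = 0.0
            for j in range(ty * by, (ty + 1) * by):
                for i in range(tx * bx, (tx + 1) * bx):
                    partial += field[j][i]
            total += partial
    return total


def exact_sum(field):
    """Order-independent sum: math.fsum tracks exact partial sums."""
    return math.fsum(v for row in field for v in row)


if __name__ == "__main__":
    # Large values of opposite sign plus small values that are lost
    # whenever they are added to a large value first.
    field = [[1.0e17, 1.0],
             [-1.0e17, 1.0]]
    print(tiled_sum(field, 1, 2))  # column tiles -> 2.0
    print(tiled_sum(field, 2, 1))  # row tiles    -> 0.0
    print(exact_sum(field))        # -> 2.0
```

The same values give different totals purely because the tiling changes the order of additions. In a model the difference starts at the level of the last bit and then grows through the nonlinear dynamics, which is consistent with fields diverging visibly within a model day.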

penguian commented 7 years ago

Suite u-an301 can compare runs across different processor decompositions. The MOM build was modified to set the REPRO flag, so the build options were

-Duse_netCDF -Duse_netCDF3 -Duse_libMPI -DACCESS -DACCESS_CM -fpp -Wp,-w -fno-alias
-safe-cray-ptr -fpe0 -ftz -assume byterecl -i4 -r8 -traceback -nowarn -check noarg_temp_created
-assume buffered_io -convert big_endian -O2 -debug minimal -no-vec -fp-model precise

I followed Marshall's suggestion and added




UM and CICE used the same processor decomposition in each run and MOM used 8x12 and 12x8.

The results were different after one day.
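
As a rough illustration of what "different" means in a reproducibility check, the following Python sketch compares two sequences of values bit for bit and reports how many differ and by how much. It is not the comparison that suite u-an301 performs; `compare_fields` and `bits` are hypothetical names, and a real check would read the values from the model output files.

```python
import struct


def bits(x):
    """Return the 64-bit IEEE 754 pattern of a float as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def compare_fields(a, b):
    """Count bitwise mismatches and the largest absolute difference.

    Hypothetical helper: a and b are equal-length sequences of floats,
    for example the same field from two runs.
    """
    if len(a) != len(b):
        raise ValueError("fields must have the same length")
    n_diff = 0
    max_abs = 0.0
    for x, y in zip(a, b):
        if bits(x) != bits(y):
            n_diff += 1
            max_abs = max(max_abs, abs(x - y))
    return n_diff, max_abs


if __name__ == "__main__":
    run_a = [1.0, 0.1 + 0.2, 0.0]
    run_b = [1.0, 0.3, -0.0]
    # Two mismatches: a last-bit rounding difference and a signed zero.
    print(compare_fields(run_a, run_b))
```

A bitwise test is stricter than a tolerance test: it flags a signed-zero or last-bit difference that a tolerance would accept, which is what a bit-reproducibility check needs.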