Open penguian opened 8 months ago
Using the strings command, I can find the following differences in source code directories between the um7.3x executable used by the pre-industrial configuration and the um_hg3 executable created by https://github.com/coecms/access-esm-build-gadi/tree/master :
um7.3x | um_hg3
> /apps/intel-ct/2019.3.199/mkl/lib/intel64
> /lib64/ld-linux-x86-64.so.2
/projects/access/apps/fcm/2019.09.0/lib <
/g/data/p66/pbd562/build/fcm-2019.09.0/lib | /g/data/access/projects/access/apps/fcm/2019.09.0/lib
/g/data/p66/pbd562/projects/access/apps/dummygrib/lib | /g/data/tm70/pcl851/src/access-esm-build-gadi/lib/dummygrib
/g/data/p66/pbd562/test/t47-hxw/jan20/4.0.2/gcom/preprocess/src/gcom | /home/599/mrd599/cylc-run/vn7.0_nci_gadi/share/nci_gadi_ifort_mpp/preprocess/src/gcom
/g/data/p66/pbd562/test/t47-hxw/jan20/4.0.2/oasis3-mct/lib | /home/599/mrd599/cylc-run/u-bp124/share/oasis3-mct_local/lib
/g/data/p66/pbd562/test/t47-hxw/jan20/4.0.2/oasis3-mct/Linux/lib <
/scratch/p66/txz599/UM/UM_ACCESS-ESM1p5_r343/submodels/UM/ummodel_hg3/ppsrc | /g/data/tm70/pcl851/src/access-esm-build-gadi/src/UM/ummodel_hg3/ppsrc
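A comparison like the one above can be approximated in a few lines of Python. This is only a sketch, assuming the build directories are embedded in each executable as printable ASCII runs that begin with `/`, which is roughly what `strings` piped through a `^/` filter reports. The names `path_strings` and `compare_paths` are hypothetical helpers, not part of any ACCESS tooling.

```python
from __future__ import annotations

import re

# Runs of at least four printable ASCII characters, similar to what the
# `strings` command extracts from a binary by default.
_PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def path_strings(data: bytes) -> set[str]:
    """Return the printable runs in `data` that start with '/'."""
    runs = (m.group().decode("ascii") for m in _PRINTABLE_RUN.finditer(data))
    return {run for run in runs if run.startswith("/")}


def compare_paths(left: bytes, right: bytes) -> tuple[set[str], set[str]]:
    """Return (paths only in left, paths only in right)."""
    a, b = path_strings(left), path_strings(right)
    return a - b, b - a
```

To use it, read each executable with `pathlib.Path(...).read_bytes()` and pass the two byte strings to `compare_paths`. The result will contain full file paths as well as directories, so some post-processing is needed to reproduce a directory-level listing.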
Questions:
exe: /g/data/access/payu/access-esm/bin/coe/um7.3x
exe: /g/data/access/payu/access-esm/bin/coe/mom5xx
exe: /g/data/access/payu/access-esm/bin/coe/cicexx
I'm investigating this now. Can you tell me what the differences are?
@HoWol76 Do you mean the differences in code or the differences in output?
How do you define differences? What would need to happen for the code to be identical?
I also ran using a UM executable created from Martin Dix's private https://github.com/MartinDix/ESM1.5 repository on my branch https://github.com/penguian/access-esm-build-gadi/tree/build-um-from-MartinDix. See https://github.com/penguian/access-esm/tree/pre-industrial-MartinDix
The difference in source code is in qxreconf:
diff -rub ACCESS-NRI/UM_v7/UM/umbase_hg3/src/utility/qxreconf/ereport_mod.F90 MartinDix/ESM1.5/umbase_hg3/src/utility/qxreconf/ereport_mod.F90
--- ACCESS-NRI/UM_v7/UM/umbase_hg3/src/utility/qxreconf/ereport_mod.F90 2024-02-20 11:16:08.000000000 +1100
+++ MartinDix/ESM1.5/umbase_hg3/src/utility/qxreconf/ereport_mod.F90 2024-02-20 11:13:23.000000000 +1100
@@ -50,7 +50,7 @@
Integer, Intent( InOut ) :: ErrorStatus
! Local scalars
- Integer, Parameter :: unset = -99
+ Integer :: flush_code
Character (Len=*), Parameter :: astline = '************************&
&*****************************************************'
Character (Len=*), Parameter :: msg ='Job Aborted from Ereport'
@@ -76,8 +76,8 @@
Write (6,*) astline
! DEPENDS ON: um_fort_flush
- Call Um_Fort_Flush(6, unset)
- Call Um_Fort_Flush(0, unset)
+ Call Um_Fort_Flush(6, flush_code)
+ Call Um_Fort_Flush(0, flush_code)
! On T3E use Cray abort
#if defined (T3E)
diff -rub ACCESS-NRI/UM_v7/UM/umbase_hg3/src/utility/qxreconf/rcf_calc_len_ancil_mod.F90 MartinDix/ESM1.5/umbase_hg3/src/utility/qxreconf/rcf_calc_len_ancil_mod.F90
--- ACCESS-NRI/UM_v7/UM/umbase_hg3/src/utility/qxreconf/rcf_calc_len_ancil_mod.F90 2024-02-20 11:16:08.000000000 +1100
+++ MartinDix/ESM1.5/umbase_hg3/src/utility/qxreconf/rcf_calc_len_ancil_mod.F90 2024-02-20 11:13:23.000000000 +1100
@@ -119,7 +119,8 @@
N_Pseudo_Levs = Recondat_Node % Recondat_Info % RPLevs
Else
ErrorStatus=1
- Cmessage='StashCode is not a valid prognostic variable'
+ write(Cmessage, '(a,i3,i4)') &
+ 'StashCode is not a valid prognostic variable', SectionCode, StashCode
Call Ereport( RoutineName, ErrorStatus, Cmessage )
End If
diff -rub ACCESS-NRI/UM_v7/UM/umbase_hg3/src/utility/qxreconf/rcf_set_data_source_mod.F90 MartinDix/ESM1.5/umbase_hg3/src/utility/qxreconf/rcf_set_data_source_mod.F90
--- ACCESS-NRI/UM_v7/UM/umbase_hg3/src/utility/qxreconf/rcf_set_data_source_mod.F90 2024-02-20 11:16:09.000000000 +1100
+++ MartinDix/ESM1.5/umbase_hg3/src/utility/qxreconf/rcf_set_data_source_mod.F90 2024-02-20 11:13:24.000000000 +1100
@@ -341,9 +341,9 @@
! Check that Source is now set correctly otherwise, fail
If ( data_source( i ) % source == Input_Dump ) Then
- Write ( Cmessage, *) 'Section ', &
+ Write ( Cmessage, '(a,i2,a,i4,a)') 'Section ', &
fields_out( i ) % stashmaster % section, &
- 'Item ', &
+ ' Item ', &
fields_out( i ) % stashmaster % item , &
' : Required field is not in input dump!'
ErrorStatus = 30
diff -rub ACCESS-NRI/UM_v7/UM/umbase_hg3/src/utility/qxreconf/rcf_vertical_mod.F90 MartinDix/ESM1.5/umbase_hg3/src/utility/qxreconf/rcf_vertical_mod.F90
--- ACCESS-NRI/UM_v7/UM/umbase_hg3/src/utility/qxreconf/rcf_vertical_mod.F90 2024-02-20 11:16:09.000000000 +1100
+++ MartinDix/ESM1.5/umbase_hg3/src/utility/qxreconf/rcf_vertical_mod.F90 2024-02-20 11:13:24.000000000 +1100
@@ -81,7 +81,7 @@
! Local Data
Character (Len=*), Parameter :: RoutineName='Rcf_vertical'
-Character (Len=80) :: Cmessage
+Character (Len=100) :: Cmessage
Integer :: ErrorStatus
Integer :: i
Integer :: j
@@ -102,8 +102,10 @@
! sizes should be the same, but will check
If ( field_in % level_size /= field_out % level_size .OR. &
field_in % levels /= field_out % levels ) Then
- Cmessage = 'No interpolation, but data field sizes/levels are &
- &different!'
+ write(cmessage,'(a,2i10,2i4)') &
+ 'No interpolation, but data field sizes/levels are different!', &
+ field_in % level_size, field_out % level_size, &
+ field_in % levels, field_out % levels
ErrorStatus = 10
Call Ereport( RoutineName, ErrorStatus, Cmessage )
End If
Only in ACCESS-NRI/UM_v7/UM/ummodel_hg3: bin
Only in MartinDix/ESM1.5/: umrecon
According to mule-cumf, the restart000/atmosphere/fixed.restart_dump.astart output is bitwise identical between the https://github.com/ACCESS-NRI/UM_v7 and https://github.com/MartinDix/ESM1.5 versions of access-esm-build-gadi/bin/um_hg3.exe:
[pcl851@gadi-login-06 access-esm.3.old]$ cat logs/cumf.build-gadi.1-build-gadi.MartinDix.1.log
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* (CUMF-II) Module Information *
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
mule : /g/data/hh5/public/apps/miniconda3/envs/analysis3-23.10/lib/python3.10/site-packages/mule/__init__.py (version 2022.07.1)
um_utils : /g/data/hh5/public/apps/miniconda3/envs/analysis3-23.10/lib/python3.10/site-packages/um_utils/__init__.py (version 2022.07.1)
um_packing : /g/data/hh5/public/apps/miniconda3/envs/analysis3-23.10/lib/python3.10/site-packages/um_packing/__init__.py (version 2022.07.1) (packing lib from SHUMlib: 2023061)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* CUMF-II Comparison Report *
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
File 1: archive.build-gadi.1/access-esm/restart000/atmosphere/fixed.restart_dump.astart
File 2: archive.build-gadi.MartinDix.1/access-esm/restart000/atmosphere/fixed.restart_dump.astart
Files compare
* 0 differences in fixed_length_header (with 7 ignored indices)
* 0 field differences, of which 0 are in data
Compared 5358/5358 fields, with 5358 matches
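When several runs have to be checked, the summary lines of a report like this one can be pulled out automatically. The following is a minimal sketch that assumes other cumf reports use the same wording as the log above; `summarise_cumf` is a hypothetical helper and does not call mule itself.

```python
import re

# Patterns taken from the wording of the report shown in this thread.
_HEADER_RE = re.compile(r"\*\s*(\d+) differences in fixed_length_header")
_FIELDS_RE = re.compile(r"Compared (\d+)/(\d+) fields, with (\d+) matches")


def summarise_cumf(report: str) -> dict:
    """Extract field and header counts from the text of a cumf report."""
    fields = _FIELDS_RE.search(report)
    if fields is None:
        raise ValueError("no 'Compared N/M fields' line found")
    compared, total, matches = map(int, fields.groups())
    header = _HEADER_RE.search(report)
    return {
        "compared": compared,
        "total": total,
        "matches": matches,
        # None when the report has no fixed_length_header line.
        "header_diffs": int(header.group(1)) if header else None,
        "all_fields_match": compared == total == matches,
    }
```

For the log above this gives 5358 compared fields, 5358 matches and 0 header differences. Note that `all_fields_match` only reflects the field counts; the ignored header indices mentioned in the report are not examined.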
Note that Makefile in https://github.com/penguian/access-esm-build-gadi/tree/build-um-from-MartinDix contains the line
cp patch/UM_exe_generator-ACCESS1.5 $@/compile/
so the UM_exe_generator-ACCESS1.5 shell script that builds um_hg3.exe comes from https://github.com/penguian/access-esm-build-gadi and not from either UM source code repository.
In contrast, when I run the pre-industrial configuration with the original 'coe' executables /g/data/access/payu/access-esm/bin/coe/um7.3x, etc., the resulting archive.coecms.1/access-esm/restart000/atmosphere/fixed.restart_dump.astart file differs from archive.build-gadi.1/access-esm/restart000/atmosphere/fixed.restart_dump.astart in 3528 out of 5358 fields as shown above.
Without the original source code and scripts that created the /g/data/access/payu/access-esm/bin/coe/* executables, it is difficult to tell what is causing the difference in UM restart000 output. The differences in UM restart output may be caused by a source code or compilation difference in UM, or it may be in CICE, MOM, Oasis3-MCT, GCOM, etc.
I have investigated why, when running the pre-industrial branch configuration, the executables built from https://github.com/coecms/access-esm-build-gadi/tree/master do not produce bitwise identical output when compared to the executables at /g/data/access/payu/access-esm/bin/coe/.
Briefly, the build using the default Makefile settings creates an environment.sh file that includes the line
OASIS_MANUAL=False
which causes
module load oasis3-mct-local/ompi.4.0.2
so that the executables are built using the module version of Oasis3-MCT.
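Before building, it may be worth checking which setting a generated environment.sh actually contains. The sketch below assumes the file is made of simple `KEY=VALUE` lines, optionally prefixed with `export`; only the `OASIS_MANUAL=False` line is confirmed by this thread, and `read_settings` and `oasis_manual_disabled` are hypothetical helper names.

```python
from __future__ import annotations


def read_settings(text: str) -> dict[str, str]:
    """Collect KEY=VALUE assignments from shell-style text."""
    settings: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("export "):
            line = line[len("export "):]
        # Skip blanks, comments and commands such as `module load ...`.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


def oasis_manual_disabled(text: str) -> bool:
    """True only when the text explicitly sets OASIS_MANUAL=False."""
    return read_settings(text).get("OASIS_MANUAL", "").lower() == "false"
```

Calling `oasis_manual_disabled(open("environment.sh").read())` returns `True` for the default build described above, which is the case where the module version of Oasis3-MCT is loaded. A missing `OASIS_MANUAL` line is reported as `False` because the default behaviour in that case is not stated here.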
I have created the branches
and am running the pre-industrial configuration again, to make sure that the output reproduces the output from the executables at /g/data/access/payu/access-esm/bin/coe/
I was just about to update this myself.
The old executables were probably built from code revision 338, as opposed to the most recent, 343. The difference is small: a few variables (wresp, thinning) get initialised to 0.0.
Thanks. As far as I can tell, the main difference is in Oasis3-MCT. I will need to contact @MartinDix to chase down the source code to compare with https://github.com/penguian/oasis3-mct/tree/new_modules_pbd562
#%Module
set help "Oasis3 coupler"
set install-contact "Martin Dix"
set install-date "2020-01-17"
set url "https://verc.enes.org/oasis"
set prefix ~access/apps/oasis3-mct/ompi.4.0.2
conflict oasis3 oasis3-mct
prereq openmpi/4.0.2
source ~access/modules/common
I think I found the source.
$ strings ~access/apps/oasis3-mct/ompi.4.0.2/lib/*.a |grep '^/[a-z]'|cut -d'(' -f1|sort -u|head -n 5
/apps/openmpi/4.0.2/include/Intel
/home/599/mrd599/cylc-run/u-bp124/share/oasis3-mct_local/lib/psmile/src
/home/599/mrd599/cylc-run/u-bp124/share/oasis3-mct_local/lib/psmile/src/mod_oasis_advance.F90
/home/599/mrd599/cylc-run/u-bp124/share/oasis3-mct_local/lib/psmile/src/mod_oasis_auxiliary_routines.F90
/home/599/mrd599/cylc-run/u-bp124/share/oasis3-mct_local/lib/psmile/src/mod_oasis_coupler.F90
In u-bp124 I see:
./suite.rc:svn checkout https://access-svn.nci.org.au/svn/oasis/branches/dev/mrd599/oasis3-mct-errorhandling oasis3-mct_local
There are many source code changes between https://github.com/penguian/oasis3-mct/tree/new_modules_pbd562 and file:///g/data/access/access-svn/oasis/branches/dev/mrd599/oasis3-mct-errorhandling so this is likely to be the cause of differences between executable behaviours.
If you run
svn co file:///g/data/access/access-svn/oasis/branches/dev/mrd599/oasis3-mct-errorhandling oasis3-mct-local
cd oasis3-mct-local
svn log --diff
you will see
[...]
------------------------------------------------------------------------
r42 | hxy599 | 2014-06-26 11:33:11 +1000 (Thu, 26 Jun 2014) | 1 line
update to Oasis2-MCT2.0 branch@r1024
[...]
Index: lib/scrip/src/remap_bicubic.f
===================================================================
--- lib/scrip/src/remap_bicubic.f (revision 41)
+++ lib/scrip/src/remap_bicubic.f (revision 42)
@@ -80,7 +80,7 @@
& max_iter = 100 ! max iteration count for i,j iteration
real (kind=dbl_kind), parameter ::
- & converge = epsilon(1.0_dbl_kind) ! convergence criterion
+ & converge = 1.e-10_dbl_kind ! convergence criterion
!***********************************************************************
[...]
I think that the change to converge would be enough to cause the drift in output values from the pre-industrial configuration that is seen when using the ~access/apps/oasis3-mct/ompi.4.0.2 module as opposed to compiling from https://github.com/penguian/oasis3-mct/tree/new_modules_pbd562
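To illustrate why a convergence threshold alone can change results at the bit level, here is a small sketch. It is not the SCRIP bicubic iteration: it uses a generic fixed-point iteration, x <- cos(x), purely as a stand-in, and it assumes `dbl_kind` is a 64-bit real so that `sys.float_info.epsilon` plays the role of `epsilon(1.0_dbl_kind)`. Only `max_iter = 100` and the two threshold values come from the diff above.

```python
import math
import sys


def iterate(converge: float, max_iter: int = 100) -> float:
    """Iterate x <- cos(x) until the update is smaller than `converge`."""
    x = 1.0
    for _ in range(max_iter):
        x_new = math.cos(x)
        if abs(x_new - x) < converge:
            return x_new
        x = x_new
    return x  # iteration cap reached without meeting the criterion


# Threshold before r42: machine epsilon for 64-bit reals (assumed dbl_kind).
tight = iterate(sys.float_info.epsilon)
# Threshold after r42.
loose = iterate(1.0e-10)

print(f"tight={tight!r} loose={loose!r} diff={abs(tight - loose):.3e}")
```

The two results agree to about ten significant digits but are not bitwise identical, because the looser criterion stops the iteration earlier. If remapping weights differ by a comparable amount, fields passed through the coupler would be expected to diverge over a run, which is consistent with the suggestion above but does not by itself prove that this change is the cause.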
Rather than using the pre-compiled executables, I changed config.yaml for the pre-industrial branch to use the executables built via https://github.com/coecms/access-esm-build-gadi/tree/master. The resulting archive/access-esm/restart000/atmosphere/fixed.restart_dump.astart differs in 3528 out of 5358 fields: