coecms / access-esm

Main Repository for ACCESS-ESM configurations

Where is the source code corresponding to the pre-industrial branch? #13

Open penguian opened 8 months ago

penguian commented 8 months ago

Rather than using the pre-compiled executables, I changed config.yaml for the pre-industrial branch to use the executables built via https://github.com/coecms/access-esm-build-gadi/tree/master :

diff --git a/config.yaml b/config.yaml
index 6d3935f..4ea0bbe 100644
--- a/config.yaml
+++ b/config.yaml
@@ -10,14 +10,14 @@ submodels:
     - name: atmosphere
       model: um
       ncpus: 192
-      exe: /g/data/access/payu/access-esm/bin/coe/um7.3x
+      exe: /g/data/tm70/pcl851/src/coecms/access-esm-build-gadi/bin/um_hg3.exe
       input:
         - /g/data/access/payu/access-esm/input/pre-industrial/atmosphere

     - name: ocean
       model: mom
       ncpus: 180
-      exe: /g/data/access/payu/access-esm/bin/coe/mom5xx
+      exe: /g/data/tm70/pcl851/src/coecms/access-esm-build-gadi/bin/mom5xx
       input:
         - /g/data/access/payu/access-esm/input/pre-industrial/ocean/common
         - /g/data/access/payu/access-esm/input/pre-industrial/ocean/pre-industrial
@@ -25,7 +25,7 @@ submodels:
     - name: ice
       model: cice
       ncpus: 12
-      exe: /g/data/access/payu/access-esm/bin/coe/cicexx
+      exe: /g/data/tm70/pcl851/src/coecms/access-esm-build-gadi/bin/cice4.1_access-mct-12p-20240115
       input:
         - /g/data/access/payu/access-esm/input/pre-industrial/ice

The resulting archive/access-esm/restart000/atmosphere/fixed.restart_dump.astart differs from the one produced by the pre-compiled executables in 3528 out of 5358 fields:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* (CUMF-II) Module Information *
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

mule       : /g/data/hh5/public/apps/miniconda3/envs/analysis3-23.07/lib/python3.10/site-packages/mule/__init__.py (version 2022.07.1)
um_utils   : /g/data/hh5/public/apps/miniconda3/envs/analysis3-23.07/lib/python3.10/site-packages/um_utils/__init__.py (version 2022.07.1)
um_packing : /g/data/hh5/public/apps/miniconda3/envs/analysis3-23.07/lib/python3.10/site-packages/um_packing/__init__.py (version 2022.07.1) (packing lib from SHUMlib: 2023061)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* CUMF-II Comparison Report *
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

File 1: archive.build-gadi.1/access-esm/restart000/atmosphere/fixed.restart_dump.astart
File 2: archive.coecms.1/access-esm/restart000/atmosphere/fixed.restart_dump.astart
Files DO NOT compare
  * 0 differences in fixed_length_header (with 7 ignored indices)
  * 3 differences in real_constants (with 0 ignored indices)
  * 3528 field differences, of which 3528 are in data

Compared 5358/5358 fields, with 1830 matches

Maximum RMS diff as % of data in file 1: 1728.247530970832  (field 1939)
Maximum RMS diff as % of data in file 2: 1540966.1195676462 (field 2146)

%%%%%%%%%%%%%%%%%%
* real_constants *
%%%%%%%%%%%%%%%%%%
Components DO NOT compare (compared 38/38 values)
Component differences:
  Index 18 (mean_diabatic_flux) differs - file_1: 1.1837522069452046e+16  file_2: 1.1564718318975124e+16
  Index 20 (energy)             differs - file_1:  1.296698556808685e+24  file_2: 1.2972116950953004e+24
  Index 21 (energy_drift)       differs - file_1:  3.235504453499748e-07  file_2: 3.6300878600888835e-07

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* Field 1/5358 - U COMPNT OF WIND AFTER TIMESTEP *
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Lookup compares, data DOES NOT compare
Compared 64/64 lookup values.
File_1 lookup info:
  t1(0102/01/01 00:00:01)  lblev(1)/blev(9.9982061118072)  lbproc(0)
Data differences:
  Number of point differences  : 27840/27840
  Maximum absolute difference  : 40.503974864469633
  RMS difference               : 5.9704684001465225
  RMS diff as % of file_1 data : 113.21319232395794
  RMS diff as % of file_2 data : 95.945432469935682
[...]
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* Field 5358/5358 - Height at Tropopause Level *
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Lookup compares, data DOES NOT compare
Compared 64/64 lookup values.
File_1 lookup info:
  t1(0101/12/01 00:00:335)  lblev(0)/blev(-1.0)  lbproc(128)
Data differences:
  Number of point differences  : 27730/27840
  Maximum absolute difference  : 3178.9881505981466
  RMS difference               : 521.75065971373635
  RMS diff as % of file_1 data : 4.0601309452938192
  RMS diff as % of file_2 data : 4.0515202394169556
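For anyone scripting these comparisons, the headline counts can be pulled out of a saved report and sanity-checked. The following is only a minimal sketch: `summarise_cumf` is a hypothetical helper (not part of mule or um_utils), and it assumes the summary lines keep the exact wording shown in the report above.

```python
import re


def summarise_cumf(report: str) -> dict:
    """Extract the headline counts from the text of a CUMF-II report.

    Hypothetical helper: relies on the wording of the summary lines
    as they appear in the report quoted in this thread.
    """
    compared = re.search(
        r"Compared (\d+)/(\d+) fields, with (\d+) matches", report)
    differing = re.search(
        r"\* (\d+) field differences, of which (\d+) are in data", report)
    if compared is None or differing is None:
        raise ValueError("summary lines not found in report")
    n_compared, n_total, n_match = map(int, compared.groups())
    n_diff, n_data_diff = map(int, differing.groups())
    return {
        "compared": n_compared,
        "total": n_total,
        "matches": n_match,
        "field_differences": n_diff,
        "data_differences": n_data_diff,
        "files_compare": "DO NOT compare" not in report and n_diff == 0,
    }


# Summary section copied from the report above.
report = """\
Files DO NOT compare
  * 0 differences in fixed_length_header (with 7 ignored indices)
  * 3 differences in real_constants (with 0 ignored indices)
  * 3528 field differences, of which 3528 are in data

Compared 5358/5358 fields, with 1830 matches
"""

summary = summarise_cumf(report)
# Matching fields plus differing fields should cover every compared field.
assert (summary["matches"] + summary["field_differences"]
        == summary["compared"])
print(summary)
```

Here 1830 matching fields plus 3528 differing fields accounts for all 5358 compared fields, so no field was silently skipped.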

penguian commented 8 months ago

Using the strings command, I can find the following differences in source code directories between the um7.3x executable used by the pre-industrial configuration and the um_hg3 executable created by https://github.com/coecms/access-esm-build-gadi/tree/master :

um7.3x                                                                        | um_hg3
                                                                              > /apps/intel-ct/2019.3.199/mkl/lib/intel64
                                                                              > /lib64/ld-linux-x86-64.so.2
/projects/access/apps/fcm/2019.09.0/lib                                       <
/g/data/p66/pbd562/build/fcm-2019.09.0/lib                                    | /g/data/access/projects/access/apps/fcm/2019.09.0/lib
/g/data/p66/pbd562/projects/access/apps/dummygrib/lib                         | /g/data/tm70/pcl851/src/access-esm-build-gadi/lib/dummygrib
/g/data/p66/pbd562/test/t47-hxw/jan20/4.0.2/gcom/preprocess/src/gcom          | /home/599/mrd599/cylc-run/vn7.0_nci_gadi/share/nci_gadi_ifort_mpp/preprocess/src/gcom
/g/data/p66/pbd562/test/t47-hxw/jan20/4.0.2/oasis3-mct/lib                    | /home/599/mrd599/cylc-run/u-bp124/share/oasis3-mct_local/lib
/g/data/p66/pbd562/test/t47-hxw/jan20/4.0.2/oasis3-mct/Linux/lib              <
/scratch/p66/txz599/UM/UM_ACCESS-ESM1p5_r343/submodels/UM/ummodel_hg3/ppsrc   | /g/data/tm70/pcl851/src/access-esm-build-gadi/src/UM/ummodel_hg3/ppsrc
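The same kind of comparison can be scripted without the strings command. This is a rough sketch rather than the procedure actually used above: `path_strings` and `compare_paths` are hypothetical helpers, and the two byte strings are synthetic stand-ins for the executables, reusing two of the paths listed above.

```python
import re

# Runs of at least 4 printable ASCII bytes, the same default minimum
# length that strings(1) uses.
PRINTABLE = re.compile(rb"[\x20-\x7e]{4,}")


def path_strings(blob: bytes) -> set[str]:
    """Return the printable strings in blob that start with '/'."""
    found = (m.group().decode("ascii") for m in PRINTABLE.finditer(blob))
    return {s for s in found if s.startswith("/")}


def compare_paths(old: bytes, new: bytes) -> tuple[set[str], set[str]]:
    """Return (paths found only in old, paths found only in new)."""
    a, b = path_strings(old), path_strings(new)
    return a - b, b - a


# Synthetic stand-ins for the contents of two executables.
old_blob = (b"\x00\x01/projects/access/apps/fcm/2019.09.0/lib"
            b"\x00ELF\x00/lib/shared\x00")
new_blob = b"\x7f/lib64/ld-linux-x86-64.so.2\x00\x02/lib/shared\x00"

only_old, only_new = compare_paths(old_blob, new_blob)
print("only in old:", sorted(only_old))
print("only in new:", sorted(only_new))
```

For real executables, the two blobs would come from something like `pathlib.Path(exe).read_bytes()`. Embedded paths only hint at where the source and libraries lived at build time; they do not identify the revision that was compiled.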

penguian commented 8 months ago

Questions:

  1. In which repositories and branches can I find the source code, Makefiles, etc. that were used to build
    exe: /g/data/access/payu/access-esm/bin/coe/um7.3x
    exe: /g/data/access/payu/access-esm/bin/coe/mom5xx
    exe: /g/data/access/payu/access-esm/bin/coe/cicexx
  2. Why do the executables built by https://github.com/coecms/access-esm-build-gadi/tree/master differ from these?

HoWol76 commented 8 months ago

I'm investigating this now. Can you tell me what the differences are?

penguian commented 8 months ago

@HoWol76 Do you mean the differences in code or the differences in output?

HoWol76 commented 8 months ago

How do you define differences? What would need to happen for the code to be identical?

penguian commented 8 months ago

I also ran the configuration using a UM executable built from Martin Dix's private https://github.com/MartinDix/ESM1.5 repository, using my branch https://github.com/penguian/access-esm-build-gadi/tree/build-um-from-MartinDix . See https://github.com/penguian/access-esm/tree/pre-industrial-MartinDix

The difference in source code between ACCESS-NRI/UM_v7 and MartinDix/ESM1.5 is in qxreconf:

diff -rub ACCESS-NRI/UM_v7/UM/umbase_hg3/src/utility/qxreconf/ereport_mod.F90 MartinDix/ESM1.5/umbase_hg3/src/utility/qxreconf/ereport_mod.F90
--- ACCESS-NRI/UM_v7/UM/umbase_hg3/src/utility/qxreconf/ereport_mod.F90 2024-02-20 11:16:08.000000000 +1100
+++ MartinDix/ESM1.5/umbase_hg3/src/utility/qxreconf/ereport_mod.F90    2024-02-20 11:13:23.000000000 +1100
@@ -50,7 +50,7 @@
   Integer, Intent( InOut )          :: ErrorStatus

 ! Local scalars
-  Integer, Parameter            :: unset = -99
+  Integer           :: flush_code
   Character (Len=*), Parameter  :: astline = '************************&
   &*****************************************************'
   Character (Len=*), Parameter  :: msg ='Job Aborted from Ereport'
@@ -76,8 +76,8 @@
     Write (6,*) astline

 ! DEPENDS ON: um_fort_flush
-    Call Um_Fort_Flush(6, unset)
-    Call Um_Fort_Flush(0, unset)
+    Call Um_Fort_Flush(6, flush_code)
+    Call Um_Fort_Flush(0, flush_code)

     ! On T3E use Cray abort
 #if defined (T3E)
diff -rub ACCESS-NRI/UM_v7/UM/umbase_hg3/src/utility/qxreconf/rcf_calc_len_ancil_mod.F90 MartinDix/ESM1.5/umbase_hg3/src/utility/qxreconf/rcf_calc_len_ancil_mod.F90
--- ACCESS-NRI/UM_v7/UM/umbase_hg3/src/utility/qxreconf/rcf_calc_len_ancil_mod.F90  2024-02-20 11:16:08.000000000 +1100
+++ MartinDix/ESM1.5/umbase_hg3/src/utility/qxreconf/rcf_calc_len_ancil_mod.F90 2024-02-20 11:13:23.000000000 +1100
@@ -119,7 +119,8 @@
       N_Pseudo_Levs = Recondat_Node % Recondat_Info % RPLevs
     Else
       ErrorStatus=1
-      Cmessage='StashCode is not a valid prognostic variable'
+      write(Cmessage, '(a,i3,i4)')                                    &
+        'StashCode is not a valid prognostic variable', SectionCode, StashCode
       Call Ereport( RoutineName, ErrorStatus, Cmessage )
     End If

diff -rub ACCESS-NRI/UM_v7/UM/umbase_hg3/src/utility/qxreconf/rcf_set_data_source_mod.F90 MartinDix/ESM1.5/umbase_hg3/src/utility/qxreconf/rcf_set_data_source_mod.F90
--- ACCESS-NRI/UM_v7/UM/umbase_hg3/src/utility/qxreconf/rcf_set_data_source_mod.F90 2024-02-20 11:16:09.000000000 +1100
+++ MartinDix/ESM1.5/umbase_hg3/src/utility/qxreconf/rcf_set_data_source_mod.F90    2024-02-20 11:13:24.000000000 +1100
@@ -341,9 +341,9 @@

     ! Check that Source is now set correctly otherwise, fail
     If ( data_source( i ) % source == Input_Dump ) Then
-      Write ( Cmessage, *) 'Section ',                              &
+      Write ( Cmessage, '(a,i2,a,i4,a)') 'Section ',                &
                            fields_out( i ) % stashmaster % section, &
-                           'Item ',                                 &
+                           ' Item ',                                &
                            fields_out( i ) % stashmaster % item ,   &
                            ' : Required field is not in input dump!'
       ErrorStatus = 30
diff -rub ACCESS-NRI/UM_v7/UM/umbase_hg3/src/utility/qxreconf/rcf_vertical_mod.F90 MartinDix/ESM1.5/umbase_hg3/src/utility/qxreconf/rcf_vertical_mod.F90
--- ACCESS-NRI/UM_v7/UM/umbase_hg3/src/utility/qxreconf/rcf_vertical_mod.F90    2024-02-20 11:16:09.000000000 +1100
+++ MartinDix/ESM1.5/umbase_hg3/src/utility/qxreconf/rcf_vertical_mod.F90   2024-02-20 11:13:24.000000000 +1100
@@ -81,7 +81,7 @@

 ! Local Data
 Character (Len=*), Parameter      :: RoutineName='Rcf_vertical'
-Character (Len=80)                :: Cmessage
+Character (Len=100)                :: Cmessage
 Integer                           :: ErrorStatus
 Integer                           :: i
 Integer                           :: j
@@ -102,8 +102,10 @@
     ! sizes should be the same, but will check
     If ( field_in % level_size /= field_out % level_size .OR. &
          field_in % levels /= field_out % levels ) Then
-      Cmessage = 'No interpolation, but data field sizes/levels are &
-                 &different!'
+      write(cmessage,'(a,2i10,2i4)')                                    &
+        'No interpolation, but data field sizes/levels are different!', &
+        field_in % level_size, field_out % level_size,                  &
+        field_in % levels, field_out % levels
       ErrorStatus = 10
       Call Ereport( RoutineName, ErrorStatus, Cmessage )
     End If
Only in ACCESS-NRI/UM_v7/UM/ummodel_hg3: bin
Only in MartinDix/ESM1.5/: umrecon

According to mule-cumf the restart000/atmosphere/fixed.restart_dump.astart output is bitwise identical between the https://github.com/ACCESS-NRI/UM_v7 and https://github.com/MartinDix/ESM1.5 versions of access-esm-build-gadi/bin/um_hg3.exe:

[pcl851@gadi-login-06 access-esm.3.old]$ cat logs/cumf.build-gadi.1-build-gadi.MartinDix.1.log 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* (CUMF-II) Module Information *
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

mule       : /g/data/hh5/public/apps/miniconda3/envs/analysis3-23.10/lib/python3.10/site-packages/mule/__init__.py (version 2022.07.1)
um_utils   : /g/data/hh5/public/apps/miniconda3/envs/analysis3-23.10/lib/python3.10/site-packages/um_utils/__init__.py (version 2022.07.1)
um_packing : /g/data/hh5/public/apps/miniconda3/envs/analysis3-23.10/lib/python3.10/site-packages/um_packing/__init__.py (version 2022.07.1) (packing lib from SHUMlib: 2023061)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* CUMF-II Comparison Report *
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

File 1: archive.build-gadi.1/access-esm/restart000/atmosphere/fixed.restart_dump.astart
File 2: archive.build-gadi.MartinDix.1/access-esm/restart000/atmosphere/fixed.restart_dump.astart
Files compare
  * 0 differences in fixed_length_header (with 7 ignored indices)
  * 0 field differences, of which 0 are in data

Compared 5358/5358 fields, with 5358 matches
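A cheap pre-check before running a full field-by-field comparison is to hash the two files. This is a sketch under stated assumptions: `sha256_of` and `bytewise_identical` are hypothetical helpers, and the demo uses throwaway files rather than real restart dumps. Note that the CUMF report above ignores 7 fixed_length_header indices, so two dumps can fail this byte-level test and still "compare" under CUMF; a matching hash is sufficient for identity but not necessary for a CUMF match.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that large dumps are not read into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bytewise_identical(a: Path, b: Path) -> bool:
    """True if both files have the same size and the same SHA-256 digest."""
    return (a.stat().st_size == b.stat().st_size
            and sha256_of(a) == sha256_of(b))


# Demo on throwaway files standing in for two restart dumps.
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp, "first.astart")
    second = Path(tmp, "second.astart")
    third = Path(tmp, "third.astart")
    first.write_bytes(b"\x00" * 1024 + b"payload")
    second.write_bytes(b"\x00" * 1024 + b"payload")
    third.write_bytes(b"\x00" * 1024 + b"payloae")  # last byte differs
    same_result = bytewise_identical(first, second)
    diff_result = bytewise_identical(first, third)

print(same_result, diff_result)
```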

penguian commented 8 months ago

Note that the Makefile in https://github.com/penguian/access-esm-build-gadi/tree/build-um-from-MartinDix contains the line

cp patch/UM_exe_generator-ACCESS1.5 $@/compile/

so the UM_exe_generator-ACCESS1.5 shell script that builds um_hg3.exe comes from https://github.com/penguian/access-esm-build-gadi and not from either UM source code repository.

penguian commented 8 months ago

In contrast, when I run the pre-industrial configuration with the original 'coe' executables (/g/data/access/payu/access-esm/bin/coe/um7.3x, etc.), the resulting archive.coecms.1/access-esm/restart000/atmosphere/fixed.restart_dump.astart file differs from archive.build-gadi.1/access-esm/restart000/atmosphere/fixed.restart_dump.astart in 3528 out of 5358 fields, as shown above.

Without the original source code and scripts that created the /g/data/access/payu/access-esm/bin/coe/* executables, it is difficult to tell what is causing the difference in UM restart000 output. The differences may be caused by a source code or compilation difference in UM, or the cause may lie in CICE, MOM, Oasis3-MCT, GCOM, etc.

penguian commented 7 months ago

I have investigated why, when running the pre-industrial branch configuration, the executables built from https://github.com/coecms/access-esm-build-gadi/tree/master do not produce output that is bitwise identical to the output from the executables at /g/data/access/payu/access-esm/bin/coe/

Briefly, the build using the default Makefile settings creates an environment.sh file that includes the line

OASIS_MANUAL=False

which causes the build to run

module load oasis3-mct-local/ompi.4.0.2

so that the executables are built using the module version of Oasis3-MCT.
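To check which route a given build will take before compiling anything, the generated environment.sh can be inspected. This is a minimal sketch with assumptions stated loudly: `parse_environment` and `uses_oasis_module` are hypothetical helpers, the file is assumed to hold simple KEY=value lines, and a missing OASIS_MANUAL is treated as False because that is the default described above. What a True setting does in the Makefile is not shown in this thread.

```python
def parse_environment(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines; skip blanks, comments and other lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


def uses_oasis_module(settings: dict[str, str]) -> bool:
    """True if the build would load the pre-built Oasis3-MCT module.

    Assumption: anything other than 'true' (case-insensitive) counts as
    False, and a missing key falls back to the default of False.
    """
    return settings.get("OASIS_MANUAL", "False").lower() != "true"


# Illustrative file contents; only the OASIS_MANUAL line is from the thread.
default_env = "# generated by the build\nOASIS_MANUAL=False\n"
manual_env = "export OASIS_MANUAL=True\n"

default_uses_module = uses_oasis_module(parse_environment(default_env))
manual_uses_module = uses_oasis_module(parse_environment(manual_env))
print(default_uses_module, manual_uses_module)
```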

I have created the branches

and am running the pre-industrial configuration again, to make sure that the output reproduces the output from the executables at /g/data/access/payu/access-esm/bin/coe/

HoWol76 commented 7 months ago

I was just about to update this myself.

The old executables were probably built from code revision 338, as opposed to the most recent, 343. The difference is small: a few variables (wresp, thinning) get initialised to 0.0.

penguian commented 7 months ago

Thanks. As far as I can tell, the main difference is in Oasis3-MCT. I will need to contact @MartinDix to chase down the source code to compare with https://github.com/penguian/oasis3-mct/tree/new_modules_pbd562

 #%Module

set help            "Oasis3 coupler"
set install-contact "Martin Dix"
set install-date    "2020-01-17"
set url             "https://verc.enes.org/oasis"
set prefix          ~access/apps/oasis3-mct/ompi.4.0.2

conflict            oasis3 oasis3-mct
prereq  openmpi/4.0.2

source              ~access/modules/common

penguian commented 7 months ago

I think I found the source.

$ strings ~access/apps/oasis3-mct/ompi.4.0.2/lib/*.a |grep '^/[a-z]'|cut -d'(' -f1|sort -u|head -n 5
/apps/openmpi/4.0.2/include/Intel
/home/599/mrd599/cylc-run/u-bp124/share/oasis3-mct_local/lib/psmile/src
/home/599/mrd599/cylc-run/u-bp124/share/oasis3-mct_local/lib/psmile/src/mod_oasis_advance.F90
/home/599/mrd599/cylc-run/u-bp124/share/oasis3-mct_local/lib/psmile/src/mod_oasis_auxiliary_routines.F90
/home/599/mrd599/cylc-run/u-bp124/share/oasis3-mct_local/lib/psmile/src/mod_oasis_coupler.F90

In u-bp124 I see:

./suite.rc:svn checkout https://access-svn.nci.org.au/svn/oasis/branches/dev/mrd599/oasis3-mct-errorhandling oasis3-mct_local

penguian commented 7 months ago

There are many source code changes between https://github.com/penguian/oasis3-mct/tree/new_modules_pbd562 and file:///g/data/access/access-svn/oasis/branches/dev/mrd599/oasis3-mct-errorhandling so this is likely to be the cause of differences between executable behaviours.

penguian commented 7 months ago

If you run

svn co file:///g/data/access/access-svn/oasis/branches/dev/mrd599/oasis3-mct-errorhandling oasis3-mct-local
cd oasis3-mct-local
svn log --diff

you will see

[...]

------------------------------------------------------------------------
r42 | hxy599 | 2014-06-26 11:33:11 +1000 (Thu, 26 Jun 2014) | 1 line

update to Oasis2-MCT2.0 branch@r1024
[...]
Index: lib/scrip/src/remap_bicubic.f
===================================================================
--- lib/scrip/src/remap_bicubic.f   (revision 41)
+++ lib/scrip/src/remap_bicubic.f   (revision 42)
@@ -80,7 +80,7 @@
      &    max_iter = 100   ! max iteration count for i,j iteration

       real (kind=dbl_kind), parameter ::
-     &     converge = epsilon(1.0_dbl_kind) ! convergence criterion
+     &     converge = 1.e-10_dbl_kind ! convergence criterion

 !***********************************************************************
[...]

I think that the change to converge would be enough to cause the drift in output values from the pre-industrial configuration that is seen when using the ~access/apps/oasis3-mct/ompi.4.0.2 module as opposed to compiling from https://github.com/penguian/oasis3-mct/tree/new_modules_pbd562
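
To illustrate why a change of convergence criterion alone can alter results, here is a small sketch. It is not the SCRIP bicubic remapping code: a simple fixed-point iteration x <- cos(x) stands in for the i,j iteration, `fixed_point` is a hypothetical helper, and it assumes dbl_kind is a 64-bit real, in which case epsilon(1.0_dbl_kind) equals Python's sys.float_info.epsilon.

```python
import math
import sys


def fixed_point(tolerance: float, start: float = 1.0,
                max_iter: int = 1000) -> tuple[float, int]:
    """Iterate x <- cos(x) until successive values differ by < tolerance.

    Returns the final value and the number of iterations performed.
    """
    x = start
    for count in range(1, max_iter + 1):
        x_next = math.cos(x)
        if abs(x_next - x) < tolerance:
            return x_next, count
        x = x_next
    return x, max_iter


# Assumption: dbl_kind is 64-bit, so epsilon(1.0_dbl_kind) is 2**-52.
tight = sys.float_info.epsilon
loose = 1.0e-10

x_tight, n_tight = fixed_point(tight)
x_loose, n_loose = fixed_point(loose)

print(f"tight: {n_tight} iterations, x = {x_tight!r}")
print(f"loose: {n_loose} iterations, x = {x_loose!r}")
print(f"difference: {abs(x_tight - x_loose):.3e}")
```

The looser criterion stops earlier and returns a value that differs from the tighter one far above rounding level. If remapping weights differ in this way, the coupled fields would differ from the first exchange onward, which would be consistent with output that is not bitwise identical.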