CH-Earth / summa

Structure for Unifying Multiple Modeling Alternatives:
http://www.ral.ucar.edu/projects/summa
GNU General Public License v3.0

Bugfix/multi hru per gru restart #516

Closed. wknoben closed this 2 years ago.

wknoben commented 2 years ago


Relevant code: https://github.com/CH-Earth/summa/blob/fa9adf808229a45085defdc2bb8ef05836b9b3aa/build/source/netcdf/read_icond.f90#L279-L330

Experiment setup:

Regular (non-parallel) run
• Experiment: run all 3 GRUs
• Command: summa.exe -m filemanager.txt
• Outcome: values are read correctly from file and stored in the correct HRU in the data structure. Note that ixFile is the "index of this HRU in the netCDF file (and thus the data we loaded from the file)", and that it matches both HRU and iHRU_global.

Print outputs (reformatted for easier reading):

Reading all initial values for scalarSWE from netCDF
GRU | HRU | scalarSWE 
--- | --- | -----------------------
1   | 1   | 4.2714704475282491E-003   
1   | 2   | 3.5504796877794051E-002  
1   | 3   | 0.14091558550377747        
1   | 4   | 2.1170480576837023        
2   | 5   | 0.0000000000000000        
2   | 6   | 2.6283452412565642E-006  
2   | 7   | 0.26871154953569948       
3   | 8   | 0.56689948152237135        
3   | 9   | 1.2983077710459947        
3   | 10  | 8.7946744910014552        

Reading initial values per HRU
iGRU | iHRU | iHRU_global | startGRU | iHRU_local | ixFile | scalarSWE
-----| ---- | ----------- | -------- | ---------- | ------ | -----------------------
1    | 1    | 1           | 1        | 1          | 1      | 4.2714704475282491E-003
1    | 2    | 2           | 1        | 2          | 2      | 3.5504796877794051E-002
1    | 3    | 3           | 1        | 3          | 3      | 0.14091558550377747
1    | 4    | 4           | 1        | 4          | 4      | 2.1170480576837023
2    | 1    | 5           | 1        | 5          | 5      | 0.0000000000000000
2    | 2    | 6           | 1        | 6          | 6      | 2.6283452412565642E-006
2    | 3    | 7           | 1        | 7          | 7      | 0.26871154953569948
3    | 1    | 8           | 1        | 8          | 8      | 0.56689948152237135
3    | 2    | 9           | 1        | 9          | 9      | 1.2983077710459947
3    | 3    | 10          | 1        | 10         | 10     | 8.7946744910014552
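The bookkeeping behind the iHRU_global and iHRU_local columns can be sketched in Python. This is not the Fortran in read_icond.f90. The sketch assumes the restart file stores HRUs contiguously in GRU order, as the netCDF listing above shows (GRU 1 holds HRUs 1 to 4, GRU 2 holds 5 to 7, GRU 3 holds 8 to 10). The helper names first_global_hru and hru_index_table are hypothetical.

```python
from itertools import accumulate

# HRUs per GRU in the example domain: GRU 1 has 4, GRUs 2 and 3 have 3 each.
HRUS_PER_GRU = [4, 3, 3]


def first_global_hru(hrus_per_gru):
    """1-based global index of the first HRU of each GRU, assuming HRUs are
    stored contiguously in GRU order in the netCDF file (an assumption)."""
    offsets = [0] + list(accumulate(hrus_per_gru))[:-1]
    return [offset + 1 for offset in offsets]


def hru_index_table(hrus_per_gru, start_gru, n_gru):
    """Rows of (iGRU, iHRU, iHRU_global, iHRU_local) for a run that starts at
    GRU `start_gru` (1-based) and covers `n_gru` consecutive GRUs."""
    first = first_global_hru(hrus_per_gru)
    rows = []
    i_hru_local = 0
    for i_gru in range(1, n_gru + 1):          # GRU index within this run
        gru_global = start_gru + i_gru - 1     # GRU index in the full domain
        for i_hru in range(1, hrus_per_gru[gru_global - 1] + 1):
            i_hru_local += 1                   # HRU counter within this run
            i_hru_global = first[gru_global - 1] + i_hru - 1
            rows.append((i_gru, i_hru, i_hru_global, i_hru_local))
    return rows


# Full-domain run (no -g): iHRU_global and iHRU_local are identical.
for row in hru_index_table(HRUS_PER_GRU, start_gru=1, n_gru=3):
    print(row)

# Equivalent of -g 2 2: iHRU_global runs 5..10 while iHRU_local runs 1..6.
for row in hru_index_table(HRUS_PER_GRU, start_gru=2, n_gru=2):
    print(row)
```

In a full-domain run the two indices coincide, so an index built from either one reads the right HRU; they only diverge when the run starts after GRU 1.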

Parallel run
• Experiment: skip GRU 1, run only GRUs 2 and 3
• Command: summa.exe -g 2 2 -m filemanager.txt
• Outcome: values are read correctly from file but stored incorrectly in the data structure. Note that ixFile no longer matches iHRU_global or HRU.
• Error catching: if snow layers are present in the domain (as they are here in GRU 3, HRU 3), read_icond() detects that something has gone wrong and exits with an error message. If the entire domain has the same number of layers, the code does not detect this error.

Print outputs (reformatted for easier reading):

Reading all initial values for scalarSWE from netCDF
GRU | HRU | scalarSWE 
--- | --- | -----------------------
1   | 1   | 4.2714704475282491E-003   
1   | 2   | 3.5504796877794051E-002  
1   | 3   | 0.14091558550377747        
1   | 4   | 2.1170480576837023        
2   | 5   | 0.0000000000000000        
2   | 6   | 2.6283452412565642E-006  
2   | 7   | 0.26871154953569948       
3   | 8   | 0.56689948152237135        
3   | 9   | 1.2983077710459947        
3   | 10  | 8.7946744910014552     

Reading initial values per HRU
iGRU | iHRU | iHRU_global | startGRU | iHRU_local | ixFile | scalarSWE
-----| ---- | ----------- | -------- | ---------- | ------ | -----------------------
1    | 1    | 5           | 2        | 1          | 2      | 3.5504796877794051E-002  
1    | 2    | 6           | 2        | 2          | 3      | 0.14091558550377747
1    | 3    | 7           | 2        | 3          | 4      | 2.1170480576837023
2    | 1    | 8           | 2        | 4          | 5      | 0.0000000000000000
2    | 2    | 9           | 2        | 5          | 6      | 2.6283452412565642E-006
2    | 3    | 10          | 2        | 6          | 7      | 0.26871154953569948

[..]

FATAL ERROR: summa_readRestart/read_icond/data set to the fill value (name='mLayerTemp')
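The misplaced values in the -g 2 2 table are consistent with a file index of startGRU + iHRU_local - 1, i.e. an offset counted in GRUs being applied to HRU positions. That formula is inferred from the printed columns, not taken from the pre-fix source, and buggy_ix_file is a hypothetical name. This sketch reproduces the shifted rows:

```python
# Initial scalarSWE per HRU as listed from the restart file (global HRUs 1..10).
SWE_FILE = [
    4.2714704475282491e-03, 3.5504796877794051e-02, 0.14091558550377747,
    2.1170480576837023, 0.0, 2.6283452412565642e-06,
    0.26871154953569948, 0.56689948152237135, 1.2983077710459947,
    8.7946744910014552,
]


def buggy_ix_file(start_gru, i_hru_local):
    """Hypothetical pre-fix file index, inferred from the printed table:
    a GRU offset is added to the local HRU counter."""
    return start_gru + i_hru_local - 1


# -g 2 2: local HRUs 1..6 are global HRUs 5..10, but are read from 2..7.
start_gru = 2
for i_hru_local, i_hru_global in enumerate(range(5, 11), start=1):
    ix = buggy_ix_file(start_gru, i_hru_local)
    print(f"iHRU_global={i_hru_global} ixFile={ix} scalarSWE={SWE_FILE[ix - 1]!r}")
```

When every GRU holds exactly one HRU, the GRU offset and the HRU offset are the same number, which may be why the problem only shows up with multiple HRUs per GRU.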

Parallel run after the changes
• Experiment: skip GRU 1, run only GRUs 2 and 3
• Command: summa.exe -g 2 2 -m filemanager.txt
• Outcome: values are read correctly from file and stored correctly in the data structure. ixFile matches iHRU_global (as expected, because iHRU_global is now used for indexing) and HRU.

Print outputs (reformatted for easier reading):

Reading all initial values for scalarSWE from netCDF
GRU | HRU | scalarSWE 
--- | --- | -----------------------
1   | 1   | 4.2714704475282491E-003   
1   | 2   | 3.5504796877794051E-002  
1   | 3   | 0.14091558550377747        
1   | 4   | 2.1170480576837023        
2   | 5   | 0.0000000000000000        
2   | 6   | 2.6283452412565642E-006  
2   | 7   | 0.26871154953569948       
3   | 8   | 0.56689948152237135        
3   | 9   | 1.2983077710459947        
3   | 10  | 8.7946744910014552     

Reading initial values per HRU
iGRU | iHRU | iHRU_global | startGRU | iHRU_local | ixFile | scalarSWE
-----| ---- | ----------- | -------- | ---------- | ------ | -----------------------
1    | 1    | 5           | 2        | 1          | 5      | 0.0000000000000000
1    | 2    | 6           | 2        | 2          | 6      | 2.6283452412565642E-006
1    | 3    | 7           | 2        | 3          | 7      | 0.26871154953569948
2    | 1    | 8           | 2        | 4          | 8      | 0.56689948152237135
2    | 2    | 9           | 2        | 5          | 9      | 1.2983077710459947
2    | 3    | 10          | 2        | 6          | 10     | 8.7946744910014552
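With the fix, the file index is the global HRU index. A minimal sketch, assuming the same contiguous, GRU-ordered layout; fixed_ix_file is a hypothetical helper that derives the global index from per-GRU HRU counts, whereas the patched code simply uses iHRU_global:

```python
# HRUs per GRU in the example domain (assumed contiguous, in GRU order).
HRUS_PER_GRU = [4, 3, 3]

# Initial scalarSWE per HRU as listed from the restart file (global HRUs 1..10).
SWE_FILE = [
    4.2714704475282491e-03, 3.5504796877794051e-02, 0.14091558550377747,
    2.1170480576837023, 0.0, 2.6283452412565642e-06,
    0.26871154953569948, 0.56689948152237135, 1.2983077710459947,
    8.7946744910014552,
]


def fixed_ix_file(hrus_per_gru, start_gru, i_gru, i_hru):
    """File index = global HRU index: the HRUs in all earlier GRUs of the full
    domain plus this HRU's 1-based position within its own GRU."""
    gru_global = start_gru + i_gru - 1
    return sum(hrus_per_gru[: gru_global - 1]) + i_hru


start_gru, n_gru = 2, 2   # equivalent of -g 2 2
for i_gru in range(1, n_gru + 1):
    gru_global = start_gru + i_gru - 1
    for i_hru in range(1, HRUS_PER_GRU[gru_global - 1] + 1):
        ix = fixed_ix_file(HRUS_PER_GRU, start_gru, i_gru, i_hru)
        print(f"iGRU={i_gru} iHRU={i_hru} ixFile={ix} scalarSWE={SWE_FILE[ix - 1]!r}")
```

Because the index no longer depends on where the run starts, the same HRU is read from the same file position in full-domain and subset runs.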

Additional tests

To confirm that the fix works, I:

  1. Ran the whole domain from start to finish, using a 2-month simulation while generating a restart file at the start of the second month (this is the same restart file as used in the tests above);
  2. Ran the second month only, starting from the restart file from (1), in a variety of configurations:
    • Full domain run (summa.exe -m filemanager.txt)
    • Single-GRU parallelization runs (summa.exe -g 1 1 -m filemanager.txt, .. -g 2 1 .., etc.)
    • Double-GRU parallelization runs (summa.exe -g 1 2 -m filemanager.txt, .. -g 2 2 .., etc.)
    • Multi-GRU parallelization runs (summa.exe -g 1 3 -m filemanager.txt, .. -g 2 3 .., etc.)
  3. Compared all results from (2) with the baseline run from (1).

All runs from (2) are identical to the baseline run.
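One way to script the comparison in (3), sketched under assumptions: -g takes startGRU and countGRU (as the -g 2 2 example, which skips GRU 1 and runs GRUs 2 and 3, suggests), HRUs are stored contiguously in GRU order, and each run's per-HRU values are already loaded into Python lists. The helper names are hypothetical, and the initial scalarSWE values serve only as stand-in data; a real check would read the SUMMA netCDF outputs.

```python
# HRUs per GRU in the example domain (assumed contiguous, in GRU order).
HRUS_PER_GRU = [4, 3, 3]

# Stand-in baseline: the initial scalarSWE values listed above, by global HRU.
BASELINE = [
    4.2714704475282491e-03, 3.5504796877794051e-02, 0.14091558550377747,
    2.1170480576837023, 0.0, 2.6283452412565642e-06,
    0.26871154953569948, 0.56689948152237135, 1.2983077710459947,
    8.7946744910014552,
]


def hru_range(hrus_per_gru, start_gru, count_gru):
    """Global 1-based HRU indices covered by `-g start_gru count_gru`."""
    first = sum(hrus_per_gru[: start_gru - 1]) + 1
    n_hru = sum(hrus_per_gru[start_gru - 1 : start_gru - 1 + count_gru])
    return list(range(first, first + n_hru))


def matches_baseline(baseline, hrus_per_gru, start_gru, count_gru, run_values):
    """True if a subset run's per-HRU values (in local HRU order) equal the
    baseline values of the HRUs that run covers, bit for bit."""
    covered = hru_range(hrus_per_gru, start_gru, count_gru)
    expected = [baseline[i - 1] for i in covered]
    return run_values == expected   # exact equality, no tolerance


# A correctly indexed -g 2 2 run reproduces global HRUs 5..10 ...
print(matches_baseline(BASELINE, HRUS_PER_GRU, 2, 2, BASELINE[4:10]))   # True
# ... while values shifted as in the pre-fix table do not.
print(matches_baseline(BASELINE, HRUS_PER_GRU, 2, 2, BASELINE[1:7]))    # False
```

Exact equality rather than a tolerance fits the claim that the runs are identical to the baseline; comparing each run only against the HRUs it covers also works for overlapping -g windows such as -g 1 2 and -g 2 2.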

wknoben commented 2 years ago

That refers to this (closed) PR, although perhaps that description could use some work: https://github.com/CH-Earth/summa/pull/502

andywood commented 2 years ago

Looks good to merge.