CABLE-LSM / CABLE-Trac-archive

Archive CABLE Trac contents as issues

Restructuring CABLE #250

Closed penguian closed 2 years ago

penguian commented 4 years ago

keyword_keepgit owner:jxs599@nci.org.au resolution_fixed type_code improvement | by srb001@csiro.au


Replace #226 and #224.

It was found that MPI GSWP2 failed using this restructured CABLE. Running it as a serial job shows a failure (perhaps in addition to the MPI failure) in the main loop of the canopy dry-leaf calculation. Speculation is that the code is now entering a loop that it previously did not, perhaps because the zenith angle triggers the daylight condition, and that something in that loop leads to non-convergence of tlfx().

Unpicking this is impossible due to the code structure. The most efficient way to debug seems to be to branch off the trunk and begin restructuring again, incrementally.


Issue migrated from trac:250 at 2023-11-27 11:32:27 +1100

penguian commented 3 years ago

@jxs599@nci.org.au uploaded file AlbedoDay182Global.gif (16.1 KiB)

Albedo field from GSWP2 run showing relative difference (%) due to zenith mask

penguian commented 3 years ago

@jxs599@nci.org.au uploaded file SW182Global.gif (58.6 KiB)

Downwards SW @ Day 182

penguian commented 3 years ago

@jxs599@nci.org.au uploaded file SoilTemp182Global.gif (25.9 KiB)

Soil Temperature - surface layer - relative difference assuming SW mask (old) vs Zenith mask (new)

penguian commented 3 years ago

@jxs599@nci.org.au uploaded file SoilMoist182Global.gif (25.6 KiB)

Soil Moisture (surface layer) @ day=182

penguian commented 3 years ago

@jxs599@nci.org.au commented


It has been decided that adjustment to a mask defined by the zenith angle is at this stage unnecessary and so will NOT be employed.

It was found that implementing this mask condition, and ONLY this mask condition, led to significant differences that could not be ignored. These occurred in only a few places and are due to the huge increase in heterogeneity introduced by the ~15K global cells in the GSWP application, NOT to the MPI code itself. However, further investigation is hampered by a lack of available data. Firstly, we only have daily means, so it is impossible to confirm that (as expected) the differences occur only at dawn and dusk. Secondly, we are running on a minimum of two nodes. It doesn't appear possible to run MPI on a single node. Analysing data from multiple nodes is troublesome and, whilst this has been done previously in UM-CABLE, it represents a substantial input of resources. Analysing "all" data from one node simplifies this, but is troublesome because of the amount of output per file and because the cell(s) where these differences occur are not the same every day. This also represents a substantial input of resources.

A comprehensive explanation of this anomaly is likely only going to highlight what we already know to be the cause, i.e. changing the mask. It is possible that an inconsistency in the use of the mask can be identified; however, I suspect it is more likely that we will have to impose arbitrary limits on some quantities. It is worth noting that this is predicated on the assumption that the existing output data is the more correct version.

Following substantial discussion it has been decided that this mask change is not necessary for a CABLE-2 to CABLE-3 transition. In fact it is not necessary in a JAC standalone application either, as forced downwards SW is available at t=1. It is only necessary for calling the radiation code on the first timestep of a UM run and/or potentially a restart. However, we may be able to circumvent this last problem by adding SW downwards to the UM dump.
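The limitation of daily means noted above can be illustrated with a minimal sketch. It is written in Python purely for illustration (CABLE itself is Fortran), the numbers are made up, and the half-hourly timestep is an assumption rather than something stated in this ticket. It shows two quite different sub-daily difference patterns that produce the same daily mean, which is why daily output alone cannot confirm that differences are confined to dawn and dusk.

```python
# Assumed half-hourly timestep; the ticket does not state the actual value.
STEPS_PER_DAY = 48


def daily_mean(series):
    """Arithmetic mean over one day of sub-daily values."""
    return sum(series) / len(series)


# Reference run: a constant, made-up value at every step.
old = [10.0] * STEPS_PER_DAY

# Case A: a large difference at two steps only (standing in for dawn and dusk).
case_a = list(old)
case_a[12] += 24.0
case_a[36] += 24.0

# Case B: a small difference spread over every step of the day.
case_b = [v + 1.0 for v in old]

# Both cases shift the daily mean by the same amount, so the daily mean
# cannot tell them apart.
print(daily_mean(old), daily_mean(case_a), daily_mean(case_b))
```

Distinguishing the two cases would need output at the model timestep for the affected cells, which is the resource cost described above.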

penguian commented 3 years ago

@jxs599@nci.org.au commented


Merging up to 7756, which is as far as we got without MPI and was validated at Tumba, Ampero, Hyytiala, Loobos, etc.

ci (check in) all merges in offline/ and coupled/. Outstanding here are the mods in science/; however, we rebase here (fdac294c369af64b682523770ec8c7f629402981) so that we can get ZERO diffs once these remaining mods have been merged. There is a very slight difference here, ~O(1e-5), due to dropping an "r_2" declaration in *_albedo.

Here we had to backtrack slightly. In the full merge there was a noted diff in soil temp/moisture over Greenland. Given that it seems to be ice points, it should be easy enough to figure out. However, after rolling back versions, the build seems NOT to be working properly and picking up changed files (either ONLY those or at all). So we have to roll back, rm -fr .tmp and build.

So, rolling back to the working 7783 and implementing incremental changes, we find that the mods in albedo/ are fine. There appear to be no LAKE tiles anyway, and so the only difference, replacing air temp with surface/ground temp in the calculation of soil albedo at lake tiles, has no effect.

mods in roughness/ however lead to very slight differences.

These are NOT real differences and merely reflect the move from WHERE constructs to DO/IF constructions. Likely the compiler chooses different expansions etc. in this case, which leads to the tiny, cumulative differences seen in soil temp/moisture.
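The kind of effect described above, where the same arithmetic evaluated in a different order differs in the last bits, can be illustrated with a small sketch. It is written in Python rather than CABLE's Fortran, the values are arbitrary, and it does not reproduce what any particular compiler does with WHERE or DO/IF constructs. It only shows that regrouping floating-point operations is enough to change a result slightly.

```python
# Floating-point addition is not associative, so evaluating the same terms
# in a different grouping can change the result in the last bits.
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c   # one possible evaluation order
regrouped = a + (b + c)       # the same terms, grouped differently

difference = abs(left_to_right - regrouped)

print(left_to_right == regrouped)  # the two results are not bit-identical
print(difference)                  # tiny, but not zero
```

Differences of this size are far below any physical significance on a single step, but in a time-stepping model they can accumulate into the small diffs reported here.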

ReBase at f373f6ef8a58f4e9034c96df035831a74212ce87.

penguian commented 3 years ago

@jxs599@nci.org.au commented


Whilst I think of it: one major difference that was NOT identified in the "PLUMBER" type testing, but has been in the GSWP testing, is that in the surface/snow albedo routine we WERE using the hardwired soil type 9 as a trigger. In the original CABLE-3 version we replaced this with PFT (surface type=17), because this was already made available to us in the JAC version (we knew we would need the veg% parameters for the rhoch calculation and to determine the effective LAI considering potential snow). It turns out that there are some cells where the ICE soil type is not always consistent with PFT ice, and vice versa. This is corrected as part of 7891, in which the soil type is used as the trigger, as it was before, though not necessarily because this is more correct.

This aside, the major source of difference from the trunk version (again not encountered with the serial testing) is due to limiting of dq and dq_unsat in cable_canopy.

The finalised version of CABLE-3.0 is at 9625d8df10db479acd0e94fbceebf630835bf2a0. We run a GSWP experiment with this version, details of which we store locally on Gadi as 7891_svn (RecordedForMyBenefit). We compare this to the same experiment run using the trunk@7530 (CABLE-2.0). We then roll back the dq limiting, the single biggest change, and run a GSWP experiment with this version, details of which we store locally on Gadi as 7891_dq (RFMB). We commit this dq-less version to 7892 for posterity and linking but revert this change in 7893.

We compare

penguian commented 3 years ago

@jxs599@nci.org.au commented


The finalised version of CABLE-3.0 that checks out in GSWP2(MPI) experiments is @ 9625d8df10db479acd0e94fbceebf630835bf2a0. The 3 changes are:

  1. Change compared to trunk@7530 due to: "dq limiting issue"
  2. Change compared to trunk@7530 due to: "using order of magnitude higher limit in determining vegetated cell for calc of surface reflectance"
  3. Change compared to trunk@7530 due to: "order of operations in computing effective LAI"

All 3 of these modifications are included in 9625d8df10db479acd0e94fbceebf630835bf2a0.

All 3 of these modifications are rolled back in bff597a0bac5683b7237edc58ca0971886013d32. The impact of the combined trivial changes is presented here.

As can be seen these changes are of little consequence. We examine in more detail the 3 more substantial changes presented above. In general these daily means are shown for day 182 in the year long run.

1. dq limiting to positive is a legitimate bug fix which is indeed part of the CMIP6 version of CABLE.

ONLY this change is recorded in 2c5de9d115ff3949f35a20e095ae6b1ad93b4625. The impact of ONLY this change is presented here.

Strikingly, the albedo is significantly affected by this change. We also present the results for day one. The impact on albedo may be slightly less but is still quite large. Recall that this is the daily mean. How "dq" feeds back into the albedo at dt > 1 has not been investigated. The difference is due solely to this code and is acceptable even if not fully understood; the underlying change, restricting dq to positive values, is more correct and will be adopted.

The percentage change in the fluxes is HUGE! As above, this is neither fully investigated nor understood. However, a rudimentary analysis focusing on the pixel that seems to have the maximum percentage change shows that the original flux at that point was very close to zero. This is a problem encountered previously with the screen temp in Western China. In this instance, though, it is more difficult to compensate for without pulling apart the netcdf output, as fluxes may indeed be +ve or -ve, and/or close to zero.
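Why a near-zero reference flux inflates the percentage change can be seen in a short sketch. This is illustrative Python with made-up numbers, not the analysis actually applied to the GSWP output; the function names and the `floor` cut-off are hypothetical.

```python
def percent_change(new, old):
    """Relative difference of `new` against `old`, in percent."""
    return 100.0 * (new - old) / old


def percent_change_guarded(new, old, floor):
    """As percent_change, but returns None where |old| is below `floor`.

    `floor` is a hypothetical cut-off below which a relative difference is
    treated as not meaningful. Using abs() keeps it valid for fluxes of
    either sign.
    """
    if abs(old) < floor:
        return None
    return 100.0 * (new - old) / old


# The same absolute change (about 0.5) applied to two reference values.
large_ref = percent_change(100.5, 100.0)   # modest relative change
tiny_ref = percent_change(0.501, 0.001)    # enormous relative change

print(large_ref, tiny_ref)
print(percent_change_guarded(0.501, 0.001, floor=0.01))  # masked out
```

Masking on the magnitude of the reference value is only one possible way to compensate, and choosing the cut-off is itself arbitrary; an absolute difference plotted alongside the relative one avoids that choice.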

2. CABLE includes different treatment(s) of sunlit, shaded and/or vegetated regions. To determine whether a cell is vegetated or not, CABLE compares the effective LAI in the cell with a threshold LAI, above which a cell is considered vegetated. The difference seen here is due to enforcing this same threshold in the one instance, calculating the surface reflectance to diffuse radiation, where an arbitrary threshold an order of magnitude lower was being used, for which there is no explanation other than oversight.

This change is recorded in ddfea8e0e6505cf6f2b0043e4d922d4b7795bee2, on top of that noted in 2c5de9d115ff3949f35a20e095ae6b1ad93b4625, that is assuming that we include the dq limiting change. The impact of this change on top of that in 2c5de9d115ff3949f35a20e095ae6b1ad93b4625 is presented here.

To differentiate between sunlit and shaded regions, CABLE sums the downward shortwave components and tests whether the sum is above a certain threshold. For example:

sunlit_mask = .false. ! default initialization
! assuming met%fsd is dimensioned (mp, nbands): sum over the band dimension
where ( SUM(met%fsd, 2) > rad_thresh ) sunlit_mask = .true.

Then throughout the code we use

where(sunlit_mask)
  ! execute process relevant to sunlit region
end where

Similarly, to differentiate vegetated regions CABLE tests whether the effective LAI is above a certain threshold. For example:

veg_mask = .false. ! default initialization
where ( canopy%vlaiw > LAI_thresh ) veg_mask = .true.

Then throughout the code we use

where(veg_mask)
  ! execute process relevant to vegetated region
end where

Alas, whilst this may be in principle how it ought to work, in reality the trunk@7530 computes these masks (and the combined sunlit_veg_mask) on several occasions and not consistently. CABLE-3.0 includes a mechanism similar to that described above where the masks are evaluated once at the start, and then passed around.

The difference encountered here arises because the threshold used to decide whether a cell is vegetated, for the calculation of surface reflectance, was an order of magnitude lower.

This is inconsistent with the mask used in calculations up to this point, and there seems to be no explanation for it other than oversight.
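The effect of the inconsistent threshold can be sketched as follows. This is illustrative Python, not CABLE code: the threshold values, the `LOCAL_THRESH` name and the cell values are all hypothetical, chosen only so that one threshold is an order of magnitude below the other. It shows that cells whose effective LAI falls between the two thresholds are classified differently by the two masks, which is where the differences come from.

```python
# Hypothetical values for illustration only; not CABLE's actual constants.
LAI_THRESH = 0.01                  # threshold for the shared vegetation mask
LOCAL_THRESH = LAI_THRESH / 10.0   # lower threshold used in one routine only

# Made-up effective LAI values for five cells.
effective_lai = [0.0, 0.0005, 0.005, 0.02, 1.5]


def make_mask(values, threshold):
    """Return True for each cell whose value exceeds the threshold."""
    return [v > threshold for v in values]


# Mask evaluated once at the start and passed around.
veg_mask = make_mask(effective_lai, LAI_THRESH)

# Mask recomputed locally with the lower threshold.
local_mask = make_mask(effective_lai, LOCAL_THRESH)

# Cells treated as vegetated by one mask but not by the other.
inconsistent_cells = [
    i for i, (a, b) in enumerate(zip(veg_mask, local_mask)) if a != b
]
print(inconsistent_cells)
```

Evaluating the mask once and reusing it everywhere removes this class of inconsistency by construction, since no routine can apply its own threshold.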

3. Change due to: "order of operations in computing effective LAI"

This change is recorded in 0c1fbe94638b619499f99ce24f15e9530f5d6514, on top of that noted in 2c5de9d115ff3949f35a20e095ae6b1ad93b4625, that is assuming that we include the dq limiting change AND using the corrected LAI threshold. The impact of this change on top of that in ddfea8e0e6505cf6f2b0043e4d922d4b7795bee2 is presented here.

These three changes can be viewed in this svn diff between revisions where they are all reinstated.

penguian commented 3 years ago

@jxs599@nci.org.au commented


Abandoning modification 1. dq limiting

Reverting back to bff597a0bac5683b7237edc58ca0971886013d32 @0d3e5845c16fc511342b33f70dd37bc264e1df99. The only TWO relevant changes are 2. and 3., described in the previous comment:24.

A comparison of 0d3e5845c16fc511342b33f70dd37bc264e1df99 with the trunk@7530 can be found here

2. Change due to: "LAI threshold" used as condition to trigger calculation of surface reflectance to diffuse SW

A comparison of a9351cfd16b4da0a7da4c6d25ba822600a491db2 with 0d3e5845c16fc511342b33f70dd37bc264e1df99 can be found here

3. Change due to: "order of operations in computing effective LAI"

A comparison of f4f3d563ba8c4a818fd1059ba2e3233246ac11c8 with a9351cfd16b4da0a7da4c6d25ba822600a491db2 can be found here

Finally, a comparison of f4f3d563ba8c4a818fd1059ba2e3233246ac11c8 with the trunk@7530, effectively comparing CABLE-3.0 to CABLE-2.0, can be found here

penguian commented 2 years ago

@jxs599@nci.org.au changed status from new to closed

penguian commented 2 years ago

@jxs599@nci.org.au set resolution to fixed

penguian commented 2 years ago

@jxs599@nci.org.au changed milestone from 3. Implementation to 1. Closed

penguian commented 2 years ago

@jxs599@nci.org.au commented


This is mostly done via transition to CABLE3

penguian commented 1 year ago

@ccc561@nci.org.au set keywords to keepgit