CABLE-LSM / CABLE-Trac-archive

Archive CABLE Trac contents as issues

Restructuring CABLE #250

Closed penguian closed 2 years ago

penguian commented 4 years ago

keyword_keepgit owner:jxs599@nci.org.au resolution_fixed type_code improvement | by srb001@csiro.au


Replace #226 and #224.

It was found that MPI GSWP2 failed using this restructured CABLE. Running it as a serial job shows the failure (perhaps in addition to an MPI failure) in the canopy dry-leaf main loop. Speculation is that the code is entering a loop that it previously did not, perhaps due to the zenith angle triggering the daylight condition, and that something in there leads to non-convergence of tlfx().

Unpicking this is impossible due to the code structure. The most efficient mechanism to debug seems to be to branch off the trunk and begin restructuring again, incrementally.


Issue migrated from trac:250 at 2023-11-27 11:32:27 +1100

penguian commented 3 years ago

@jxs599@nci.org.au commented


It was found that the crash in GSWP2 was entirely due to a compiler setting. Furthermore, it was found that this "BUG" has been part of the trunk for at least 6 months.

Ticket fixing this bug!

penguian commented 3 years ago

@jxs599@nci.org.au commented


250 - comment 2

Initial use of the JAC-ed version shows differences in fluxes. So, in the interest of pushing this restructuring, we are pressing forward with the trivial restructuring of directories and dividing of files. This code can be found here:

Plots across flux sites in this Modelling (CABLE) evapotranspiration paper

penguian commented 3 years ago

@jxs599@nci.org.au changed _comment0 which not transferred by tractive

penguian commented 3 years ago

@jxs599@nci.org.au commented


Following this trivial restructuring we can begin an incremental merging of the rewritten code here

penguian commented 3 years ago

@jxs599@nci.org.au commented


Differences in fluxes noted in previous comments were, it seems, due to use of older PFT parameters. When updated to the newer parameters, the remaining differences were more in line with what would be expected from changing the "sunlit" condition to trigger off zenith angle rather than SW.

The code in which the NCI/MOSRS versions are identical, besides the necessary mapping of the code, is at:

https://trac.nci.org.au/trac/cable/browser/branches/Share/test_jxs599

We will now work out the best way to push this onto the trunk. Perhaps incrementally, perhaps all in one go.

penguian commented 3 years ago

@jxs599@nci.org.au commented


See also tickets -

#252    longitude/latitude

#260    Makefiles

#254    Track variables fudged in JAC version that need to be revisited

#255    LAI and canopy height used in JAC

#263    Computing the effective LAI as seen by SW radiation (cbl_lai_eff.F90)

#256    Prognostic bank | JAC readiness | new | minor | 6. Report | JAC readiness

#261    cable_types_mod.F90 in CABLE-3.0 | model improvement | new | minor | 6. Report | model

#262    cable_runtime_opts_mod.F90 - cable.nml | model improvement | new | minor | 6. Report | model

#257    init_respiration | JAC readiness | new | on the list | 6. Report | JAC readiness

#258    use consistent tolerance for radiation threshold | model improvement | new | on the list | 6. Report | model

#259    Utilize masks defined at initialisation for vegetated, sunlit, and both cells. | model improvement | new | on the list

penguian commented 3 years ago

@jxs599@nci.org.au changed _comment0 which not transferred by tractive

penguian commented 3 years ago

@jxs599@nci.org.au _uploaded file README_F90_files_diff_tmp_test_jxs599_VS_trunk (13.7 KiB)_

F90 file differences between restructured CABLE and trunk@7530 (HEAD@September2020)

penguian commented 3 years ago

@jxs599@nci.org.au commented


More comprehensively document differences from the trunk.

The trunk is currently at 516aafe393e159a1fe2d36044f8f3d687d40687a:

Our branch that restructures CABLE incrementally, highlighting differences is here:

This is the first part of the log, up to c28bdefd5a7b2c8193074531c47c8b3efc6b3c49

Note that at every revision (up to c28bdefd5a7b2c8193074531c47c8b3efc6b3c49), the development branch reproduces the trunk exactly at a range of sites.

Plots are arranged here per revision.

Shown here is that from 71ba47d86338026333b9b77263eda9095c125864 to ed80a03932fed88aceaa86f164b6976f92d2cec4 the ONLY change is as indicated by the revision log, yet the results differ.

Analytically, this difference should not occur.

We define: FracOfCanopyAboveSnow = HeightAboveSnow/ MAX( 0.01, Hgt_PFT)

We run two versions of the model with this statement:

reducedLAIdue2snow = LAI_PFT * HeightAboveSnow/ MAX( 0.01, Hgt_PFT)

and also substitute as so:

reducedLAIdue2snow = LAI_PFT * FracOfCanopyAboveSnow

More simply this is like:

z = c / d

a = b * c / d

e = b * z

BUT, running with both versions suggests:

a is NOT equal to e !!

Hence at this point the NEW, improved and indeed necessary (for the JAC albedo pathway) calculation of effective LAI via a distinct subroutine will be adopted, and the "base" for comparison will be adjusted.
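One plausible reading of this result is floating-point rounding: `b * c / d` is evaluated left to right as `(b * c) / d`, whereas the substituted form computes `b * (c / d)`, and the two can round differently. The following is a minimal Python sketch in double precision, not CABLE code; the function names and the input values are chosen purely for illustration.

```python
def lai_direct(lai, height_above_snow, hgt):
    """a = b * c / d, evaluated left to right as (b * c) / d."""
    return lai * height_above_snow / max(0.01, hgt)


def lai_via_fraction(lai, height_above_snow, hgt):
    """z = c / d first, then e = b * z."""
    frac_of_canopy_above_snow = height_above_snow / max(0.01, hgt)
    return lai * frac_of_canopy_above_snow


def count_mismatches(n_max=200):
    """Count integer-valued (b, d) pairs, with c = 1, where the two forms differ."""
    mismatches = 0
    for b in range(1, n_max + 1):
        for d in range(1, n_max + 1):
            direct = lai_direct(float(b), 1.0, float(d))
            via_fraction = lai_via_fraction(float(b), 1.0, float(d))
            if direct != via_fraction:
                mismatches += 1
    return mismatches


# Illustrative inputs only (not CABLE values): b = 49, c = 1, d = 49.
a = lai_direct(49.0, 1.0, 49.0)
e = lai_via_fraction(49.0, 1.0, 49.0)
print("a == e:", a == e, " |a - e| =", abs(a - e))
print("mismatching pairs:", count_mismatches())
```

The individual differences are at the level of the last bit, but in a model that feeds such values through thresholds and iterative solvers they need not stay that small.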

penguian commented 3 years ago

@jxs599@nci.org.au changed _comment0 which not transferred by tractive

penguian commented 3 years ago

@jxs599@nci.org.au changed _comment1 which not transferred by tractive

penguian commented 3 years ago

@jxs599@nci.org.au changed _comment2 which not transferred by tractive

penguian commented 3 years ago

@jxs599@nci.org.au changed _comment3 which not transferred by tractive

penguian commented 3 years ago

@jxs599@nci.org.au changed _comment4 which not transferred by tractive

penguian commented 3 years ago

@jxs599@nci.org.au commented


The trunk is currently at 516aafe393e159a1fe2d36044f8f3d687d40687a: Our branch that restructures CABLE incrementally, highlighting differences is here:

Revisions up to ed80a03932fed88aceaa86f164b6976f92d2cec4, comparing the trunk@516aafe393e159a1fe2d36044f8f3d687d40687a with the development branch, are discussed in the previous comment 6.

Incremental steps in bringing the trunk into line with CABLE-3.0 show that revisions between ed80a03932fed88aceaa86f164b6976f92d2cec4 and c8843ca2e25c8ce925da9cc254a175d1249d3992 reproduce the revised base (see comment 6).

The trunk version of cable_canopy contains a limiting condition for dq and dq_unsat, which was merged into the trunk when consolidating with the ACCESS-CM2 version of CABLE. The reason why it is NOT in our development branch is unknown. It is of course possible that it was simply deleted inadvertently, or that there was some overlooked complication in merging the same code from both sources. Nevertheless, these limiting conditions alter the outcome.

We plot versions against each other, where “trunk” represents the base and “new” represents where this condition has been applied.

The ONLY change is as indicated in green; the rest are whitespace changes etc.

At this point the "base" for comparison will be adjusted to be consistent with 33b54ddae446a5faa180ad645a13d9877f73d67b.

penguian commented 3 years ago

@jxs599@nci.org.au changed _comment0 which not transferred by tractive

penguian commented 3 years ago

@jxs599@nci.org.au commented


The trunk is currently at 516aafe393e159a1fe2d36044f8f3d687d40687a:

Our branch that restructures CABLE incrementally, highlighting differences is here:

Incremental steps in bringing the trunk into line with CABLE-3.0 show that revisions between 33b54ddae446a5faa180ad645a13d9877f73d67b and 6d357a3a5a391c159de43de01e0f4e39d5182428 reproduce the revised base (see comment 7).

Until 6d357a3a5a391c159de43de01e0f4e39d5182428, cbl_radiation has been calculating the sunlit_veg_mask internally, a situation which has arisen because the mask was being recalculated several times RATHER than computed once, consistently, and then distributed. One change that we know is coming is due to CABLE-3.0 adopting a new definition of “sunlit”, which can be determined by the zenith angle of the sun. We can show the impact of this change by moving to a single centralised calculation of the mask. This has been done for all other instances; however, the one in cbl_radiation remained. Unfortunately it was using the radiation threshold (rad_thresh) as the condition which had to be met, whereas the single centralised calculation of the mask uses the coszen threshold (coszen_tols). This leads to a difference occurring at this juncture of moving to use the correct mask (ca8b17d0d377105ca9031240ac34c19fdd2628f1).

We plot versions against each other, where “trunk” represents the base and “new” represents where this condition has been applied.

The ONLY change is as indicated in green; the rest are whitespace changes etc.

At this point the "base" for comparison will be adjusted to be ca8b17d0d377105ca9031240ac34c19fdd2628f1.
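The effect of switching the trigger can be sketched as follows. This is an illustrative Python sketch, not CABLE code: the function names, the threshold values and the sample cells are assumptions. Only the two conditions themselves (downward SW above `rad_thresh` versus cosine of zenith angle above `coszen_tols`) come from the discussion above.

```python
RAD_THRESH = 0.001     # assumed value for illustration (W m-2)
COSZEN_TOLS = 1.0e-4   # assumed value for illustration


def sunlit_mask_sw(sw_down, rad_thresh=RAD_THRESH):
    """Old-style mask: a cell is sunlit when downward SW exceeds rad_thresh."""
    return [sw > rad_thresh for sw in sw_down]


def sunlit_mask_coszen(coszen, coszen_tols=COSZEN_TOLS):
    """Centralised mask: a cell is sunlit when cos(zenith) exceeds coszen_tols."""
    return [cz > coszen_tols for cz in coszen]


# Four hypothetical cells:
#   0: night; 1: sun just above the horizon but forcing reports zero SW;
#   2: sun marginally below the tolerance but forcing reports some SW; 3: midday.
coszen = [-0.2, 0.01, 0.00005, 0.8]
sw_down = [0.0, 0.0, 0.5, 600.0]

mask_sw = sunlit_mask_sw(sw_down)
mask_cz = sunlit_mask_coszen(coszen)

# Cells whose "sunlit" status, and hence the code path executed, changes.
differing_cells = [i for i, (old, new) in enumerate(zip(mask_sw, mask_cz)) if old != new]
print(mask_sw, mask_cz, differing_cells)
```

Only the marginal cells change status, which is consistent with differences appearing at dawn, dusk and high latitudes rather than everywhere.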

penguian commented 3 years ago

@jxs599@nci.org.au changed _comment0 which not transferred by tractive

penguian commented 3 years ago

@jxs599@nci.org.au commented


Similar to the previous comment, we move init_radiation to use pre-computed masks. Using sunlit_veg_mask imposes a different tolerance on the coszen condition, which doesn't lead to any change at the modest latitudes of Amplero and Tumba but IT DOES at the high latitude of Hyytiala, Finland. We did use the pre-computed mask and so:

At this point the "base" for comparison will be adjusted to be ffd10d22c16870c35c649b52b3f82b89313848ae.

However, at the end we reverted back to using the smaller tolerance of 1.e-6.

penguian commented 3 years ago

@jxs599@nci.org.au changed _comment0 which not transferred by tractive

penguian commented 3 years ago

@jxs599@nci.org.au changed _comment1 which not transferred by tractive

penguian commented 3 years ago

@jxs599@nci.org.au commented


One final (non-)change was noted prior to fully syncing ^/test_jxs599 into this branch@7623

However, at the end we reverted back to using the smaller tolerance of 1.e-6 to determine the value of the beam extinction coefficient when daylight @ e654d9f0b4fa708195bea0a38baa1baadb67bc0c.

penguian commented 3 years ago

@jxs599@nci.org.au changed _comment0 which not transferred by tractive

penguian commented 3 years ago

@jxs599@nci.org.au commented


Prior to updating to JULES5.8 we compare HAC@Loobos to CABLE3@Loobos. It may indeed require fudging:

e.g.

  1. We have forced the LAI and HGT via JULES namelists to be equivalent to those used in CABLE.

  2. We have implemented CABLE's calc of zenith angle to be used when LSM_ID='cable'

  3. etc

Immediately we note a difference in the fluxes. Returning to a comparison of albedo reveals an old problem: the output albedos in the two models are actually different things. So we output the fields directly from _albedo().

  1. We find that the diffuse component of the extinction coefficient is different.

We have made modifications to use the same mpif90 compiler as is suggested by the fcm_make.log

So back tracking quite a bit we have found that the LAI and canopy height are not consistent across both versions.

I have added LAI to the met forcing file and made it equal to the values in the Loobos pft_params.nml that is used for JULES.

I have added a patch dim to the met forcing file for canopy height and made it equal to the values in cable_pft_params, and then adjusted the Loobos pft_params.nml that is used for JULES. It may be better to USE the JULES version of height, but this is consistent at least.

The albedo, effective surface reflectance [AlbEffSurfRefl], is still slightly different between models at dawn and dusk ??

This is due to snow albedo being different, which in turn is due to soil albedo differing.

JULES gets soil albedo through ancillaries.nml. God knows where CABLE is actually getting it, BUT I can add it to the Loobos1997.nml met file that CABLE is using - and voila - surface albedo matches now.

penguian commented 3 years ago

@jxs599@nci.org.au changed _comment0 which not transferred by tractive

penguian commented 3 years ago

@jxs599@nci.org.au changed _comment1 which not transferred by tractive

penguian commented 3 years ago

@jxs599@nci.org.au commented


comment:11 improved agreement between fluxes remarkably; however, I suspect better agreement can be sought

penguian commented 3 years ago

@jxs599@nci.org.au changed _comment0 which not transferred by tractive

penguian commented 3 years ago

@jxs599@nci.org.au commented


Questions have arisen about the impact of this Restructured CABLE-3.0 on the MPI application. The build process did require some modification AND some improvements to the build process across applications have been made in one of the working copies where each application calls a common Makefile. Other modifications were necessary, resulting in the final version of CABLE-3.0 that supported MPI@GSWP, this was called CABLE-3.0+MPI. However these will be reported on later.

Running MPI at single sites doesn't reveal anything unusual, indicating that these sites don't trigger anything unusual.
Running MPI globally using GSWP2 forcing, and comparing CABLE-3.0+MPI against the trunk, revealed several differences. Looking at the fundamental daily mean screen temperature suggested a relative difference at several outliers of up to 80%. Closer investigation of this worrying peculiarity showed that the site in question was very close to 0 degrees Celsius in the trunk version. Converting the scripts to conduct the analysis in Kelvin removed these outliers.
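The Celsius artefact can be reproduced with simple arithmetic. The sketch below is illustrative only: the two temperatures are made-up values chosen to give a relative difference of about 80% in Celsius, and `relative_difference` is a hypothetical helper, not one of the analysis scripts.

```python
KELVIN_OFFSET = 273.15


def relative_difference(new, base):
    """Relative difference of new with respect to base."""
    return abs(new - base) / abs(base)


# Hypothetical daily mean screen temperatures at one cell (degrees Celsius).
trunk_c = 0.05
new_c = 0.09

rel_celsius = relative_difference(new_c, trunk_c)
rel_kelvin = relative_difference(new_c + KELVIN_OFFSET, trunk_c + KELVIN_OFFSET)

print(f"relative difference in Celsius: {rel_celsius:.1%}")
print(f"relative difference in Kelvin:  {rel_kelvin:.4%}")
```

The absolute difference is the same 0.04 degrees in both cases; only the denominator changes, which is why the outliers disappear once the analysis is done in Kelvin.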

Next it was found that the remaining extreme variations were at several sites. The largest outlier was investigated further. Being in Western China, it was expected that we might find a lake there, where we know the code was changed. However, this was not the case. It was found that the soil type here was isoilm=9, indicating a permanent ice tile. CABLE-3.0 moved from using isoilm=9 as a trigger to using the surface_type, or the PFT type, in several circumstances. At this site and several others (here it was Tundra), moving the trigger meant segments of code were not executed, resulting in differences in output.

Following this change it was still found that there were SOME differences from the trunk in fundamental fields that were too significant to ignore. So in the interest of identifying the source of these differences we are going to repeat what we have already done for a range of single sites using the serial code.
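A minimal sketch of how moving the trigger changes which cells execute the ice-specific code. This is illustrative Python, not CABLE code: the cell records, the surface type labels and the function names are hypothetical; only `isoilm=9` as the permanent ice soil type is taken from the discussion above.

```python
PERMANENT_ICE_SOIL = 9     # isoilm value named in the discussion
ICE_SURFACE_TYPE = "ice"   # hypothetical label for the ice surface/PFT type

# Hypothetical cells; the second mimics the Tundra-on-ice-soil case.
cells = [
    {"name": "ice sheet", "isoilm": 9, "surface_type": "ice"},
    {"name": "tundra on ice soil", "isoilm": 9, "surface_type": "tundra"},
    {"name": "forest", "isoilm": 2, "surface_type": "forest"},
]


def ice_branch_by_soil(cell):
    """Trunk-style trigger: the soil type selects the ice code path."""
    return cell["isoilm"] == PERMANENT_ICE_SOIL


def ice_branch_by_surface(cell):
    """CABLE-3.0-style trigger: the surface (PFT) type selects the ice code path."""
    return cell["surface_type"] == ICE_SURFACE_TYPE


changed = [c["name"] for c in cells if ice_branch_by_soil(c) != ice_branch_by_surface(c)]
print(changed)
```

Single-site tests never exercise such a cell, which fits the observation that the difference only appears in global runs.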

Beginning at a comparison analogous to that in comment:6.

penguian commented 3 years ago

@jxs599@nci.org.au commented


Following comment:11 it has become apparent that this is way too difficult to sort out without our version control tool; therefore this work corresponds to: this code.

penguian commented 3 years ago

@jxs599@nci.org.au commented


This comment will be embellished later. I have discovered that difference in albedo field, whilst irrelevant to the model, is due to initialisation/scope of rhoch variables in the offline model - fix!!

penguian commented 3 years ago

@jxs599@nci.org.au commented


Re-branched, re-again:

CABLE-3.0forMPI_Xmas

merging from incrementally

issue discussed in comment:6 (order of operations)

straddling revisions 9a77aa84d58faeabd5471437421bce292ef72177 and d7ac6c3d1db9cbe94c2324b2f8266a84e19be034. ReBased @ 7826.

penguian commented 3 years ago

@jxs599@nci.org.au changed _comment0 which not transferred by tractive

penguian commented 3 years ago

@jxs599@nci.org.au changed _comment1 which not transferred by tractive

penguian commented 3 years ago

@jxs599@nci.org.au changed _comment2 which not transferred by tractive

penguian commented 3 years ago

@jxs599@nci.org.au commented


Next we merged changes from the branch used throughout this Ticket so far, up to 7582. We chose this revision because the 2nd difference in the serial, single-site tests was between revisions 7582 and 7583. We expected therefore to see a similar thing here. However, significant differences were noted even at the revision analogous to 7582. One such difference was that discussed above, where PFT type was used to trigger code that previously was triggered by soil type. This changes the behaviour of the model for Tundra regions, which no longer meet this trigger.

NB: An important thing to note here is that, neglecting the necessary mods to build scripts to accommodate MPI CABLE, differences so far are due to the increased heterogeneity of global runs recognised by the science processes and NOT the MPI code itself.

The remaining difference before the commit of the working vn @ a612bdff9d1a8b1f330b230315348a759f3413a7 was due to zero initialisation of incoming variables to albedo(), which were being returned to their counterparts in the arg list whence they were called - that is, they fed back directly to the cable* % type vars.
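The mechanism can be illustrated with a small sketch. This is Python standing in for Fortran in/out arguments, not CABLE code: the function names, the field name and the numeric values are all hypothetical. A routine that zeroes an incoming field on entry overwrites the caller's copy whenever the branch that would recompute it is not taken.

```python
def albedo_zeroing(fields, sunlit):
    """Hypothetical routine that zero-initialises its in/out field on entry."""
    fields["albedo"] = 0.0        # overwrites the value held by the caller
    if sunlit:
        fields["albedo"] = 0.2    # illustrative recomputed daytime value
    return fields


def albedo_preserving(fields, sunlit):
    """Same hypothetical routine without the zero initialisation."""
    if sunlit:
        fields["albedo"] = 0.2
    return fields


# A cell that is not sunlit, carrying a previous value of 0.15.
night_zeroed = albedo_zeroing({"albedo": 0.15}, sunlit=False)
night_kept = albedo_preserving({"albedo": 0.15}, sunlit=False)

# A sunlit cell: both variants agree because the value is recomputed.
day_zeroed = albedo_zeroing({"albedo": 0.15}, sunlit=True)
day_kept = albedo_preserving({"albedo": 0.15}, sunlit=True)

print(night_zeroed["albedo"], night_kept["albedo"])
print(day_zeroed["albedo"], day_kept["albedo"])
```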

^^Share/CABLE3.0/ReStructured4JAC.4@7582 was EXPORTED over the top of a612bdff9d1a8b1f330b230315348a759f3413a7 and appropriately 7837 was brought as close as possible to this vn before moving on with merging.

penguian commented 3 years ago

@jxs599@nci.org.au changed _comment0 which not transferred by tractive

penguian commented 3 years ago

@jxs599@nci.org.au changed _comment1 which not transferred by tractive

penguian commented 3 years ago

@jxs599@nci.org.au commented


svn merge Share/CABLE3.0/ReStructured4JAC.4 -r 7582:7583 will pick up the limiting of dq saturation in canopy (analogous to comment:8). This is expected to change results slightly and so we will move the baseline for consideration to reflect this.

penguian commented 3 years ago

@jxs599@nci.org.au commented


As noted above, the expected difference due to including the canopy saturation limitation is there; however, there are a bunch of differences noted before this.

Ideally a comparison with 7838, which is the analogous revision of 7582 (i.e. the revision immediately prior to the implementation of the canopy saturation limitation), would highlight the issues which lead to the differences occurring in addition to this implementation. However, in the process of eliminating potential causes for this we have wound back many of the modifications made in the single-site, serial testing done previously.

An example of one such difference that shows up between the MPI branch and the serial branch tested is the location of the hard-wired parameter files used in offline applications until a namelist version is implemented. The files are indeed needed in:

offline/cable_parameters.F90:  USE cable_pft_params_mod  AND
util/cable_common.F90

The differences in coupled/ are irrelevant for now. Generally the differences in offline can also be ignored, although the potentially relevant differences are noted here:

cable_cbm.F90

The differences in the science/ directory are relevant BUT may also exaggerate the number of issues that needed to be reversed in the MPI testing in order to maintain consistency. As noted above, we may very well have over-compensated here. We will attempt to address this in the revisions following 7539, which includes the "dq" change, and at which point the baseline for further comparison is reset. Other deviations from _canopy@7582 are trivial whitespace changes.

The _radiation change NOT included from 7582 is that therein they use a redefined version of the sunlit/veg mask. Revert, as the 7583 version is superior.

Similarly for roughness/: revert, as the 7583 version is superior.

albedo/* 7582 we will revise later

Revert the diagnostic file diffs, as diagnostic files are irrelevant in the wider scheme of things anyway.

So from d7ac6c3d1db9cbe94c2324b2f8266a84e19be034 to 0827e19761a7027480518463b359fc0434c1f10e there is no change.

Between revisions 1df38d563470a8a435ff46608cfc618c4ec12ee4 and 1df38d563470a8a435ff46608cfc618c4ec12ee4, there are these changes noted [ here] only in the Albedo and Tscrn fields

penguian commented 3 years ago

@jxs599@nci.org.au changed _comment0 which not transferred by tractive

penguian commented 3 years ago

@jxs599@nci.org.au changed _comment1 which not transferred by tractive

penguian commented 3 years ago

@jxs599@nci.org.au changed _comment2 which not transferred by tractive

penguian commented 3 years ago

@jxs599@nci.org.au changed _comment3 which not transferred by tractive

penguian commented 3 years ago

@jxs599@nci.org.au commented


8532bcfc4a2710ee4240259f2fb75888b47433fa will then include the improved vn of _parameters.F90, cable_common, and the moved params files. This checks out,

as does 21562a4242ca8b315dfa6e2648cd827dfe40201c using args passed as vars in snow_Albedo and incorporating exported versions.

Next we merge 7583:7589 - up to the next legitimate change in the serial version.

There has been some back and forth between 8532bcfc4a2710ee4240259f2fb75888b47433fa and 7ea1e7e35ae848b492e8a1442a0beeaa8ed8231a to maintain consistency. However, at this point (7ea1e7e35ae848b492e8a1442a0beeaa8ed8231a) there is a single loop in albedo() where the test for supposedly vegetated or NOT uses 1e-2 instead of 1e-3 (==LAI_thresh). In 7ea1e7e35ae848b492e8a1442a0beeaa8ed8231a we construct a dummy mask to cater for this; however, in the next revision ([7846]) we will adopt the proper LAI_thresh, where we are going to reBase@7846.

The impact of this mod defining cells which are vegetated is shown [ here].
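Which cells are affected by the threshold change can be sketched as below. This is an illustrative Python sketch rather than CABLE code: the LAI values and the function name are hypothetical, and the strict `>` comparison is an assumption. The two thresholds (1e-2 in the single loop, 1e-3 for `LAI_thresh`) are the ones quoted above.

```python
LAI_THRESH = 1.0e-3    # proper threshold quoted in the discussion
LOOP_THRESH = 1.0e-2   # value used in the single albedo() loop


def vegetated_mask(lai, thresh):
    """True where a cell counts as vegetated (strict '>' is an assumption)."""
    return [value > thresh for value in lai]


# Hypothetical LAI values, from bare ground to dense canopy.
lai = [0.0, 0.0005, 0.005, 0.02, 3.0]

mask_proper = vegetated_mask(lai, LAI_THRESH)
mask_loop = vegetated_mask(lai, LOOP_THRESH)

# Only cells with LAI between the two thresholds change status.
affected = [value for value, proper, loop in zip(lai, mask_proper, mask_loop) if proper != loop]
print(affected)
```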

Up until 32f724fa65583b3ac59900d57c2daff49bd33879 output results remained consistent. At 19ae8d457b3f9f943fe1f1be7da14f5fe0a9f38d we implemented the change to the mask, triggering off zenith angle to define sunlit regions. This yielded results in which only a few cells differed, but (in some fields) by a relatively large amount.

We expect to see differences due to the new mask condition, where sections of code are/are NOT executed because the condition is met/NOT met. Perhaps even in distinct cells. A possible scenario for this is where for some reason SW forcing in a distinct cell is anomalous to that in the cells adjacent.

It is expected that the zenith angle is fairly smooth spatially. To clarify, the ONLY difference between the two versions [32f724fa65583b3ac59900d57c2daff49bd33879 and 19ae8d457b3f9f943fe1f1be7da14f5fe0a9f38d] is the adoption of the new zenith-dependent mask in place of a downwards-SW-dependent mask. As the zenith angle should be spatially smooth, it might be expected that large differences observed in the output fields are correlated with "spikes" in the downwards SW.

This is proving troublesome to chase.

Arrgggghh - I just lost a bunch of analysis. To summarize very briefly - remember these are daily means so might not capture the story adequately. The Soil Temp diff (using the Zenith mask as well) does seem to have some correlation with the SW, and indeed the zenith mask would presumably give rise to the smoothness observed. However, these are daily means AND the Soil Temperature will lag the SW somewhat anyway, AND it is odd that the SW might be "clouded" to such an extent as to not trigger as daylit.

The Soil Moisture diff (using the Zenith mask as well) seems to have some correlation pattern with the SW, although more scattered, perhaps due to the Soil Moisture lagging the SW even more. The differences are isolated; however, 8-12.5% relative diffs here seem excessive.

penguian commented 3 years ago

@jxs599@nci.org.au changed _comment0 which not transferred by tractive

penguian commented 3 years ago

@jxs599@nci.org.au changed _comment1 which not transferred by tractive

penguian commented 3 years ago

@jxs599@nci.org.au changed _comment2 which not transferred by tractive

penguian commented 3 years ago

@jxs599@nci.org.au changed _comment3 which not transferred by tractive

penguian commented 3 years ago

@jxs599@nci.org.au changed _comment4 which not transferred by tractive