Closed: jkshuman closed this issue 6 years ago
@ekluzek @rosiealice @rgknox @ckoven I am getting a balance check error in the fire runs. This is with the latest FATES version, which incorporates the memory leak fix, merged with an added history variable from my branch. The error being written out comes from CLM's BalanceCheckMod.F90. The system is down, so I can't get more information at the moment. When I looked at it last night, I submitted the run with a switch from nyears to nmonths. As I watched the file list in the case/run folder, the cesm.log would pop up and then disappear; I was not able to see whether it finally appeared last night. I haven't seen that behavior before (inability to write the cesm.log). I cancelled the run and restarted, and saw the same behavior of the cesm.log appearing and disappearing. I will try resubmitting with stop_option set to ndays; maybe it isn't completing the month? Any advice/help on what to look for would be appreciated.
Erik - does this look at all similar to the balance check error we saw in the past?
Some things I'm noticing:
The radiation solution errors are quite large; if they are that large, I would not be surprised if they generate a NaN or cause anarchy anywhere downstream in the code.
These errors appear to be triggered over and over again in the same patch. The patch area is on the order of 1e-11, which seems small enough that maybe it should be culled?
In the arrays that are printed out (lai_change, elai, ftweight, etc.), I'm surprised that there are some lai_change values (which is change in light level per change in lai, maybe...) where I see no tai. But it's hard to tell why this is so.
I'm wondering if perhaps the "ftweight" variable is being filled incorrectly, maybe because there is something special about the grasses. I can't tell exactly what is happening, though; also, the diagnostic that writes this out uses canopy layer 1 for ftweight but ncl_p for the others...
Do these runs have grasses with some structural biomass, or are they 0 structure/sap?
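Editor's note: to make the ftweight suspicion above concrete, here is a minimal sanity check. This is a hypothetical helper, not FATES code; the function name and data layout are mine. The premise is only that per-PFT area weights within a canopy layer should be non-negative and sum to at most 1, so values like the 29.16 and 143.47 printed later in this thread would fail it.

```python
def check_layer_weights(ftweight, tol=1e-12):
    """Flag canopy layers whose PFT area weights are invalid.

    ftweight[layer][pft] is a hypothetical stand-in for FATES's
    ftweight diagnostic.  Returns (layer_index, total) pairs where
    the weights are negative or sum to more than 1.
    """
    bad = []
    for ilayer, weights in enumerate(ftweight):
        total = sum(weights)
        if total > 1.0 + tol or min(weights) < 0.0:
            bad.append((ilayer, total))
    return bad
```

For example, `check_layer_weights([[0.5, 0.4], [29.1624152220974, 0.0]])` flags layer 1, the kind of entry showing up in the failing runs.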
allom_latosa_int = zero, but I had variants with allom_agb1 = zero and allom_agb1 = 0.0001 (both variants failed).
I will try a variant with allom_latosa_int set to default and allom_agb1 = 0.0001.
Jacquelyn Shuman, PhD Terrestrial Sciences Section National Center for Atmospheric Research PO Box 3000 Boulder, Colorado 80307-3000 USA
jkshuman@ucar.edu office: +1-303-497-1787
The run which uses allom_latosa_int = default and allom_agb1 = 0.0001 for grass also fails in year 5 with fire. (This is a bad case name, as it uses default allometry; will fix that...) /glade2/scratch2/jkshuman/Fire0507_Obrienh_Saldaa_Saldal_latosa_int_default_2PFT_1x1_2dba074_f8d7693/run Similar failure message in year 5. In the cesm.log there is a set of "NetCDF: invalid dimension ID or name" statements, followed by patch trimming, then solar radiation balance check errors, more patch trimming, and more radiation balance check errors, again ending at CLM BalanceCheckMod line 543.
334: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
334: nstep = 96938
334: errsol = -1.311063329012541E-007
330: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
330: nstep = 96938
330: errsol = -1.427682150278997E-007
529:Image PC Routine Line Source
529:cesm.exe 0000000001237DAD Unknown Unknown Unknown
529:cesm.exe 0000000000D1B432 shr_abort_modmp 114 shr_abort_mod.F90
529:cesm.exe 0000000000503CD5 abortutils_mp_end 77 abortutils.F90
529:cesm.exe 0000000000677E2D balancecheckmod_m 543 BalanceCheckMod.F90
529:cesm.exe 000000000050AF77 clm_driver_mp_clm 924 clm_driver.F90
529:cesm.exe 00000000004F9516 lnd_comp_mct_mp_l 451 lnd_comp_mct.F90
529:cesm.exe 0000000000430E14 component_modmp 688 component_mod.F90
529:cesm.exe 0000000000417D59 cime_comp_modmp 2652 cime_comp_mod.F90
529:cesm.exe 0000000000430B3D MAIN__ 68 cime_driver.F90
529:cesm.exe 0000000000415C5E Unknown Unknown Unknown
529:libc-2.19.so 00002AAAB190AB25 __libc_start_main Unknown Unknown
529:cesm.exe 0000000000415B69 Unknown Unknown Unknown
That is the right case name. Obrien Salda is the default allometry... too many iterations on this.
@rgknox @rosiealice I did another set of runs for single and 2 PFTs for a regional run in South America. Both failures show the same set of solar radiation balance check errors. I include pieces of the cesm.log for the failed runs.
general case statement: ./create_newcase --case ${casedir}${CASE_NAME} --res f09_f09 --compset 2000_DATM%GSWP3v1_CLM45%FATES_SICE_SOCN_RTM_SGLC_SWAV --run-unsupported
1 PFT (no fire) for Grass and Trop Tree completed to year 21 with reasonable biomass and distribution. 1 PFT (Fire) for Trop Tree completed through year 21. 1 PFT (Fire) for Grass failed at year 11. (cesm.log piece below)
2 PFT (Fire) for Trop Tree and Grass failed at year 5. (cesm.log piece after the fire grass log)
/glade2/scratch2/jkshuman/Fire_Grass_1x1_2dba074_f8d7693/run
Errors:
clmfates_interfaceMod.F90:: reading froz_q10
217: NetCDF: Invalid dimension ID or name
217: NetCDF: Invalid dimension ID or name
217: NetCDF: Invalid dimension ID or name
217: NetCDF: Invalid dimension ID or name
217: NetCDF: Invalid dimension ID or name
217: NetCDF: Variable not found
217: NetCDF: Variable not found
0:(seq_domain_areafactinit) : min/max mdl2drv 1.00000000000000 1.00000000000000 areafact_a_ATM
0:(seq_domain_areafactinit) : min/max drv2mdl 1.00000000000000 1.00000000000000 areafact_a_ATM
102: trimming patch area - is too big 1.818989403545856E-012
109: trimming patch area - is too big 1.818989403545856E-012
467: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
467: nstep = 192742
467: errsol = -1.090609771381423E-007
(and from further within the cesm.log...)
202: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
202: nstep = 195723
202: errsol = -1.013256678561447E-007
180:Image PC Routine Line Source
180:cesm.exe 0000000001237DAD Unknown Unknown Unknown
180:cesm.exe 0000000000D1B432 shr_abort_modmp 114 shr_abort_mod.F90
180:cesm.exe 0000000000503D97 abortutils_mp_end 43 abortutils.F90
180:cesm.exe 000000000050329C lnd_import_export 419 lnd_import_export.F90
180:cesm.exe 00000000004F9557 lnd_comp_mct_mp_l 457 lnd_comp_mct.F90
180:cesm.exe 0000000000430E14 component_modmp 688 component_mod.F90
180:cesm.exe 0000000000417D59 cime_comp_modmp 2652 cime_comp_mod.F90
180:cesm.exe 0000000000430B3D MAIN__ 68 cime_driver.F90
180:cesm.exe 0000000000415C5E Unknown Unknown Unknown
180:libc-2.19.so 00002AAAB190AB25 __libc_start_main Unknown Unknown
180:cesm.exe 0000000000415B69 Unknown Unknown Unknown
180:MPT ERROR: Rank 180(g:180) is aborting with error code 1001.
180: Process ID: 70276, Host: r2i2n9, Program: /glade2/scratch2/jkshuman/Fire_Grass_1x1_2dba074_f8d7693/bld/cesm.exe
180: MPT Version: SGI MPT 2.15 12/18/16 02:58:06
/glade2/scratch2/jkshuman/Fire0507_Obrienh_Saldaa_Saldal_2PFT_1x1_2dba074_f8d7693/run
330: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
330: nstep = 96938
330: errsol = -1.427682150278997E-007
529:Image PC Routine Line Source
529:cesm.exe 0000000001237DAD Unknown Unknown Unknown
529:cesm.exe 0000000000D1B432 shr_abort_modmp 114 shr_abort_mod.F90
529:cesm.exe 0000000000503CD5 abortutils_mp_end 77 abortutils.F90
529:cesm.exe 0000000000677E2D balancecheckmod_m 543 BalanceCheckMod.F90
529:cesm.exe 000000000050AF77 clm_driver_mp_clm 924 clm_driver.F90
529:cesm.exe 00000000004F9516 lnd_comp_mct_mp_l 451 lnd_comp_mct.F90
529:cesm.exe 0000000000430E14 component_modmp 688 component_mod.F90
529:cesm.exe 0000000000417D59 cime_comp_modmp 2652 cime_comp_mod.F90
529:cesm.exe 0000000000430B3D MAIN__ 68 cime_driver.F90
529:cesm.exe 0000000000415C5E Unknown Unknown Unknown
529:libc-2.19.so 00002AAAB190AB25 __libc_start_main Unknown Unknown
529:cesm.exe 0000000000415B69 Unknown Unknown Unknown
529:MPT ERROR: Rank 529(g:529) is aborting with error code 1001.
529: Process ID: 47973, Host: r5i4n34, Program: /glade2/scratch2/jkshuman/Fire0507_Obrienh_Saldaa_Saldal_2PFT_1x1_2dba074_f8d7693/bld/cesm.exe
529: MPT Version: SGI MPT 2.15 12/18/16 02:58:06
529:
529:MPT: --------stack traceback-------
0: memory_write: model date = 60715 0 memory = 129228.42 MB (highwater) 102.11 MB (usage) (pe= 0 comps= ATM ESP)
529:MPT: Attaching to program: /proc/47973/exe, process 47973
529:MPT: done.
529: gridcell longitude = 290.000000000000
529: gridcell latitude = -15.5497382198953
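An editorial aside on the "trimming patch area - is too big" messages above: the excess area, 1.818989403545856E-012, is exactly one unit in the last place (ULP) of 10000.0, which (assuming the usual FATES site area of 10,000 m2, i.e. one hectare) hints that the patch areas are summing to a single floating-point rounding error above the site area rather than drifting by a physically meaningful amount. A quick check of that observation:

```python
import math

# The "too big" excess from the trimming warnings in the log:
excess = 1.818989403545856e-12

# One ULP of 10000.0 (the assumed FATES site area in m2) is 2**-39,
# which matches the logged excess to within print truncation.
assert math.isclose(excess, math.ulp(10000.0), rel_tol=1e-15)
assert math.isclose(excess, 2.0 ** -39, rel_tol=1e-15)
```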
@jkshuman, can you provide a link to the branch you are using? I can't find hash f8d7693.
It is a merge between the memory leak commit and my added crown area history field. Here is a link, but it may not have the memory leak commit; I don't recall whether I pushed those changes. Cheyenne is still down, so I can't update at the moment. https://github.com/jkshuman/fates/tree/hio_crownarea_si_pft_sync
Cheyenne is still down, so I'm putting the link to my crown area history variable branch in this issue as well. The failing runs were on a merge branch created from the master branch #372 memory leak fix and my crown area branch (link below). https://github.com/jkshuman/fates/tree/hio_crownarea_si_pft
I updated the sync branch with the failing branch code. https://github.com/jkshuman/fates/tree/hio_crownarea_si_pft_sync
Did you try the run with just the new master branch? That way we can see if the issues are caused by stuff on the branch?
Dr Rosie A. Fisher
Staff Scientist Terrestrial Sciences Section Climate and Global Dynamics National Center for Atmospheric Research 1850 Table Mesa Drive Boulder, Colorado, 80305 USA. +1 303-497-1706
Running 1 PFT grass, 1 PFT trop tree, and 2 PFT, all with fire, on CLM4.5 (paths below). A new set of runs is being created with this branch (crown area history merged with the #379 canopy photo fix): https://github.com/jkshuman/fates/tree/hio_crownarea_si_pft_379canopy_photo_fix
./create_newcase --case ${casedir}${CASE_NAME} --res f09_f09 --compset 2000_DATM%GSWP3v1_CLM45%FATES_SICE_SOCN_RTM_SGLC_SWAV --run-unsupported
/glade2/scratch2/jkshuman/Fire_Grass_1x1_2dba074_5dda57b /glade2/scratch2/jkshuman/Fire_Obrien_Salda_TropTree_1x1_2dba074_5dda57b /glade2/scratch2/jkshuman/Fire_Obrienh_Saldaa_Saldal_2PFT_1x1_2dba074_5dda57b
The crown area stuff is just a history variable, so it is unlikely to cause this failure? But I can run with master to test that as well.
Looks like my single-site run at:
gridcell longitude = 290.000000000000 gridcell latitude = -15.5497382198953
did not generate the error after 30 years.
I will try to look through and see if I added some configuration that was different.
Run directory:
/glade2/scratch2/rgknox/jkstest-1pt-v0/run
Uses this parameter file:
/glade/u/home/rgknox/param_file_2PFT_Obrienh_Saldaa_Saldal_05042018.nc
This was with fire for CLM4.5?
I noticed this in the parameter file:
fates_leaf_xl = 0.1, 0.1, -0.3
This may be fine; it just caught my eye. xl is the orientation index, which I think I recall allows negatives, but we should double-check whether our formulation does.
yeah, that parameter seems fine, false alarm
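Editor's note on the xl discussion: if memory serves, stock CLM bounds the leaf/stem orientation index before it enters the two-stream solution, which would explain why a negative xl is tolerated. The [-0.4, 0.6] range below is recalled from the CLM code and is worth verifying against SurfaceAlbedoMod before relying on it; the sketch is illustrative only.

```python
def clamp_chil(xl):
    # Bound the leaf/stem orientation index the way CLM's two-stream
    # scheme is believed to (range recalled from memory, not copied
    # from source; verify against SurfaceAlbedoMod).
    return min(max(xl, -0.4), 0.6)

# The grass value from the parameter file, xl = -0.3, stays in range,
# so by itself it should not break the radiation solution.
```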
my runs are a 1 degree regional subset for South America. surface and domain files here:
/glade2/scratch2/jkshuman/sfcdata
OK, thanks. A new single-site run on Cheyenne is going, now using SPITFIRE.
My current guess as to what is happening is that we are running into a problem with near-zero biomass or leaves, which is the product of fire turning over an all-grass patch. It's possible the recent bug fix addressed this, but we will see.
@rgknox another set of runs is going with pull request #382. The 1 PFT runs with fire are still going (tree at year 21, grass at year 2; slow in the queue?). The 2 PFT run (trop tree and grass) failed in year 6 with a similar set of errors: BalanceCheckMod.F90 line 543, BalanceCheck solar radiation balance error.
/glade/scratch/jkshuman/archive/Fire_Obrienh_Saldaa_Saldal_2PFT_SA1x1_2dba074_0f0c41c/
New location:
gridcell longitude = 305.000000000000
gridcell latitude = -23.0890052356021
From cesm.log
235: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
235: nstep = 119564
235: errsol = -1.108547849071329E-007
252: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
252: nstep = 119565
252: errsol = -1.065200194716454E-007
0: memory_write: model date = 71029 0 memory = 128919.57 MB (highwater) 101.85 MB (usage) (pe= 0 comps= ATM ESP)
467: trimming patch area - is too big 1.818989403545856E-012
545: trimming patch area - is too big 1.818989403545856E-012
353: trimming patch area - is too big 1.818989403545856E-012
390: trimming patch area - is too big 1.818989403545856E-012
513: trimming patch area - is too big 1.818989403545856E-012
506: trimming patch area - is too big 1.818989403545856E-012
535: trimming patch area - is too big 1.818989403545856E-012
446: trimming patch area - is too big 1.818989403545856E-012
469: trimming patch area - is too big 1.818989403545856E-012
477: trimming patch area - is too big 1.818989403545856E-012
326: trimming patch area - is too big 1.818989403545856E-012
403: trimming patch area - is too big 1.818989403545856E-012
69: trimming patch area - is too big 1.818989403545856E-012
239: trimming patch area - is too big 1.818989403545856E-012
70: trimming patch area - is too big 1.818989403545856E-012
218: trimming patch area - is too big 1.818989403545856E-012
257: trimming patch area - is too big 1.818989403545856E-012
75: trimming patch area - is too big 1.818989403545856E-012
330: trimming patch area - is too big 1.818989403545856E-012
170: trimming patch area - is too big 1.818989403545856E-012
200: trimming patch area - is too big 1.818989403545856E-012
198: trimming patch area - is too big 1.818989403545856E-012
255: trimming patch area - is too big 1.818989403545856E-012
80: trimming patch area - is too big 1.818989403545856E-012
219: trimming patch area - is too big 1.818989403545856E-012
118: trimming patch area - is too big 1.818989403545856E-012
119: trimming patch area - is too big 1.818989403545856E-012
202: >5% Dif Radn consvn error -1.05825538715178 1 2
202: diags 7.96359955072742 -54.6696896639910 38.3301532002546
202: lai_change 0.000000000000000E+000 0.000000000000000E+000
202: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
202: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
202: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
202: 0.000000000000000E+000
202: elai 0.796415587611356 0.000000000000000E+000 0.961509001506293
202: 0.000000000000000E+000 0.000000000000000E+000 0.961509001506293
202: 0.000000000000000E+000 0.000000000000000E+000 0.234465085324267
202: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
202: esai 9.096157657329497E-002 0.000000000000000E+000 3.849099849370675E-002
202: 0.000000000000000E+000 0.000000000000000E+000 3.849099849370675E-002
202: 0.000000000000000E+000 0.000000000000000E+000 9.398288976575598E-003
202: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
202: ftweight 1.267302001703947E-002 0.000000000000000E+000
202: 29.1624152220974 0.000000000000000E+000 0.000000000000000E+000
202: 29.1624152220974 0.000000000000000E+000 0.000000000000000E+000
202: 29.1624152220974 0.000000000000000E+000 0.000000000000000E+000
202: 0.000000000000000E+000
202: cp 6.405767903805394E-010 1
202: bc_in(s)%albgr_dif_rb(ib) 0.190858817093915
202: rhol 0.100000001490116 0.100000001490116 0.100000001490116
202: 0.449999988079071 0.449999988079071 0.349999994039536
202: ftw 1.00000000000000 1.00000000000000 0.000000000000000E+000
202: 0.000000000000000E+000
202: present 1 0 0
202: CAP 1.00000000000000 0.000000000000000E+000 0.000000000000000E+000
331: Large Dir Radn consvn error 87300236774.1395 1 2
331: diags 35545013833.8197 -1.718567028306606E-002 -793747809365.306
331: 496278040697.993
331: lai_change 0.000000000000000E+000 0.000000000000000E+000
331: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
331: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
331: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
331: 0.000000000000000E+000
331: elai 0.776682425289442 0.000000000000000E+000 0.961569569355599
331: 0.000000000000000E+000 0.000000000000000E+000 0.961569569355599
331: 0.000000000000000E+000 0.000000000000000E+000 0.227539226615268
331: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
331: esai 9.093202219977818E-002 0.000000000000000E+000 3.843043064440077E-002
331: 0.000000000000000E+000 0.000000000000000E+000 3.843043064440077E-002
331: 0.000000000000000E+000 0.000000000000000E+000 9.101385150350671E-003
331: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
331: ftweight 0.143517787345916 0.000000000000000E+000
331: 0.856482212654084 0.000000000000000E+000 0.000000000000000E+000
331: 0.856482212654084 0.000000000000000E+000 0.000000000000000E+000
331: 0.856482212654084 0.000000000000000E+000 0.000000000000000E+000
331: 0.000000000000000E+000
331: cp 2.006325586387992E-009 1
331: bc_in(s)%albgr_dir_rb(ib) 0.220000000000000
331: dif ground absorption error 1 1 -2.968510966153521E+017
331: -2.968510966153521E+017 2 2 1.00000000000000
331: >5% Dif Radn consvn error 4.270016056591235E+016 1 2
331: diags 1.669646990961853E+016 -3.805783289940412E+017 2.374544661398212E+017
331: lai_change 0.000000000000000E+000 0.000000000000000E+000
331: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
331: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
331: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
331: 0.000000000000000E+000
331: elai 0.776682425289442 0.000000000000000E+000 0.961569569355599
331: 0.000000000000000E+000 0.000000000000000E+000 0.961569569355599
331: 0.000000000000000E+000 0.000000000000000E+000 0.227539226615268
331: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
331: esai 9.093202219977818E-002 0.000000000000000E+000 3.843043064440077E-002
331: 0.000000000000000E+000 0.000000000000000E+000 3.843043064440077E-002
331: 0.000000000000000E+000 0.000000000000000E+000 9.101385150350671E-003
331: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
331: ftweight 7.801052745940848E-002 0.000000000000000E+000
331: 143.470563918829 0.000000000000000E+000 0.000000000000000E+000
331: 143.470563918829 0.000000000000000E+000 0.000000000000000E+000
331: 143.470563918829 0.000000000000000E+000 0.000000000000000E+000
331: 0.000000000000000E+000
331: cp 2.006325586387992E-009 1
331: bc_in(s)%albgr_dif_rb(ib) 0.220000000000000
331: rhol 0.100000001490116 0.100000001490116 0.100000001490116
331: 0.449999988079071 0.449999988079071 0.349999994039536
331: ftw 1.00000000000000 0.143517787345916 0.000000000000000E+000
331: 0.856482212654084
331: present 1 0 1
331: CAP 0.143517787345916 0.000000000000000E+000 0.856482212654084
331: there is still error after correction 1.00000000000000 1
331: 2
202: >5% Dif Radn consvn error -1.07307654594231 1 2
202: diags 8.03407121904317 -55.1147964199711 38.6409503555679
202: lai_change 0.000000000000000E+000 0.000000000000000E+000
202: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
202: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
202: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
202: 0.000000000000000E+000
202: elai 0.796415587611356 0.000000000000000E+000 0.961509001506293
202: 0.000000000000000E+000 0.000000000000000E+000 0.961509001506293
202: 0.000000000000000E+000 0.000000000000000E+000 0.234465085324267
202: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
202: esai 9.096157657329497E-002 0.000000000000000E+000 3.849099849370675E-002
202: 0.000000000000000E+000 0.000000000000000E+000 3.849099849370675E-002
202: 0.000000000000000E+000 0.000000000000000E+000 9.398288976575598E-003
202: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
202: ftweight 1.267302001703947E-002 0.000000000000000E+000
202: 29.1624152220974 0.000000000000000E+000 0.000000000000000E+000
202: 29.1624152220974 0.000000000000000E+000 0.000000000000000E+000
202: 29.1624152220974 0.000000000000000E+000 0.000000000000000E+000
202: 0.000000000000000E+000
202: cp 6.405767903805394E-010 1
202: bc_in(s)%albgr_dif_rb(ib) 0.190744628923151
202: rhol 0.100000001490116 0.100000001490116 0.100000001490116
202: 0.449999988079071 0.449999988079071 0.349999994039536
202: ftw 1.00000000000000 1.00000000000000 0.000000000000000E+000
202: 0.000000000000000E+000
202: present 1 0 0
202: CAP 1.00000000000000 0.000000000000000E+000 0.000000000000000E+000
331: energy balance in canopy 26844 , err= -11.9593662381158
331: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
331: nstep = 119588
331: errsol = -1323.30638249407
331: clm model is stopping - error is greater than 1e-5 (W/m2)
331: fsa = -7.745702732785249E+017
331: fsr = 7.745702732785236E+017
331: forc_solad(1) = 5.51145480639649
331: forc_solad(2) = 8.61256572561393
331: forc_solai(1) = 16.1417364406403
331: forc_solai(2) = 13.0406255214228
331: forc_tot = 43.3063824940735
331: clm model is stopping
331: calling getglobalwrite with decomp_index= 26844 and clmlevel= pft
331: local patch index = 26844
331: global patch index = 9516
331: global column index = 4795
331: global landunit index = 1267
331: global gridcell index = 296
331: gridcell longitude = 305.000000000000
331: gridcell latitude = -23.0890052356021
331: pft type = 1
331: column type = 1
331: landunit type = 1
331: ENDRUN:
331: ERROR in BalanceCheckMod.F90 at line 543
331:
331:
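Editor's note: the quantities in the abort message fit together as a simple shortwave budget. The sketch below is my reconstruction of the check, not code copied from BalanceCheckMod.F90, using the values printed in the log above; the 1e17 magnitudes of fsa and fsr show the radiation solution itself has blown up long before the balance check catches it.

```python
# Incident shortwave forcing from the log: direct beam (forc_solad)
# and diffuse (forc_solai), in two bands (visible, NIR).
forc_solad = (5.51145480639649, 8.61256572561393)
forc_solai = (16.1417364406403, 13.0406255214228)
forc_tot = sum(forc_solad) + sum(forc_solai)   # matches forc_tot in the log

# Absorbed (fsa) and reflected (fsr) shortwave from the log.
fsa = -7.745702732785249e17
fsr = 7.745702732785236e17

# Reconstructed balance: absorbed plus reflected should equal the
# incident total; CLM stops when the residual exceeds 1e-5 W/m2.
errsol = fsa + fsr - forc_tot
model_stops = abs(errsol) > 1e-5
```

(The printed fsa and fsr are truncated to 16 digits, so the sketch cannot reproduce errsol = -1323.3 exactly, but the residual is clearly far beyond the 1e-5 W/m2 tolerance.)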
I feel like ftweight should not ever be >1, but here it's like 29, 143, etc. I've still got a bunch of slides to do for tomorrow morning, but that's the thing that strikes me most about this. Maybe worth checking the ftweight calculations...
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/NGEET/fates/issues/378#issuecomment-389031443, or mute the thread https://github.com/notifications/unsubscribe-auth/AMWsQ3aV2BBnc0QhUSS28cWX__BsCupcks5tyktDgaJpZM4Tzp8E .
Dr Rosie A. Fisher
Staff Scientist Terrestrial Sciences Section Climate and Global Dynamics National Center for Atmospheric Research 1850 Table Mesa Drive Boulder, Colorado, 80305 USA. +1 303-497-1706
Agreed @rosiealice, whatever is wrong seems to be mediated by ftweight.
I will try to reproduce errors in that last post.
@jkshuman , could you post your create_case execution and any environment modifiers?
relevant parameters:
fates_paramfile = '/glade/p/work/jkshuman/FATES_data/parameter_files/param_file_2PFT_Obrienh_Saldaa_Saldal_05072018.nc'
use_fates = .true.
use_fates_ed_prescribed_phys = .false.
use_fates_ed_st3 = .false.
use_fates_inventory_init = .false.
use_fates_logging = .false.
use_fates_planthydro = .false.
use_fates_spitfire = .true.
fsurdat = '/glade/scratch/jkshuman/sfcdata/surfdata_0.9x1.25_16pfts_Irrig_CMIP6_simyr2000_SA.nc'
OK, I have it down to days. It seems to be hung up, but I will restart from this case in debug mode and take a close look at ftweight. Going to use the 2PFT case, as the 1-PFT trop tree run made it out to 51 years with fire; this seems to be a grass and fire issue. But I may try the grass single-PFT case as well... /glade2/scratch2/jkshuman/archive/Fire_Obrienh_Saldaa_Saldal_2PFT_SA1x1_2dba074_0f0c41c/
/glade2/scratch2/jkshuman/archive/Fire_Grass_SA_1x1_2dba074_0f0c41c/
path to restart files for 2PFT case: /glade/scratch/jkshuman/archive/Fire_Obrienh_Saldaa_Saldal_2PFT_SA1x1_2dba074_0f0c41c/rest
path to my script for creating the case, and relevant params below: /glade/p/work/jkshuman/FATES_data/case_fire_TreeGrass_tropics
./create_newcase --case ${casedir}${CASE_NAME} --res f09_f09 --compset 2000_DATM%GSWP3v1_CLM45%FATES_SICE_SOCN_RTM_SGLC_SWAV --run-unsupported
./xmlchange STOP_OPTION=ndays
./xmlchange STOP_N=1
./xmlchange REST_OPTION=ndays
./xmlchange RESUBMIT=50
./xmlchange JOB_WALLCLOCK_TIME=1:00
./xmlchange DATM_MODE=CLMGSWP3v1
./xmlchange DATM_CLMNCEP_YR_ALIGN=1985
./xmlchange DATM_CLMNCEP_YR_START=1985
./xmlchange DATM_CLMNCEP_YR_END=2004
./xmlchange RTM_MODE=NULL
./xmlchange ATM_DOMAIN_FILE=domain.lnd.fv0.9x1.25_gx1v6.SA.nc
./xmlchange ATM_DOMAIN_PATH=/glade/scratch/jkshuman/sfcdata
./xmlchange LND_DOMAIN_FILE=domain.lnd.fv0.9x1.25_gx1v6.SA.nc
./xmlchange LND_DOMAIN_PATH=/glade/scratch/jkshuman/sfcdata
./xmlchange CLM_USRDAT_NAME=SAmerica
./xmlchange NTASKS_ATM=-1
./xmlchange NTASKS_CPL=-15
./xmlchange NTASKS_GLC=-15
./xmlchange NTASKS_OCN=-15
./xmlchange NTASKS_WAV=-15
./xmlchange NTASKS_ICE=-15
./xmlchange NTASKS_LND=-15
./xmlchange NTASKS_ROF=-15
./xmlchange NTASKS_ESP=-15
relevant parameters in user_nl_clm are as you have them listed. above.
I think we need to look at why ftweight is >1. ftweight is the same as canopy_area_profile, which is set on: https://github.com/NGEET/fates/blob/e522527035c0061f0d31c265e4ccc4dc94b7d3cb/biogeochem/EDCanopyStructureMod.F90#L1337
I'd put a write statement there to catch anything going over 1... (or a slightly bigger number, so we don't get all these 10^-12 edge cases), and then print out the c_area, total_canopy_area, etc. if that happens. If you've got the runs down to days it shouldn't take long to find the culprit there. I'd be quite surprised if the ftweight wasn't the culprit here.
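The suggested diagnostic could look something like the following. This is an illustrative Python sketch, not the actual FATES Fortran; the structure of `canopy_area_profile` and the tolerance value are assumptions. The point is to flag only sums that exceed 1 by more than a small tolerance, so the ~1e-12 round-off edge cases don't flood the log:

```python
# Illustrative sketch (not FATES code) of the suggested check: report a
# canopy layer only when its per-PFT area fractions sum to more than 1
# by a margin larger than round-off noise.

TOL = 1e-9  # slightly bigger than the ~1e-12 edge cases mentioned above


def check_canopy_area(canopy_area_profile):
    """Return (cl, iv, total) tuples where the area sum exceeds 1 + TOL.

    canopy_area_profile[cl][iv] holds a list of per-PFT area fractions
    for canopy layer cl and leaf layer iv; each sum should be <= 1.
    """
    offenders = []
    for cl, leaf_layers in enumerate(canopy_area_profile):
        for iv, fractions in enumerate(leaf_layers):
            total = sum(fractions)
            if total > 1.0 + TOL:
                offenders.append((cl, iv, total))
    return offenders
```

In the Fortran this would be a write statement next to the canopy_area_profile assignment, printing c_area, total_canopy_area, etc. whenever the condition trips.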
So I was able to trigger an error using just cell -20.09N 305E, and your 2PFT case. The fail happens on April 17th of the 7th year.
FATES Dynamics: 7-04-17
0:forrtl: error (73): floating divide by zero
0:Image PC Routine Line Source
0:cesm.exe 0000000003E1CF91 Unknown Unknown Unknown
0:cesm.exe 0000000003E1B0CB Unknown Unknown Unknown
0:cesm.exe 0000000003DCCBC4 Unknown Unknown Unknown
0:cesm.exe 0000000003DCC9D6 Unknown Unknown Unknown
0:cesm.exe 0000000003D4C4B9 Unknown Unknown Unknown
0:cesm.exe 0000000003D58AE9 Unknown Unknown Unknown
0:libpthread-2.19.s 00002AAAAFAC1870 Unknown Unknown Unknown
0:cesm.exe 0000000002B8581B dynpatchstateupda 189 dynPatchStateUpdaterMod.F90
0:cesm.exe 0000000000A1CCCC dynsubgriddriverm 284 dynSubgridDriverMod.F90
0:cesm.exe 000000000087E555 clm_driver_mp_clm 306 clm_driver.F90
0:cesm.exe 000000000084B5B9 lnd_comp_mct_mp_l 451 lnd_comp_mct.F90
0:cesm.exe 000000000046BD2D component_mod_mp_ 688 component_mod.F90
0:cesm.exe 000000000043C474 cime_comp_mod_mp_ 2652 cime_comp_mod.F90
0:cesm.exe 00000000004543B7 MAIN__ 68 cime_driver.F90
0:cesm.exe 0000000000415A5E Unknown Unknown Unknown
0:libc-2.19.so 00002AAAB190AB25 __libc_start_main Unknown Unknown
0:cesm.exe 0000000000415969 Unknown Unknown Unknown
-1:MPT ERROR: MPI_COMM_WORLD rank 0 has terminated without calling MPI_Finalize()
-1: aborting job
MPT: Received signal 6
That’s interesting. My run with rest option set to days is still going into month 9 day 18 last I checked...
Progress
Got it to the day of failure (October 30, year 7). Will kick it off in debug to see if I get the same error as you did @rgknox (similar error as previous, and same location: long = 305, lat = -23.089).
from cesm.log
bc_in(s)%albgr_dif_rb(ib) 0.220000000000000
331: rhol 0.100000001490116 0.100000001490116 0.100000001490116
331: 0.449999988079071 0.449999988079071 0.349999994039536
331: ftw 1.00000000000000 0.143517787251814 0.000000000000000E+000
331: 0.856482212748186
331: present 1 0 1
331: CAP 0.143517787251814 0.000000000000000E+000 0.856482212748186
331: there is still error after correction 1.00000000000000 1
331: 2
202: >5% Dif Radn consvn error -1.07341422635010 1 2
202: diags 8.03574910457470 -55.1258110560189 38.6485853190346
202: lai_change 0.000000000000000E+000 0.000000000000000E+000
202: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
202: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
202: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
202: 0.000000000000000E+000
202: elai 0.796415126488024 0.000000000000000E+000 0.961509014797645
202: 0.000000000000000E+000 0.000000000000000E+000 0.961509014797645
202: 0.000000000000000E+000 0.000000000000000E+000 0.234466930897031
202: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
202: esai 9.096157669642455E-002 0.000000000000000E+000 3.849098520235514E-002
202: 0.000000000000000E+000 0.000000000000000E+000 3.849098520235514E-002
202: 0.000000000000000E+000 0.000000000000000E+000 9.398356961483976E-003
202: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
202: ftweight 1.267295049486910E-002 0.000000000000000E+000
202: 29.1628509591272 0.000000000000000E+000 0.000000000000000E+000
202: 29.1628509591272 0.000000000000000E+000 0.000000000000000E+000
202: 29.1628509591272 0.000000000000000E+000 0.000000000000000E+000
202: 0.000000000000000E+000
202: cp 6.410821458268472E-010 1
202: bc_in(s)%albgr_dif_rb(ib) 0.190743513017422
202: rhol 0.100000001490116 0.100000001490116 0.100000001490116
202: 0.449999988079071 0.449999988079071 0.349999994039536
202: ftw 1.00000000000000 1.00000000000000 0.000000000000000E+000
202: 0.000000000000000E+000
202: present 1 0 0
202: CAP 1.00000000000000 0.000000000000000E+000 0.000000000000000E+000
331: energy balance in canopy 26844 , err= -11.9601284804630
331: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
331: nstep = 119588
331: errsol = 724.693617505926
331: clm model is stopping - error is greater than 1e-5 (W/m2)
331: fsa = -7.745702333124070E+017
331: fsr = 7.745702333124078E+017
331: forc_solad(1) = 5.51145480639649
331: forc_solad(2) = 8.61256572561393
331: forc_solai(1) = 16.1417364406403
331: forc_solai(2) = 13.0406255214228
331: forc_tot = 43.3063824940735
331: clm model is stopping
331: calling getglobalwrite with decomp_index= 26844 and clmlevel= pft
331: local patch index = 26844
331: global patch index = 9516
331: global column index = 4795
331: global landunit index = 1267
331: global gridcell index = 296
331: gridcell longitude = 305.000000000000
331: gridcell latitude = -23.0890052356021
331: pft type = 1
331: column type = 1
331: landunit type = 1
331: ENDRUN:
331: ERROR in BalanceCheckMod.F90 at line 543
331:
331:
331:
Here is a print message at the time of fail, this is from subroutine set_new_weights() in dynPatchStateUpdaterMod.F90.
The problem is triggered because from the second-to-last step to the last, that bare-ground patch goes to a weight of zero, and somehow its old (previous) area was negative?
print*,bounds%begp,bounds%endp,p,this%pwtgcell_old(p),this%pwtgcell_new(p)
0: 1 32 3 0.998904682346343 0.998904682346344
0: 1 32 3 0.998904682346344 0.998904682346344
0: 1 32 3 0.998904682346344 0.998904682346344
0: 1 32 1 -2.218013955499719E-016 0.000000000000000E+000
subroutine set_new_weights(this, bounds)
    !
    ! !DESCRIPTION:
    ! Set subgrid weights after dyn subgrid updates
    !
    ! !USES:
    !
    ! !ARGUMENTS:
    class(patch_state_updater_type), intent(inout) :: this
    type(bounds_type), intent(in) :: bounds
    !
    ! !LOCAL VARIABLES:
    integer :: p
    character(len=*), parameter :: subname = 'set_new_weights'
    !-----------------------------------------------------------------------
    do p = bounds%begp, bounds%endp
       this%pwtgcell_new(p) = patch%wtgcell(p)
       this%dwt(p) = this%pwtgcell_new(p) - this%pwtgcell_old(p)
       if (this%dwt(p) > 0._r8) then
          print*,bounds%begp,bounds%endp,p,this%pwtgcell_old(p),this%pwtgcell_new(p)
          this%growing_old_fraction(p) = this%pwtgcell_old(p) / this%pwtgcell_new(p)
          this%growing_new_fraction(p) = this%dwt(p) / this%pwtgcell_new(p)
       else
          ! These values are unused in this case, but set them to something reasonable for
          ! safety. (We could set them to NaN, but that requires a more expensive
          ! subroutine call, using the shr_infnan_mod infrastructure.)
          this%growing_old_fraction(p) = 1._r8
          this%growing_new_fraction(p) = 0._r8
       end if
    end do
end subroutine set_new_weights
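To make the failure mode concrete, here is the branch logic above re-rendered in Python (illustrative only, not the CTSM source). Because dwt is new minus old, a tiny negative old weight paired with a new weight of exactly zero makes dwt positive, so the "growing" branch runs and divides by zero:

```python
# Python re-rendering (illustrative) of the branch logic in
# set_new_weights() above, for a single patch index.

def set_new_weights(pwtgcell_old, pwtgcell_new):
    dwt = pwtgcell_new - pwtgcell_old
    if dwt > 0.0:
        # With pwtgcell_new == 0.0 this reproduces the floating divide
        # by zero reported at dynPatchStateUpdaterMod.F90 line 189.
        growing_old_fraction = pwtgcell_old / pwtgcell_new
        growing_new_fraction = dwt / pwtgcell_new
    else:
        # Unused in this case; set to something reasonable for safety.
        growing_old_fraction, growing_new_fraction = 1.0, 0.0
    return growing_old_fraction, growing_new_fraction
```

With the printed values above (old = -2.218e-16, new = 0.0), dwt is +2.218e-16 > 0, so the divide is attempted even though the patch has effectively vanished.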
The interface call wrap_update_hlmfates_dyn(), in clmfates_interfaceMod.F90, is responsible for calculating these weights.
We sum up the canopy fractions, via this output boundary condition:
this%fates(nc)%bc_out(s)%canopy_fraction_pa(1:npatch)
But if this sum is above 1, which it shouldn't be, we will have problems, and calculate a negative bare-patch size. Somehow that is happening in this run. I put a break-point where this endrun used to be:
https://github.com/ESCOMP/ctsm/blob/master/src/utils/clmfates_interfaceMod.F90#L830
I think one bug is that we are not zero'ing out bc_out(s)%canopy_fraction_pa(1:npatch) in the subroutine that fills it, update_hlm_dynamics(). So if we shrink the total number of patches, we have an extra index contributing to total patch area. I will test this.
Actually, that probably wasn't the problem... although zero'ing would have been better, we should only be passing the used indexes in that array...
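The arithmetic behind the negative bare patch is simple to sketch (Python; the function and argument names are illustrative, not the interface code): the bare-ground weight is one minus the summed canopy fractions, so any sum above 1, whether from a stale trailing index or from round-off, yields a tiny negative weight like the -2.2e-16 seen in the print output above:

```python
def bare_ground_fraction(canopy_fraction_pa, npatch):
    """Illustrative sketch: the bare-soil weight is the leftover after
    the FATES patch canopy fractions are summed. Only the first npatch
    entries are valid; summing a stale trailing entry inflates the
    total the same way floating-point round-off can.
    """
    return 1.0 - sum(canopy_fraction_pa[:npatch])
```

If the sum creeps above 1 by even one ulp, the bare patch goes negative, which is exactly the state that later feeds the divide-by-zero in set_new_weights().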
Are we sure that the bug is fire specific? Has it shown up in any non-fire runs @jkshuman? If it is fire, my suspicion would be to do with how the model handles completely burned patches.
I have been focusing on the fire runs. With the updates to master, and continued testing the fail still occurs for grass and for tree/grass runs with fire. I had a tree fire run which completed through year 51 with reasonable biomass. My 2PFT debug fire run is in queue still, so no update there.
With grass the difference is that when it burns, it burns completely. So, this could be a response to the grass flammability specifically and, as @rosiealice said, completely burned patches.
For the problem I'm currently working through (which may or may not be related to what is ultimately killing Jackie's runs), one problem is that total_canopy_area is exceeding patch area. We currently don't force total_canopy_area to be equal to or less than patch area.
I'm also noticing that when we do canopy promotion/demotion, we have a fairly relaxed tolerance on layer-area exceedance of patch area: 1e-4.
I'm wondering if grasses give the canopy demotion/promotion scheme a particularly challenging time at layering? Maybe in this specific case we are left with not-so precise canopy area, which is creating weirdness?
Here is an error log that I think corroborates the ftweight issue. During leaf_area_profile(), we construct several canopy-layer x pft x leaf-layer arrays. cpatch%canopy_area_profile(cl,ft,iv) is converted directly into ftweight. We have a few checks in the scheme that can be switched on, one of which fails gracefully if canopy_area_profile exceeds 1.0 for any given layer.
FATES: A canopy_area_profile exceeded 1.0
cl: 1
iv: 1
sum(cpatch%canopy_area_profile(cl,:,iv)): 1.65653669059244
FATES: cohorts in layer cl = 1 0.376936443831203
7.401777278905496E-009 2.698777192878076E-008 2.698777192878076E-008
ED: fracarea 3 0.274264111110705
FATES: cohorts in layer cl = 1 4.47710468466018
1.069014260600514E-009 2.698777192878076E-008 2.698777192878076E-008
ED: fracarea 1 3.961106027654241E-002
FATES: cohorts in layer cl = 1 4.79421520149869
5.313109854499176E-010 2.698777192878076E-008 2.698777192878076E-008
ED: fracarea 1 1.968710076741488E-002
FATES: cohorts in layer cl = 1 5.13024998876371
6.459332537834644E-010 2.698777192878076E-008 2.698777192878076E-008
ED: fracarea 1 2.393429348254634E-002
FATES: cohorts in layer cl = 1 5.79933797252383
3.505819861862652E-008 2.698777192878076E-008 2.698777192878076E-008
ED: fracarea 1 1.29904012495523
In this case, we have a few cohorts contributing crown area to the offending layer, layer 1. Layer 1 is also the top layer, and it should be assumed there is an understory layer also. The cohorts appear to be normal, no nans, no garbage values... It is a small patch in terms of area, and it has a combination of PFT1 and PFT 3 in that layer.
Note that the area fraction of the last cohort is 130% of the area. I'm not sure why the other cohorts are sharing the top layer (cl==1) with it, if this cohort, which is the largest, is filling that layer completely. This is particularly strange/wrong because we have grasses sharing that layer with a couple of 5 cm cohorts.
I'm wondering if this is a precision problem, as indicated in a post above. The area on this patch is very small, but large enough to keep. Although, the promotion/demotion precision is about 4 orders of magnitude larger than the size of the patch...
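The precision worry can be made concrete with a small sketch (Python; the 1e-4 figure is the promotion/demotion tolerance quoted earlier, and the patch area is the e-12 value from the logs). An absolute tolerance of 1e-4 accepts area errors many orders of magnitude larger than the patch itself:

```python
ABS_TOL = 1e-4  # promotion/demotion layer-area tolerance quoted above


def layer_area_ok(layer_area, patch_area, tol=ABS_TOL):
    # The check as described: absolute exceedance below tol passes,
    # regardless of how small the patch is.
    return layer_area - patch_area <= tol


patch_area = 1.818989403545856e-12  # patch size seen in the logs
layer_area = 1.0e-10                # ~55x the patch area, still "ok"
```

So on a ~2e-12 m2 patch, a layer can be tens of times over-full and still pass the absolute check, which would explain imprecise canopy areas surviving promotion/demotion on these tiny patches.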
New runs using 1) rgknox promotion/demotion updates PR 388, 2) updated API 4.0.0, 3) updated CTSM changes. Two runs: one using clm45 or clm5 with 2PFTs (TropTree and Grass) and active fire.
clm45 completed to year 63 and still running, in queue at the moment. /glade2/scratch2/jkshuman/archive/Fire_rgknox_area_fixes_clm45_2PFT_1x1_692ba82_992e968/lnd/hist
clm5 failed in year 6 with error in EdPatchDynamicsMod.F90 associated with high fire area and patch trimming. /glade2/scratch2/jkshuman/Fire_rgknox-area-fixes_2PFT_1x1_692ba82_992e968/run
from cesm.log
very high fire areas 0.983208971507476 0.983208971507476
413: Projected Canopy Area of all FATES patches
413: cannot exceed 1.0
517: trimming patch area - is too big 1.818989403545856E-012
570: trimming patch area - is too big 1.818989403545856E-012
533: trimming patch area - is too big 1.818989403545856E-012
110: trimming patch area - is too big 1.818989403545856E-012
110: patch area correction produced negative area 10000.0000000000
110: 1.818989403545856E-012 -4.939832763539551E-013
61: trimming patch area - is too big 1.818989403545856E-012
443: trimming patch area - is too big 1.818989403545856E-012
110: ENDRUN:
110: ERROR in EDPatchDynamicsMod.F90 at line 722
110:
110:
110:
110:
110:
110:
110: ERROR: Unknown error submitted to shr_abort_abort.
431: Projected Canopy Area of all FATES patches
431: cannot exceed 1.0
@jkshuman , that new fail is an error check that I put into that branch you are currently testing.
What happened is that the model determined that the total patch area exceeded 10,000 m2, and so it simply removes the excess from one of its patches. But we have been removing it from the oldest patch, and up until now we have never checked whether that patch has the area to donate.
This can be solved by removing the area from the largest patch, instead of the oldest patch.
I will make a correction and update the branch.
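In sketch form (Python, illustrative names; FATES patch areas should sum to the 10,000 m2 site area per the post above), the corrected logic takes the excess from the largest patch, which is guaranteed to be at least the mean patch area and so can absorb a small round-off excess, instead of from the oldest patch, which may hold only ~1.8e-12 m2:

```python
AREA_SITE = 10000.0  # m2; total site area the patch areas must sum to


def trim_excess_area(patch_areas):
    """Sketch of the fix: if round-off pushes the summed patch areas
    above the site area, donate the excess from the largest patch
    rather than the oldest one (which can go negative, as in the
    EDPatchDynamicsMod.F90 abort above)."""
    excess = sum(patch_areas) - AREA_SITE
    if excess > 0.0:
        largest = max(range(len(patch_areas)), key=lambda i: patch_areas[i])
        patch_areas[largest] -= excess
    return patch_areas
```

With a tiny oldest patch in the list, the old scheme would have driven it negative; here the tiny patch is untouched and the sum is restored to the site area.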
Updated the branch. Here is the change:
https://github.com/NGEET/fates/pull/388/commits/e85b681462529e20406a210a67e25325669cb1cf
@jkshuman , I will fire off some tests.
hold a moment before testing though, it needs a quick tweak, forgot to declare "nearzero"
Hi Ryan,
Thanks for this. Should we have a call, or hold off until the tests go?
@jkshuman @rosiealice and I had a review and discussion of changes in PR #388. Added some updates to code per our discussion. @jkshuman I'm going to pass it through the regression tests now.
Revising this to correct my mistaken runs from earlier. Confirmed that the branch code pulled in the correct changes from rgknox repo. Updated code with more rgknox-area-fixes (commit 658064e) and ctsm changes. Similar setup CLM45 and clm5 with active fire and 2PFTs (trop tree and grass) for South America region. CLM5 successfully running into year 18, and still going... CLM45 successfully running into year 20, and still going...
clm5: /glade/scratch/jkshuman/archive/Fire_rgknox_areafixes_0607_2PFT_1x1_fdce2b2_26542ea/ clm45:/glade/scratch/jkshuman/archive/Fire_rgknox_areafixes_0607_clm45_2PFT_1x1_fdce2b2_26542ea/
Runs are up to year 92 for clm5 and year 98 for clm45. I am going to call this closed, and open a new issue if anything else comes up as the code has diverged since opening this... To summarize: fixes included pull requests PR382 and PR388 and @rgknox fixes in repo branches for fates and ctsm. ctsm branch from rgknox_ctsm_repo-protectbaresoilfrac fates branch from rgknox-area-fix merged with master sci.1.14.0_api.4.0.0
branch details for ctsm and fates below.
fates git log details: 26542ea (HEAD, rgknox-areafix-0607_api4.0.0) Merge branch 'rgknox-area-fixes' into rgknox-areafix-0607_api4.0.0 ce689da (rgknox-area-fixes) Merge branch 'rgknox-area-fixes' of https://github.com/rgknox/fates into rgknox-area-fixes 658064e (rgknox_repo/rgknox-area-fixes) Updated some comments, added back protections on patch canopy areas exceeding 1 during the output boundary condition preparations. c357399 Merge branch 'rgknox-area-fixes' of github.com:rgknox/fates into rgknox-area-fixes e85b681 Fixed area checking logic on their sum to 10k 0f2003b Merge remote-tracking branch 'rgknox_repo/rgknox-area-fixes' into rgknox-area-fixes 34bfcdb Resolved conflict in EDCanopyStructureMod, used HEAD over master 5e92e69 (master) Merge remote-tracking branch 'ngeet_repo/master' 14aeb4f (tag: sci.1.14.0_api.4.0.0, ngeet_repo/master) Merge pull request #381 from rgknox/rgknox-soildepth-clm5
ctsm git log details: fdce2b2 (HEAD, rgknox_ctsm_repo/rgknox-fates-protectbaresoilfrac, rgknox-fates-protectbaresoilfrac, fates_next_api_rgknox_protectbaresoilfrac) Protected fates calculation of bare-soil area to not go below 0 692ba82 (origin/fates_next_api, fates_next_api) Merge pull request #375 from rgknox/rgknox-fates-varsoildepth 1cdd0e6 Merge pull request #390 from ckoven/fateshistdims 8eb90b1 (rgknox_ctsm_repo/rgknox-fates-varsoildepth) Changed a 1.0 r4 to r8 e9b7b68 Updating fates external to sci.1.14.0_api.4.0.0
Great!
Getting a fail in fire runs. Seems to be due to a Balance Check. This happens in both CLM45 runs and CLM5 runs at year 5 with 2PFTs (Trop tree and Grass). Non-fire runs haven't failed through year 10, but will resubmit longer. ctsm git hash: 2dba074 fates git hash: f8d7693 Here is the create case statement: ./create_newcase --case ${casedir}${CASE_NAME} --res f09_f09 --compset 2000_DATM%GSWP3v1_CLM45%FATES_SICE_SOCN_RTM_SGLC_SWAV --run-unsupported
from within cesm.log (and end of cesm.log below)
396: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
396: nstep = 96934
396: errsol = -1.031027636599902E-007
529: Large Dir Radn consvn error 87346.4733653322 1 2
529: diags 46218.1932574409 -0.338494232152740 589450.614042712, errorcode=1001)
529:MPT: at abort.c:66
529:MPT: #5 0x00002aaab157528d in pmpi_abort ()
529:MPT: from /opt/sgi/mpt/mpt-2.15/lib/libmpi.so
529:MPT: #6 0x0000000000e191a9 in shr_mpi_mod_mp_shr_mpiabort ()
529:MPT: at /glade/p/work/jkshuman/git/ctsm/cime/src/share/util/shr_mpi_mod.F90:2132
529:MPT: #7 0x0000000000d1b4d8 in shr_abort_mod_mp_shr_abortabort ()
529:MPT: at /glade/p/work/jkshuman/git/ctsm/cime/src/share/util/shr_abort_mod.F90:69
529:MPT: #8 0x0000000000503cd5 in abortutils_mp_endrunglobalindex ()
529:MPT: at /glade/p/work/jkshuman/git/ctsm/src/main/abortutils.F90:77
529:MPT: #9 0x0000000000677e2d in balancecheckmod_mpbalancecheck ()
529:MPT: at /glade/p/work/jkshuman/git/ctsm/src/biogeophys/BalanceCheckMod.F90:543
529:MPT: #10 0x000000000050af77 in clm_driver_mp_clmdrv ()
529:MPT: at /glade/p/work/jkshuman/git/ctsm/src/main/clm_driver.F90:924
529:MPT: #11 0x00000000004f9516 in lnd_comp_mct_mp_lnd_runmct ()
529:MPT: at /glade/p/work/jkshuman/git/ctsm/src/cpl/lnd_comp_mct.F90:451
529:MPT: #12 0x0000000000430e14 in component_mod_mp_componentrun ()
529:MPT: at /glade/p/work/jkshuman/git/ctsm/cime/src/drivers/mct/main/component_mod.F90:688
529:MPT: #13 0x0000000000417d59 in cime_comp_mod_mp_cimerun ()
529:MPT: at /glade/p/work/jkshuman/git/ctsm/cime/src/drivers/mct/main/cime_comp_mod.F90:2652
529:MPT: #14 0x0000000000430b3d in MAIN__ ()
529:MPT: at /glade/p/work/jkshuman/git/ctsm/cime/src/drivers/mct/main/cime_driver.F90:68
529:MPT: #15 0x0000000000415c5e in main ()
529:MPT: (gdb) A debugging session is active.
529:MPT:
529:MPT: Inferior 1 [process 53637] will be detached.
529:MPT:
529:MPT: Quit anyway? (y or n) [answered Y; input not from terminal]
529:MPT: Detaching from program: /proc/53637/exe, process 53637
529:
529:MPT: -----stack traceback ends-----
-1:MPT ERROR: MPI_COMM_WORLD rank 529 has terminated without calling MPI_Finalize()
-1: aborting job
529: -394259.718697869
529: lai_change 0.000000000000000E+000 0.000000000000000E+000
529: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
529: 6.38062653664038 0.000000000000000E+000 0.000000000000000E+000
529: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
529: 0.000000000000000E+000
529: elai 0.000000000000000E+000 0.000000000000000E+000 0.961064260932761
529: 0.000000000000000E+000 0.000000000000000E+000 0.958469792135196
529: 0.000000000000000E+000 0.000000000000000E+000 0.122722763358372
529: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
529: esai 0.000000000000000E+000 0.000000000000000E+000 3.893573906723917E-002
529: 0.000000000000000E+000 0.000000000000000E+000 3.883117669682943E-002
529: 0.000000000000000E+000 0.000000000000000E+000 4.984874625802597E-003
529: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
529: ftweight 1.00000000000000 0.000000000000000E+000
529: 0.000000000000000E+000 1.00000000000000 0.000000000000000E+000
529: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
529: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
529: 0.000000000000000E+000
529: cp 9.580078716659667E-011 1
529: bc_in(s)%albgr_dir_rb(ib) 0.557730205770928
529: >5% Dif Radn consvn error -2474470293.77894 1 2
529: diags 639144447.809849 -10366553911.8306 6420139512.41898
529: lai_change 0.000000000000000E+000 0.000000000000000E+000
529: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
529: 6.38062653664038 0.000000000000000E+000 0.000000000000000E+000
529: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
529: 0.000000000000000E+000
529: elai 0.000000000000000E+000 0.000000000000000E+000 0.961064260932761
529: 0.000000000000000E+000 0.000000000000000E+000 0.958469792135196
529: 0.000000000000000E+000 0.000000000000000E+000 0.122722763358372
529: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000 529: esai 0.000000000000000E+000 0.000000000000000E+000 3.893573906723917E-002 529: 0.000000000000000E+000 0.000000000000000E+000 3.883117669682943E-002 529: 0.000000000000000E+000 0.000000000000000E+000 4.984874625802597E-003 529: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000 529: ftweight 0.000000000000000E+000 0.000000000000000E+000 529: 37.4271707468345 0.000000000000000E+000 0.000000000000000E+000 529: 37.4271707468345 0.000000000000000E+000 0.000000000000000E+000 529: 31.0465442101942 0.000000000000000E+000 0.000000000000000E+000 529: 0.000000000000000E+000 529: cp 9.580078716659667E-011 1 529: bc_in(s)%albgr_dif_rb(ib) 0.557730205770928
529: rhol 0.100000001490116 0.100000001490116 0.100000001490116
529: 0.449999988079071 0.449999988079071 0.349999994039536
529: ftw 1.00000000000000 1.00000000000000 0.000000000000000E+000 529: 0.000000000000000E+000 529: present 1 0 0 529: CAP 1.00000000000000 0.000000000000000E+000 0.000000000000000E+000 465: WARNING:: BalanceCheck, solar radiation balance error (W/m2) 465: nstep = 96935 465: errsol = -1.048202307174506E-007 433: WARNING:: BalanceCheck, solar radiation balance error (W/m2) 433: nstep = 96935 433: errsol = -1.017730255625793E-007 358: WARNING:: BalanceCheck, solar radiation balance error (W/m2) 358: nstep = 96936 358: errsol = -1.278503987123258E-007 432: WARNING:: BalanceCheck, solar radiation balance error (W/m2) 432: nstep = 96936 432: errsol = -1.040576194100140E-007 431: WARNING:: BalanceCheck, solar radiation balance error (W/m2) 431: nstep = 96936 431: errsol = -1.129041606873216E-007 466: WARNING:: BalanceCheck, solar radiation balance error (W/m2) 466: nstep = 96936 466: errsol = -1.248336616299639E-007 433: WARNING:: BalanceCheck, solar radiation balance error (W/m2) 433: nstep = 96936 433: errsol = -1.003071474769968E-007 529: WARNING:: BalanceCheck, solar radiation balance error (W/m2) 529: nstep = 96936 529: errsol = 1.383552742595384E-005 529: clm model is stopping - error is greater than 1e-5 (W/m2) 529: fsa = 12787101170.2958
529: fsr = -12787101148.9356
529: forc_solad(1) = 2.30644280577964
529: forc_solad(2) = 3.71261017842798
529: forc_solai(1) = 8.37364785641270
529: forc_solai(2) = 6.96748048376436
529: forc_tot = 21.3601813243847
529: clm model is stopping 529: calling getglobalwrite with decomp_index= 39670 and clmlevel= pft 529: local patch index = 39670 529: global patch index = 15897 529: global column index = 8008 529: global landunit index = 2104 529: global gridcell index = 494 529: gridcell longitude = 290.000000000000
529: gridcell latitude = -15.5497382198953
529: pft type = 1 529: column type = 1 529: landunit type = 1 529: ENDRUN: 529: ERROR in BalanceCheckMod.F90 at line 543
396: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
396: nstep = 96934
396: errsol = -1.031027636599902E-007
529: Large Dir Radn consvn error 87346.4733653322 1 2
529: diags 46218.1932574409 -0.338494232152740 589450.614042712
529: -394259.718697869
529: lai_change 0.000000000000000E+000 0.000000000000000E+000
529: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
529: 6.38062653664038 0.000000000000000E+000 0.000000000000000E+000
529: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
529: 0.000000000000000E+000
529: elai 0.000000000000000E+000 0.000000000000000E+000 0.961064260932761
529: 0.000000000000000E+000 0.000000000000000E+000 0.958469792135196
529: 0.000000000000000E+000 0.000000000000000E+000 0.122722763358372
529: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
529: esai 0.000000000000000E+000 0.000000000000000E+000 3.893573906723917E-002
529: 0.000000000000000E+000 0.000000000000000E+000 3.883117669682943E-002
529: 0.000000000000000E+000 0.000000000000000E+000 4.984874625802597E-003
529: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
529: ftweight 1.00000000000000 0.000000000000000E+000
529: 0.000000000000000E+000 1.00000000000000 0.000000000000000E+000
529: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
529: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
529: 0.000000000000000E+000
529: cp 9.580078716659667E-011 1
529: bc_in(s)%albgr_dir_rb(ib) 0.557730205770928
529: >5% Dif Radn consvn error -2474470293.77894 1 2
529: diags 639144447.809849 -10366553911.8306 6420139512.41898
529: lai_change 0.000000000000000E+000 0.000000000000000E+000
529: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
529: 6.38062653664038 0.000000000000000E+000 0.000000000000000E+000
529: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
529: 0.000000000000000E+000
529: elai 0.000000000000000E+000 0.000000000000000E+000 0.961064260932761
529: 0.000000000000000E+000 0.000000000000000E+000 0.958469792135196
529: 0.000000000000000E+000 0.000000000000000E+000 0.122722763358372
529: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
529: esai 0.000000000000000E+000 0.000000000000000E+000 3.893573906723917E-002
529: 0.000000000000000E+000 0.000000000000000E+000 3.883117669682943E-002
529: 0.000000000000000E+000 0.000000000000000E+000 4.984874625802597E-003
529: 0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
529: ftweight 0.000000000000000E+000 0.000000000000000E+000
529: 37.4271707468345 0.000000000000000E+000 0.000000000000000E+000
529: 37.4271707468345 0.000000000000000E+000 0.000000000000000E+000
529: 31.0465442101942 0.000000000000000E+000 0.000000000000000E+000
529: 0.000000000000000E+000
529: cp 9.580078716659667E-011 1
529: bc_in(s)%albgr_dif_rb(ib) 0.557730205770928
529: rhol 0.100000001490116 0.100000001490116 0.100000001490116
529: 0.449999988079071 0.449999988079071 0.349999994039536
529: ftw 1.00000000000000 1.00000000000000 0.000000000000000E+000
529: 0.000000000000000E+000
529: present 1 0 0
529: CAP 1.00000000000000 0.000000000000000E+000 0.000000000000000E+000
465: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
465: nstep = 96935
465: errsol = -1.048202307174506E-007
433: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
433: nstep = 96935
433: errsol = -1.017730255625793E-007
358: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
358: nstep = 96936
358: errsol = -1.278503987123258E-007
432: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
432: nstep = 96936
432: errsol = -1.040576194100140E-007
431: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
431: nstep = 96936
431: errsol = -1.129041606873216E-007
466: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
466: nstep = 96936
466: errsol = -1.248336616299639E-007
433: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
433: nstep = 96936
433: errsol = -1.003071474769968E-007
529: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
529: nstep = 96936
529: errsol = 1.383552742595384E-005
529: clm model is stopping - error is greater than 1e-5 (W/m2)
529: fsa = 12787101170.2958
529: fsr = -12787101148.9356
529: forc_solad(1) = 2.30644280577964
529: forc_solad(2) = 3.71261017842798
529: forc_solai(1) = 8.37364785641270
529: forc_solai(2) = 6.96748048376436
529: forc_tot = 21.3601813243847
529: clm model is stopping
529: calling getglobalwrite with decomp_index= 39670 and clmlevel= pft
529: local patch index = 39670
529: global patch index = 15897
529: global column index = 8008
529: global landunit index = 2104
529: global gridcell index = 494
529: gridcell longitude = 290.000000000000
529: gridcell latitude = -15.5497382198953
529: pft type = 1
529: column type = 1
529: landunit type = 1
529: ENDRUN:
529: ERROR in BalanceCheckMod.F90 at line 543
529:
529:
529:
529:
529:
529: ERROR: Unknown error submitted to shr_abort_abort.
413: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
413: nstep = 96936
413: errsol = -1.288894111439731E-007
397: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
397: nstep = 96937
397: errsol = -1.022812625706138E-007
319: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
319: nstep = 96937
319: errsol = -1.036731305248395E-007
395: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
395: nstep = 96937
395: errsol = -1.211479911944480E-007
432: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
432: nstep = 96937
432: errsol = -1.264885440832586E-007
464: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
464: nstep = 96937
464: errsol = -1.101450379792368E-007
431: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
431: nstep = 96937
431: errsol = -1.387476800118748E-007
433: WARNING:: BalanceCheck, solar radiation balance error (W/m2)
433: nstep = 96937
433: errsol = -1.261905708815902E-007
529:Image              PC                Routine            Line        Source
529:cesm.exe           0000000001237DAD  Unknown            Unknown     Unknown
529:cesm.exe           0000000000D1B432  shr_abort_modmp    114         shr_abort_mod.F90
529:cesm.exe           0000000000503CD5  abortutils_mp_end  77          abortutils.F90
529:cesm.exe           0000000000677E2D  balancecheckmod_m  543         BalanceCheckMod.F90
529:cesm.exe           000000000050AF77  clm_driver_mp_clm  924         clm_driver.F90
529:cesm.exe           00000000004F9516  lnd_comp_mct_mp_l  451         lnd_comp_mct.F90
529:cesm.exe           0000000000430E14  component_modmp    688         component_mod.F90
529:cesm.exe           0000000000417D59  cime_comp_modmp    2652        cime_comp_mod.F90
529:cesm.exe           0000000000430B3D  MAIN__             68          cime_driver.F90
529:cesm.exe           0000000000415C5E  Unknown            Unknown     Unknown
529:libc-2.19.so       00002AAAB190AB25  libc_start_main    Unknown     Unknown
529:cesm.exe           0000000000415B69  Unknown            Unknown     Unknown
529:MPT ERROR: Rank 529(g:529) is aborting with error code 1001.
529:    Process ID: 53637, Host: r12i2n18, Program: /glade2/scratch2/jkshuman/Fire0504_Obrienh_Saldaa_Saldal_agb1zero_2PFT_1x1_2dba074_f8d7693/bld/cesm.exe
529:    MPT Version: SGI MPT 2.15 12/18/16 02:58:06
529:
529:MPT: --------stack traceback-------
0: memory_write: model date = 60715 0 memory = 65749.16 MB (highwater) 102.04 MB (usage) (pe= 0 comps= ATM ESP)
529:MPT: Attaching to program: /proc/53637/exe, process 53637
529:MPT: done.
529:MPT: Try: zypper install -C "debuginfo(build-id)=3d290be00d48b823d3b71df2249e80d881bc473d"
529:MPT: (no debugging symbols found)...done.
529:MPT: Try: zypper install -C "debuginfo(build-id)=5409c48fdb15e90649c1407e444fbe31d6dc8ec1"
529:MPT: (no debugging symbols found)...done.
529:MPT: [Thread debugging using libthread_db enabled]
529:MPT: Using host libthread_db library "/glade/u/apps/ch/os/lib64/libthread_db.so.1".
529:MPT: Try: zypper install -C "debuginfo(build-id)=e97cfdb062d6f0c41073f2109a7605d0ae991c03"
529:MPT: (no debugging symbols found)...done.
529:MPT: Try: zypper install -C "debuginfo(build-id)=f43d7754940a14ffe3d9bd8fc9472ffbbfead544"
529:MPT: (no debugging symbols found)...done.
529:MPT: Try: zypper install -C "debuginfo(build-id)=0ea764119690f32c98faae9a63a73f35ed8b1099"
529:MPT: (no debugging symbols found)...done.
529:MPT: Try: zypper install -C "debuginfo(build-id)=15916519d9dbaea26ec88427460b4cedb9c0a6ab"
529:MPT: (no debugging symbols found)...done.
529:MPT: Try: zypper install -C "debuginfo(build-id)=79264652a62453da222372a430cd9351d4bbcbde"
529:MPT: (no debugging symbols found)...done.
529:MPT: Try: zypper install -C "debuginfo(build-id)=68682e9ac223d269cbecb94315fcec5e16b32bfb"
529:MPT: (no debugging symbols found)...done.
529:MPT: 0x00002aaaafac141c in waitpid () from /glade/u/apps/ch/os/lib64/libpthread.so.0
529:MPT: Missing separate debuginfos, use: zypper install glibc-debuginfo-2.19-35.1.x86_64
529:MPT: (gdb) #0  0x00002aaaafac141c in waitpid ()
529:MPT:    from /glade/u/apps/ch/os/lib64/libpthread.so.0
529:MPT: #1  0x00002aaab16215d6 in mpi_sgi_system (
529:MPT: #2  MPI_SGI_stacktraceback (
529:MPT:    header=header@entry=0x7ffffffeeb70 "MPT ERROR: Rank 529(g:529) is aborting with error code 1001.\n\tProcess ID: 53637, Host: r12i2n18, Program: /glade2/scratch2/jkshuman/Fire0504_Obrienh_Saldaa_Saldal_agb1zero_2PFT_1x1_2dba074_f8d7693/bld"...) at sig.c:339
529:MPT: #3  0x00002aaab1574d6f in print_traceback (ecode=ecode@entry=1001)
529:MPT:    at abort.c:227
529:MPT: #4  0x00002aaab1574fda in PMPI_Abort (comm=