Closed jjokella closed 2 months ago
@jjokella: I think the issue here is that min(num_inst_lnd,inst_index)
returns 0
which shouldn't be possible. Let's look closer into these two variables:
num_inst_lnd
: Number of land model instances in MPI_COMM_WORLD
. Clearly this should be 1
.
https://github.com/HPSCTerrSys/eCLM/blob/b88252465bd67f1392cd5c8adf97fa91a803c405/src/csm_share/mct/seq_comm_mct.F90#L67
https://github.com/HPSCTerrSys/eCLM/blob/b88252465bd67f1392cd5c8adf97fa91a803c405/src/csm_share/CMakeLists.txt#L15
inst_index
: Index of the current land model instance. If num_inst_lnd=1
, then we expect inst_index=1
. Here's how it's being set:
https://github.com/HPSCTerrSys/eCLM/blob/b88252465bd67f1392cd5c8adf97fa91a803c405/src/clm5/cpl/lnd_comp_mct.F90#L142
https://github.com/HPSCTerrSys/eCLM/blob/b88252465bd67f1392cd5c8adf97fa91a803c405/src/csm_share/mct/seq_comm_mct.F90#L1336-L1348
I suspect seq_comm_inst(LNDID)
returns 0
. Based from the code above, this happens when LNDID
is outside of the range [1, ncomps]
. This could be a good place to probe further. Also I'd suggest to verify if our assumptions above match with a running model.
Hey @kvrigor , thanks for looking into the issue!
Also I'd suggest to verify if our assumptions above match with a running model.
The debug output above in lnd_0000.log
is coming from a running eCLM-PDAF model compiled as RELEASE
. Only when it is compiled as DEBUG
, will the error show.
Regarding LNDID
, I think this should be the suffix of the log and our suffixes start at 0
. They are also given out in the atmospheric log atm_0000.log
:
(datm_comp_init) decomp = 1d
(datm_comp_init) iradsw = 1
(datm_comp_init) factorFn = null
(datm_comp_init) restfilm = undefined
(datm_comp_init) restfils = undefined
(datm_comp_init) presaero = T
(datm_comp_init) force_prognostic_true = F
(datm_comp_init) wiso_datm = F
(datm_comp_init) inst_index = 0
(datm_comp_init) inst_name = ATM0000
(datm_comp_init) inst_suffix = _0000
Maybe it is just not correct to start the instances with 0
? But it is such a small place, where this leads to an error, everything else runs fine with inst_index = 0
.
I am checking now debug output of LNDID
and inst_index
around the line that sets the inst_index
https://github.com/HPSCTerrSys/eCLM/blob/b88252465bd67f1392cd5c8adf97fa91a803c405/src/clm5/cpl/lnd_comp_mct.F90#L142
Based from old logs (1, 2), it seems to be that inst_index=1
:
(datm_comp_init) decomp = 1d
(datm_comp_init) iradsw = 1
(datm_comp_init) factorFn = null
(datm_comp_init) restfilm = undefined
(datm_comp_init) restfils = undefined
(datm_comp_init) presaero = T
(datm_comp_init) force_prognostic_true = F
(datm_comp_init) wiso_datm = F
(datm_comp_init) inst_index = 1
(datm_comp_init) inst_name = ATM
(datm_comp_init) inst_suffix =
Could inst_index
be possibly inherited from pdaf_id
here?
If so, one fix could be to re-index pdaf_id
starting from 1
to conform to Fortran 1-based arrays. Another approach is to increment inst_index
only within the context of USE_PDAF
; however this approach might be prone to other side effects.
Ah yes, this is it, thanks @kvrigor !!
Now, I have to think about what to do. I think this should be fixed, such that pdaf_id
starts at 1
as you suggest.
Only downside is that this is not backward-compatible with existing eCLM-PDAF inputs (because of the change of suffixes).
Closing this issue here, as it should be fixed in the PDAF-interface of eCLM-PDAF.
I will currently not implement a hard-coded way of forcing pdaf_id
to start at 1
, as this breaks existing eCLM-PDAF setups by indexing. This issue becomes even bigger, when looking at CLM3.5-PDAF and coupled ParFlow-CLM-PDAF. In the future, the indexing may be refactored to start at 1
by default for all of these cases.
Currently, my approach is to add documentation and suggest users to let pdaf_id
start at 1
using the existing input DA:startreal
in enkpf.par
(https://hpscterrsys.github.io/TSMP/content/setup_tsmp/input_enkfpf.html#da-startreal).
Compiling eCLM-PDAF in TSMP2 with the option
-DCMAKE_BUILD_TYPE="DEBUG"
generates the following error (in at least two testcases for eCLM-PDAF):Here is the code:
https://github.com/HPSCTerrSys/eCLM/blob/b88252465bd67f1392cd5c8adf97fa91a803c405/src/clm5/cpl/lnd_comp_mct.F90#L701-L734
Current idea:
lnd_resume
is an array that should start with index1
. However,inst_index
starts at0
. Therefore, the debug-compiled version throws an error.In the RELEASE version, the code runs through. However, it leads to strange behavior:
resume_from_DA T
will be output at every time step forinst_index = 0
.Adding some debug output printed the following in
lnd_0000.log
:This indicates to my knowledge that
lnd_resume(0)
is interpreted as a dummy string and therefore every time step is falsely set as an "after-DA" time step.@kvrigor I wanted to ask your help here. I am thinking about introducing a quick fix by using
inst_index + 1
. Do you think this is something to take to CLM5 even? In newer versions,lnd_resume
does not appear anymore, but inrelease_clm5
, this "error" exists.