Open SeanBryan51 opened 2 months ago
@rkutteh @ccarouge FYI this issue looks like it is related to the GW work.
Currently all ssnow%*_hys variables are uninitialised, causing the exception. It looks like some ssnow%*_hys variables are initialised in the subroutine GWspatialParameters here:
Note: GWspatialParameters does not seem to initialise the ssnow%sucs_hys or ssnow%wb_hys variables.
For the next GW changes, are there plans to remove the problematic code, i.e.:
or ensure all ssnow%*_hys variables are initialised?
@SeanBryan51 @ccarouge As Claire already knows, I have fixed all these bugs in my GW branch that is now in the process of making its way into the trunk. My own view is to wait a bit until this process is finished (this month I think) so as to avoid reinventing the wheel. Just for the record, I had compiled my GW branch with "check all" and fixed every bug it flagged.
@rkutteh -check and -ftrapuv are not 100% reliable in finding uninitialised vars (see this talk for more info). Runtime memory checking tools are more robust. I have been using ddt with memory debug settings enabled, which I recommend. It is easy to run CABLE with ddt on Gadi using offline debugging:
module load linaro-forge/24.0.2
ddt --offline --mem-debug=balanced mpiexec -n <NCPUS> ./cable-mpi
Happy to share more details if you are interested.
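For anyone wanting to script the two commands above, here is a minimal sketch that assembles the offline ddt command line and prints it for inspection before running anything. The function name build_ddt_cmd, the NCPUS environment variable and the default of 4 ranks are illustrative assumptions, not part of CABLE or Linaro Forge. The ddt flags are copied unchanged from the comment above.

```shell
# Build the offline ddt command line quoted in the comment above.
# build_ddt_cmd, its arguments and the default rank count are hypothetical.
build_ddt_cmd() {
  ncpus="$1"
  exe="${2:-./cable-mpi}"
  # Reject empty, non-numeric, or zero-prefixed rank counts
  case "$ncpus" in
    ''|*[!0-9]*|0*)
      echo "error: NCPUS must be a positive integer, got '$ncpus'" >&2
      return 1 ;;
  esac
  printf 'ddt --offline --mem-debug=balanced mpiexec -n %s %s\n' "$ncpus" "$exe"
}

# Dry run: print the command instead of executing it.
build_ddt_cmd "${NCPUS:-4}"
```

Once linaro-forge has been loaded with the module command above, the printed line can be pasted into a job script or interactive session on Gadi.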
Hacking a temporary fix for https://github.com/CABLE-LSM/CABLE/issues/395 and running CABLE-MPI offline (main branch - commit 95b9b5e915581b0d0ed0ed407573cb448770c7b4) using the crujra_accessN96_1h configuration results in the following divide by zero exception:
The exception occurs on this line of the code:
https://github.com/CABLE-LSM/CABLE/blob/95b9b5e915581b0d0ed0ed407573cb448770c7b4/src/offline/cable_parameters.F90#L2320
It looks like ssnow%ssat_hys(i,k) and ssnow%watr_hys(i,k) are both uninitialised and contain the same garbage value, so subtracting one from the other gives zero, which results in the divide by zero.
Steps to reproduce (Gadi)
Apply the following patch to fix the error described in https://github.com/CABLE-LSM/CABLE/issues/395 (WARNING - this patch is untested and should not be used for work other than reproducing this issue):
The steps to reproduce the error are the same as those described in https://github.com/CABLE-LSM/CABLE/issues/395.