Open MeganRSullivan opened 1 month ago
It looks like the error is in the DIC equation gradient w.r.t sigP in eqCcycle_v2.
So far, I can't find a problem in eqCcyle_v2.m
The DIC gradient equation for sigP is exactly the same as the equation for all the other P model parameters.
RHS(1:nwet,pindx.lsigP) = -d0((1-sigC-gamma)*RR*Gx(:,pindx.lsigP))*C2P;
Are there any terms in this equation that would be functions of sigP (but not of the other P model parameters)?
That leaves Gx as the only term that depends on sigP
Is it possible that Gx(:,pindx.lsigP)
has an error?
Gx(:,pindx.lsigP)
appears in the gradient for 6 of the 7 carbon tracers, but only the DIC gradient is very far from the complex step solution Gx(:,pindx.lsigP)
is wrong, it's strange that it wouldn't cause problems in any of the other tracer equations.
I'm still looking into it, but I'm getting pretty stumped. @Hojongseo , do you have any other ideas of where the problem might be?
Right, I agree with you. As we talked after the BGC seminar, the equation is exactly the same for all P parameters, so I also don't know why the problem only occurs with the gradient of sigP.
I checked the code for Gx if there was a error, but I think there is nothing wrong with Gx. Gx = GpDIPx + d0(DIP)Gpx In this equation, the second term of RHS is zero, except Gpx(:,ialpha) and Gpx(:,ibeta), so it does not seem to be related to sigP. In addition, since there is also no errors on calculating DIPx (results of the complex step method for eqPcycle was fine), it seems the calculation of Gx is correct. For the double check, I also see the DIPx, which means Px(1:nwet,:) in eqPcycle, and it looks fine to me.
Although I cannot find where is the problem, I'll continue to debug it and let you know if I find anything. Thank you for your efforts on finding an issue on sigP!
After looking into it further, I don't think the problem is in the sigP
equations after all. I ran a few GHtests of different parameter pairs (results below) and found the discrepancy between the explicit gradient and the complex step method occurred for whichever P model parameter was listed first in pindx
. So the error I was seeing in the sigP
gradient seems to occur due to its position in the vector of optimizable parameters, rather than an error in the gradient equations for sigP
.
Initially, I thought this might be caused by the way we pack the gradients in Cx
, but I couldn't find any issues with the ordering of the columns in Cx
.
I now suspect the issue is due to how we set the initial guess for the Cmodel solution in the global variable GC
.
Tunable parameters
-----------------------------------------------
current Q10P is 2.28e+00
current sigC is 9.00e-02
gradient relative error (1): -1.791e-03
-1.130e-02 4.472e-04
gradient relative error (2): -1.112e-14
2.871e-13 6.831e-15
Tunable parameters
-----------------------------------------------
current kdP is 1.86e-08
current alpha is 2.45e-08
current sigC is 9.00e-02
gradient relative error (1): -1.474e-03
-3.571e-03 -2.646e-02 4.932e-05
gradient relative error (2): 2.760e-12
4.937e-12 6.135e-12 2.159e-12
gradient relative error (3): -6.179e-16
2.775e-14 1.367e-13 9.657e-15
Tunable parameters
-----------------------------------------------
current alpha is 2.45e-08
current sigC is 9.00e-02
gradient relative error (1): -1.576e-03
-1.146e-02 2.049e-04
gradient relative error (2): -8.856e-15
4.220e-13 7.066e-15
Tunable parameters
-----------------------------------------------
current sigP is 3.50e-01
current bC is 6.50e-01
gradient relative error (1): -2.185e-03
2.228e-03 -1.742e-01
-- OUT_OF_MEMORY --
Restarting GHtest...
Tunable parameters
-----------------------------------------------
current sigP is 3.50e-01
current bC is 6.50e-01
gradient relative error (1): -1.746e-12
1.164e-13 1.694e-10
par.fname
already existed, so this output was used to update the initial guess stored in the global variable GC
.The error between the explicit gradient and the complex step method seems to only occur when the Newton-Armijo solver in eqCcycle does many iterations (e.g. stops at iter=18 as opposed to iter=1), suggesting that the problem results from the initial guess being far from the actual solution.
I think the problem may be related to resetting the global variable GC
after solving Ceqn
in eqCcycle
. Resetting GC
to the steady-state solution could explain why the error only occurs in the first parameter in x
. For each of these tests, the first iteration reads in the initial fields from the initCO
file from DATA/BGC_2023Nature
. Because each test has a unique name, no output file (fname
) exists yet, so the initial guesses do not get updated using par.fname
.
When running neglogpost
for the first parameter in the GHtest loop, GC
is not very close to the Cmodel steady-state solution, so the Newton-Armijo solver in eqCcycle
took 18 iterations to find a solution. Then the global variable GC
is reset to this new solution. In every subsequent iteration through neglogpost
, the global variable is very close to the actual solution. Thus, for the rest of the parameters tested with GHtest, the initial value of GC
is already at the steady-state solution, and the Newton-Armijo solver in eqCcycle
finds a solution after just one iteration.
I'm not entirely sure why having the initial guess for GC
far from the solution would cause the explicit gradient to differ from the complex step method. Maybe the explicit gradients depend on the DIC values in the initial global variable GC
, while the complex step method uses the updated GC
to compute the gradient. I can look into this further later.
The gradient equation for sigP
seems correct, but the gradients only match the complex step method when the initial guess is near the steady-state solution. We should be able to avoid this issue by solving the steady state model once before running the GHtest.
I don't know if this impacts the fminunc
function's ability to minimize the objective function using the computed gradient and Hessian.
Let me know if you have any ideas.
FYI, I found a possible issue with the
sigP
parameter gradient while testing the gradient and Hessian for the 24layer model with P, C, and O models on.The Problem: The explicitly computed gradient w.r.t.
sigP
does not match the complex step method. But in the Hessian, when only the 2nd derivative is w.r.t.sigP
, the Hessian is consistent with the complex step method. This suggests that the problem is probably in the gradient definition forsigP
.I tested this using my version of the code in the
Code_withCellModel
folder, but the issue occurred even when the cell model is set to off andrO2C
is a constant parameter (in which case, the code should operate exactly the same asCode_24layer
)Steps Taken So far, I have tried to diagnose where the error is introduced by testing if the problem is in eqPcycle, eqOcycle, or eqCcyle
opt_sigP
turned on, and the Cmodel and Omodel off, to see if the problem is in eqPcycle.opt_sigP
andopt_sigC
turned on (Cmodel = on; Omodel = off)sigP
gradient did not match complex step methodMy conclusion so far is that there is a bug in
eqCcycle_v2.m
in the gradient with respect tosigP
. I have not identified where exactly the issue is yet, but thought I would post it here, in case anyone else has encountered or fixed this issue in other versions of the BGC model.