gadget-framework / gadget3

TMB-based gadget implementation
GNU General Public License v2.0

NaN in the gradient #20

Closed bthe closed 3 years ago

bthe commented 3 years ago

The latest commit of the ling model (see https://github.com/gadget-framework/gadget3/commit/493cb88b3ccce0272c11b2cc3e7efb7f4f86bdb0 ) improves the model quite substantially. However, the minimizer stops with NaNs in the gradient. Looking at the components of the gradient, it seems that almost all of the parameters report NaN:

outer mgc:  NaN 
       ling__Linf           ling__k        ling__recl      ling__scalar     ling__init__3     ling__init__4     ling__init__5     ling__init__6     ling__init__7 
              NaN               NaN               NaN               NaN               NaN               NaN               NaN               NaN               NaN 
    ling__init__8     ling__init__9    ling__init__10    ling__init__11    ling__init__12    ling__init__13    ling__init__14    ling__init__15 ling__renew__1982 
              NaN               NaN               NaN               NaN               NaN               NaN               NaN               NaN      0.000000e+00 
ling__renew__1983 ling__renew__1984 ling__renew__1985 ling__renew__1986 ling__renew__1987 ling__renew__1988 ling__renew__1989 ling__renew__1990 ling__renew__1991 
              NaN               NaN               NaN               NaN               NaN               NaN               NaN               NaN               NaN 
ling__renew__1992 ling__renew__1993 ling__renew__1994 ling__renew__1995 ling__renew__1996 ling__renew__1997 ling__renew__1998 ling__renew__1999 ling__renew__2000 
              NaN               NaN               NaN               NaN               NaN               NaN               NaN               NaN               NaN 
ling__renew__2001 ling__renew__2002 ling__renew__2003 ling__renew__2004 ling__renew__2005 ling__renew__2006 ling__renew__2007 ling__renew__2008 ling__renew__2009 
              NaN               NaN               NaN               NaN               NaN               NaN               NaN               NaN               NaN 
ling__renew__2010 ling__renew__2011 ling__renew__2012 ling__renew__2013 ling__renew__2014 ling__renew__2015 ling__renew__2016 ling__renew__2017 ling__renew__2018 
              NaN               NaN               NaN               NaN               NaN               NaN               NaN               NaN               NaN 
ling__renew__2019 ling__renew__2020 ling__renew__2021     ling__init__F      ling__mat__a    ling__mat__a50  ling__bmt__alpha    ling__bmt__l50  ling__lln__alpha 
    -1.284182e-02      3.090614e-02      1.637511e-17               NaN               NaN               NaN               NaN               NaN               NaN 
   ling__lln__l50  ling__gil__alpha    ling__gil__l50 ling__igfs__alpha   ling__igfs__l50        ling__bbin        ling__mat1        ling__mat2     ling__rec__sd 
              NaN               NaN               NaN               NaN               NaN               NaN               NaN               NaN               NaN 
   ling_si_alpha1     ling_si_beta1    ling_si_alpha2     ling_si_beta2    ling_si_alpha3    ling_si_alpha4    ling_si_alpha5    ling_si_alpha6    ling_si_alpha7 
    -2.272262e-01     -3.075031e+00      1.456065e-01      1.602682e+00     -6.544071e-01      8.044611e-01     -2.869935e-01      1.336200e-01      3.496039e-01 
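
(A component-wise check like the one above can be reproduced roughly as follows; this is a sketch assuming obj is the TMB objective, e.g. as returned by g3_tmb_adfun(), not the exact code used here:)

> gr <- as.vector(obj$gr(obj$par))  # AD gradient at the current parameter values
> names(gr) <- names(obj$par)       # label components by parameter name
> gr[is.nan(gr)]                    # list the parameters whose gradient is NaN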

Is there any way to trace where these NaNs come from?

lentinj commented 3 years ago

Hrm, not really, the gradient function has just worked(tm) so far. I think it's possible to thread trace statements through to see what's happening in CppAD land, but I've not seen any more useful suggestions than that.

I'll have a go here and see if I can spot anything.
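
(One generic diagnostic, not gadget3-specific: compare the AD gradient against a finite-difference approximation to localize which parameters only go NaN under AD. A sketch, assuming the numDeriv package and the same obj as above:)

> fd <- numDeriv::grad(obj$fn, obj$par)       # finite-difference gradient
> ad <- as.vector(obj$gr(obj$par))            # AD gradient
> names(obj$par)[is.nan(ad) & is.finite(fd)]  # NaN under AD but finite numerically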

lentinj commented 3 years ago

It converged fine here, although it took a lot longer than it has done previously:

> fit.opt
$par
       ling__Linf           ling__k        ling__recl      ling__scalar     ling__init__3     ling__init__4     ling__init__5     ling__init__6
     1.625984e-01     -1.012580e+00      7.441820e+00      2.937539e+00     -3.640827e-01     -6.219107e+00      4.977351e+00      4.105970e+00
    ling__init__7     ling__init__8     ling__init__9    ling__init__10    ling__init__11    ling__init__12    ling__init__13    ling__init__14
     2.483280e+00      5.614337e+00     -2.504437e+00      6.518533e+00     -3.002710e+00     -3.187114e+00     -1.522392e-01      4.608031e+00
   ling__init__15 ling__renew__1982 ling__renew__1983 ling__renew__1984 ling__renew__1985 ling__renew__1986 ling__renew__1987 ling__renew__1988
     6.799556e+00      0.000000e+00      1.336624e+00      2.834987e-01      5.673539e+00      9.203406e-01      1.874469e+00      1.257624e+00
ling__renew__1989 ling__renew__1990 ling__renew__1991 ling__renew__1992 ling__renew__1993 ling__renew__1994 ling__renew__1995 ling__renew__1996
     7.152697e-01      7.812653e-01      7.542524e+00      3.153639e+00      1.222294e+00      3.035459e+00      1.036628e+00      7.272965e+00
ling__renew__1997 ling__renew__1998 ling__renew__1999 ling__renew__2000 ling__renew__2001 ling__renew__2002 ling__renew__2003 ling__renew__2004
     9.047961e-01      4.759432e+00      2.244058e+00      1.162745e+00      4.731725e+00     -1.558000e+00      9.057847e-01      8.129590e-01
ling__renew__2005 ling__renew__2006 ling__renew__2007 ling__renew__2008 ling__renew__2009 ling__renew__2010 ling__renew__2011 ling__renew__2012
    -1.276563e+01     -3.146066e-01     -8.097236e-01     -8.334384e+00     -9.007742e+00     -7.852095e+00     -1.830429e+00      8.862025e-01
ling__renew__2013 ling__renew__2014 ling__renew__2015 ling__renew__2016 ling__renew__2017 ling__renew__2018 ling__renew__2019 ling__renew__2020
     4.953740e+00      2.788213e+00      1.218599e+00      8.578491e-01      1.156233e+00      4.668392e-01      1.230170e+00      9.271074e+00
ling__renew__2021     ling__init__F      ling__mat__a    ling__mat__a50  ling__bmt__alpha    ling__bmt__l50  ling__lln__alpha    ling__lln__l50
     2.326683e-11      2.043414e+01      5.207041e+00      1.328988e+00      1.227826e+00      9.338923e-02      1.553244e+00     -2.616228e-02
 ling__gil__alpha    ling__gil__l50 ling__igfs__alpha   ling__igfs__l50        ling__bbin        ling__mat1        ling__mat2     ling__rec__sd
     1.494448e+00     -1.494075e+00      2.110662e+00     -1.005100e+01      1.368498e+01      3.953200e+00      3.702272e+00     -6.575033e+00
   ling_si_alpha1     ling_si_beta1    ling_si_alpha2     ling_si_beta2    ling_si_alpha3    ling_si_alpha4    ling_si_alpha5    ling_si_alpha6
    -1.316775e+01      1.064836e+00     -1.119104e+01      1.014692e+00     -1.024074e+01     -9.815638e+00     -9.674589e+00     -9.674436e+00
   ling_si_alpha7
    -1.045436e+01

$value
[1] 38.74629

$counts
function gradient
     899      262

$convergence
[1] 0

$message
NULL
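
(This is the return value of R's optim(): $convergence of 0 means the optimizer reports successful convergence, and $counts shows 899 objective and 262 gradient evaluations. A typical invocation would look something like the following; the method and control settings are assumptions, not necessarily what was run here:)

> fit.opt <- optim(obj$par, obj$fn, obj$gr, method = "BFGS", control = list(maxit = 1000))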

bthe commented 3 years ago

Well, this gives me a clue: I had been messing around with the weight on the understocking likelihood, a change that I didn't think would matter much. I'll try reverting the weight to see what happens.

bthe commented 3 years ago

Yes, this seems to have done the trick. If you want to replicate this error, you can set the understocking weight to 100.
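
(For anyone replicating: the weight in question is the weight argument of the understocking likelihood action. A sketch, with the stock list and surrounding setup assumed rather than taken from the ling model:)

> g3l_understocking(list(ling_imm, ling_mat), weight = 100)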

lentinj commented 3 years ago

Hrm, it's still converging here. But I guess it's not worth worrying about too much for now.

bthe commented 3 years ago

Yes, let's wait until something else crops up.