NA/NaN gradient evaluation

Blevy2 commented 7 months ago

Hi Jim!

I am running into an issue that I am having trouble understanding that I wanted to run by you.

I am running VAST models on samples taken from spatial population simulation output for fish species that I have developed. In the spatial population models, the probability a fish moves to discrete cell $(i,j)$ in week $w$ is given by probability $Move{w,i,j}$, where $Move{w,i,j}$ depends on factors such as the water temperature in the given cell, $Temp_{w,i,j}$.

I want to compare the performance of VAST models with and without covariates, so I could reasonably provide VAST with either $Move{w,i,j}$ and/or $Temp{w,i,j}$ as the covariate. Since $Temp{w,i,j}$ is just one component of the actual movement probability $Move{w,i,j}$, my assumption is that $Move{w,i,j}$ would provide more information to VAST about species spatial preferences and thus provide a more accurate estimate compared to $Temp{w,i,j}$. I am using a second degree polynomial response when including covariates. For example, if using $Move$ as the covariate I input

        X2_formula =  ~ poly(Move, degree=2 ) 
        X1_formula =  ~ poly(Move, degree=2 )

My models without covariates are all converging fine and my models that use $Temp{w,i,j}$ as the covariate are also converging, but models that use $Move{w,i,j}$ as the covariate all seem to run nearly to completion before they produce the error

        <simpleError in nlminb(start = startpar, objective = fn, gradient = gr, control = nlminb.control,     lower = lower, upper = upper): NA/NaN gradient evaluation>

There are a few github issues for VAST that involve NA/NaN function evaluation, but not NA/NaN gradient evaluation. The closest thing I could find related to this is this issue thread in glmmTMB: https://github.com/glmmTMB/glmmTMB/issues/164

Based on the discussion in the above issue link our best guess is that maybe the gradient function created in VAST has a term something like log(exp(X)) where X is some parameter. During the final stage of a VAST run, possibly when the final Hessian is being calculated, the X value gets really really small (e.g., 1E-320) and the exp(X) goes to zero due to computer rounding and thus the log(exp(X)) is undefined. Does that sound reasonable?

Are you familiar with this error? Do you know how to fix this so we can use $Move$ as a covariate?

Here is my Seesion.Info():

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] dplyr_1.1.3 VAST_3.10.1 FishStatsUtils_2.12.1 [4] marginaleffects_0.15.1 units_0.8-4 TMB_1.9.6

loaded via a namespace (and not attached): [1] utf8_1.2.4 R6_2.5.1 tidyselect_1.2.0 Matrix_1.6-1.1
[5] lattice_0.20-44 magrittr_2.0.3 INLA_23.05.30-1 splines_4.3.2
[9] glue_1.6.2 tibble_3.2.1 pkgconfig_2.0.3 generics_0.1.2
[13] lifecycle_1.0.4 cli_3.6.1 fansi_1.0.5 vctrs_0.6.4
[17] grid_4.3.2 data.table_1.14.4 compiler_4.3.2 sp_1.5-0
[21] pillar_1.9.0 Rcpp_1.0.11 rlang_1.1.2
[1] "input is"

Thanks for your help!

Ben

James-Thorson commented 7 months ago

Ben,

I'm guessing that you have a mismatch between Matrix and TMB. Have you confirmed that TMB is working using the example here ( https://github.com/pfmc-assessments/geostatistical_delta-GLMM/wiki/Steps-to-install-TMB) ...?

Also, feel free to email me about tinyVAST ( https://vast-lib.github.io/tinyVAST/). I'm maintaining VAST for the next couple years, but my development efforts are focused on tinyVAST which has similar functionality using a smaller code base and regression interface.

Jim

On Wed, Jan 31, 2024 at 11:04 AM Blevy2 @.***> wrote:

Hi Jim!

I am running into an issue that I am having trouble understanding that I wanted to run by you.

I am running VAST models on samples taken from spatial population simulation output for fish species that I have developed. In the spatial population models, the probability a fish moves to discrete cell $(i,j)$ in week $w$ is given by probability $Move{w,i,j}$, where $Move{w,i,j}$ depends on factors such as the water temperature in the given cell, $Temp_{w,i,j}$.

I want to compare the performance of VAST models with and without covariates, so I could reasonably provide VAST with either $Move{w,i,j}$ and/or $Temp{w,i,j}$ as the covariate. Since $Temp{w,i,j}$ is just one component of the actual movement probability $Move{w,i,j}$, my assumption is that $Move{w,i,j}$ would provide more information to VAST about species spatial preferences and thus provide a more accurate estimate compared to $Temp{w,i,j}$. I am using a second degree polynomial response when including covariates. For example, if using $Move$ as the covariate I input
    X2_formula =  ~ poly(Move, degree=2 )
    X1_formula =  ~ poly(Move, degree=2 )
My models without covariates are all converging fine and my models that use $Temp{w,i,j}$ as the covariate are also converging, but models that use $Move{w,i,j}$ as the covariate all seem to run nearly to completion before they produce the error
    <simpleError in nlminb(start = startpar, objective = fn, gradient = gr, control = nlminb.control,     lower = lower, upper = upper): NA/NaN gradient evaluation>
There are a few github issues for VAST that involve NA/NaN function evaluation, but not NA/NaN gradient evaluation. The closest thing I could find related to this is this issue thread in glmmTMB: glmmTMB/glmmTMB#164 https://github.com/glmmTMB/glmmTMB/issues/164

Based on the discussion in the above issue link our best guess is that maybe the gradient function created in VAST has a term something like log(exp(X)) where X is some parameter. During the final stage of a VAST run, possibly when the final Hessian is being calculated, the X value gets really really small (e.g., 1E-320) and the exp(X) goes to zero due to computer rounding and thus the log(exp(X)) is undefined. Does that sound reasonable?

Are you familiar with this error? Do you know how to fix this so we can use $Move$ as a covariate?

Thanks for your help!

Ben

— Reply to this email directly, view it on GitHub https://github.com/James-Thorson-NOAA/VAST/issues/383, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB46UTIAKL5M2WUXRTWHHQTYRKIVLAVCNFSM6AAAAABCTS7YBKVHI2DSMVQWIX3LMV43ASLTON2WKOZSGEYTANZXGU2DSNA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

Blevy2 commented 7 months ago

Hi Jim,

I just ran the code in the install TMB link and confirmed that TMB is working.

Do you think I need a different version of the Matrix package?

My sessionInfo shows Matrix_1.6-1.1 and TMB_1.9.6

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] dplyr_1.1.3 VAST_3.10.1 FishStatsUtils_2.12.1 [4] marginaleffects_0.15.1 units_0.8-4 TMB_1.9.6

loaded via a namespace (and not attached): [1] utf8_1.2.4 R6_2.5.1 tidyselect_1.2.0 Matrix_1.6-1.1 [5] lattice_0.20-44 magrittr_2.0.3 INLA_23.05.30-1 splines_4.3.2 [9] glue_1.6.2 tibble_3.2.1 pkgconfig_2.0.3 generics_0.1.2 [13] lifecycle_1.0.4 cli_3.6.1 fansi_1.0.5 vctrs_0.6.4 [17] grid_4.3.2 data.table_1.14.4 compiler_4.3.2 sp_1.5-0 [21] pillar_1.9.0 Rcpp_1.0.11 rlang_1.1.2 [1] "input is"

Thanks for your help!

Ben

James-Thorson commented 7 months ago

When you start a new session and type library(TMB) do you get a warning message?

On Wed, Jan 31, 2024 at 11:23 AM Blevy2 @.***> wrote:

Hi Jim,

I just ran the code in the install TMB link and confirmed that TMB is working.

Do you think I need a different version of the Matrix package?

My sessionInfo shows Matrix_1.6-1.1 and TMB_1.9.6

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] dplyr_1.1.3 VAST_3.10.1 FishStatsUtils_2.12.1 [4] marginaleffects_0.15.1 units_0.8-4 TMB_1.9.6

loaded via a namespace (and not attached): [1] utf8_1.2.4 R6_2.5.1 tidyselect_1.2.0 Matrix_1.6-1.1 [5] lattice_0.20-44 magrittr_2.0.3 INLA_23.05.30-1 splines_4.3.2 [9] glue_1.6.2 tibble_3.2.1 pkgconfig_2.0.3 generics_0.1.2 [13] lifecycle_1.0.4 cli_3.6.1 fansi_1.0.5 vctrs_0.6.4 [17] grid_4.3.2 data.table_1.14.4 compiler_4.3.2 sp_1.5-0 [21] pillar_1.9.0 Rcpp_1.0.11 rlang_1.1.2 [1] "input is"

Thanks for your help!

Ben

— Reply to this email directly, view it on GitHub https://github.com/James-Thorson-NOAA/VAST/issues/383#issuecomment-1919778504, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB46UTJMB2HDJAIVLW7BMZDYRKK2JAVCNFSM6AAAAABCTS7YBKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMJZG43TQNJQGQ . You are receiving this because you commented.Message ID: @.***>

Blevy2 commented 7 months ago

Hi Jim,

To clarify, this is being run in a High Performance Computing environment (HPC), which means I am running command line inputs that call R files and then looking at .out files.

I don't think there is a warning when library(TMB) is called, but I just searched the outfile for "warning" and found the following:

In addition: Warning messages: 1: In checkMatrixPackageVersion() : Package version inconsistency detected. TMB was built with Matrix version 1.5.4.1 Current Matrix version is 1.6.1.1 Please re-install 'TMB' from source using install.packages('TMB', type = 'source') or ask CRAN for a binary version of 'TMB' matching CRAN's 'Matrix' package 2: In dir.create(file.path(paste0(getwd(), "/sim_", sim_num, "/", CN, : '/mnt/research/b.levy/VAST_Stuff/VAST_MixFishSim/10xKnots_MFS/sim_1/YTF/ConTemp/WCov_MoveCov/FALL/AllStrata' already exists 3: In file(file, "rt") : cannot open file 'Index_wYearSeason.csv': No such file or directory

Do you agree that this implies I should have Matrix 1.5.4.1 instead of 1.6.1.1? I just want to be clear because this is a High Performance Computing environment so I need to communicate with the HPC administrator to coordinate package installations/changes.

If the Matrix package is off, any idea why I have never had a problem running models on the HPC previously? I have run thousands of models over the last few months on the HPC and only this one covariate type is causing an issue. I just want to try to understand the problem.

thanks Jim!

Ben

James-Thorson commented 7 months ago

There's a huge number of issue threads about the Matrix / TMB mismatch in glmmTMB, sdmTMB etc. It's been a headache :0 I'm guessing that TMB got updated and now the version mismatch is causing some problem with sparse-matrix stuff. Sorry that the HPC stuff is hard to debug! I'm hoping that this Matrix/TMB issue doesn't happen again.

On Wed, Jan 31, 2024 at 11:32 AM Blevy2 @.***> wrote:

Hi Jim,

To clarify, this is being run in a High Performance Computing environment (HPC), which means I am running command line inputs that call R files and then looking at .out files.

I don't think there is a warning when library(TMB) is called, but I just searched the outfile for "warning" and found the following:

In addition: Warning messages: 1: In checkMatrixPackageVersion() : Package version inconsistency detected. TMB was built with Matrix version 1.5.4.1 Current Matrix version is 1.6.1.1 Please re-install 'TMB' from source using install.packages('TMB', type = 'source') or ask CRAN for a binary version of 'TMB' matching CRAN's 'Matrix' package 2: In dir.create(file.path(paste0(getwd(), "/sim_", sim_num, "/", CN, : '/mnt/research/b.levy/VAST_Stuff/VAST_MixFishSim/10xKnots_MFS/sim_1/YTF/ConTemp/WCov_MoveCov/FALL/AllStrata' already exists 3: In file(file, "rt") : cannot open file 'Index_wYearSeason.csv': No such file or directory

Do you agree that this implies I should have Matrix 1.5.4.1 instead of 1.6.1.1? I just want to be clear because this is a High Performance Computing environment so I need to communicate with the HPC administrator to coordinate package installations/changes.

If the Matrix package is off, any idea why I have never had a problem running models on the HPC previously? I have run thousands of models over the last few months on the HPC and only this one covariate type is causing an issue. I just want to try to understand the problem.

thanks Jim!

Ben

— Reply to this email directly, view it on GitHub https://github.com/James-Thorson-NOAA/VAST/issues/383#issuecomment-1919793644, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB46UTOU2KPQW2NRXUV3533YRKL6HAVCNFSM6AAAAABCTS7YBKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMJZG44TGNRUGQ . You are receiving this because you commented.Message ID: @.***>

Blevy2 commented 7 months ago

To follow up on this issue, I worked with the HPC administrator to try two different combinations of Matrix and TMB that were shown in some of the issues related to this problem. Unfortunately they both produced the same error.

Next I then tried a different covariate combination in the model and the model also did not converge. In this case I was using both a static ($Hab{i,j}$) and dynamic covariate ($Temp{wk,i,j}$) with the following response:

    X1_formula =  ~ poly(Temp, degree=2 ) + poly(Hab, degree=2 ) 
    X2_formula =  ~ poly(Temp, degree=2 )  + poly(Hab, degree=2 )

I then removed ($Hab_{i,j}$) from the X1_formula only and the model converged without an issue:

    X1_formula =  ~ poly(Temp, degree=2 ) 
    X2_formula =  ~ poly(Temp, degree=2 )  + poly(Hab, degree=2 )

There is no problem when no covariate information is included.

So the problem only shows up for specific combinations of covariate input.

Blevy2 commented 6 months ago

Hi @James-Thorson-NOAA ,

I think I am seeing something else related to this issue, which is why I am posting here. I may consider a new issue though as it could be something different. Do you think this problem is related to having a package incompatibility?

In some models with covariate I am getting the error:

I went ahead and printed out On_bounds and parameter_estimates at this step and see the following:

[1] "On_bounds is" ln_H_input ln_H_input beta1_ft beta1_ft beta1_ft beta1_ft NA NA NA NA NA NA beta1_ft beta1_ft beta1_ft beta1_ft beta1_ft beta1_ft NA NA NA NA NA NA beta1_ft beta1_ft beta1_ft beta1_ft beta1_ft beta1_ft NA NA NA NA NA NA beta1_ft beta1_ft beta1_ft beta1_ft gamma1_cp gamma1_cp NA NA NA NA NA NA L_omega1_z L_epsilon1_z logkappa1 beta2_ft beta2_ft beta2_ft NA NA NA NA NA NA beta2_ft beta2_ft beta2_ft beta2_ft beta2_ft beta2_ft NA NA NA NA NA NA beta2_ft beta2_ft beta2_ft beta2_ft beta2_ft beta2_ft NA NA NA NA NA NA beta2_ft beta2_ft beta2_ft beta2_ft beta2_ft gamma2_cp NA NA NA NA NA NA gamma2_cp gamma2_cp gamma2_cp L_omega2_z L_epsilon2_z logkappa2 NA NA NA NA NA NA logSigmaM NA [1] "parameter_estimates are" $par ln_H_input ln_H_input beta1_ft beta1_ft beta1_ft beta1_ft NaN NaN NaN NaN NaN NaN beta1_ft beta1_ft beta1_ft beta1_ft beta1_ft beta1_ft NaN NaN NaN NaN NaN NaN beta1_ft beta1_ft beta1_ft beta1_ft beta1_ft beta1_ft NaN NaN NaN NaN NaN NaN beta1_ft beta1_ft beta1_ft beta1_ft gamma1_cp gamma1_cp NaN NaN NaN NaN NaN NaN L_omega1_z L_epsilon1_z logkappa1 beta2_ft beta2_ft beta2_ft NaN NaN NaN NaN NaN NaN beta2_ft beta2_ft beta2_ft beta2_ft beta2_ft beta2_ft NaN NaN NaN NaN NaN NaN beta2_ft beta2_ft beta2_ft beta2_ft beta2_ft beta2_ft NaN NaN NaN NaN NaN NaN beta2_ft beta2_ft beta2_ft beta2_ft beta2_ft gamma2_cp NaN NaN NaN NaN NaN NaN gamma2_cp gamma2_cp gamma2_cp L_omega2_z L_epsilon2_z logkappa2 NaN NaN NaN NaN NaN NaN logSigmaM NaN

$objective [1] NaN

$iterations [1] 137

$evaluations function gradient 234 138

$time_for_MLE Time difference of 41.33874 mins

$max_gradient [1] NaN

$Convergence_check [1] NA

$number_of_coefficients Total Fixed Random 44281 55 44226

$AIC [1] NaN

$diagnostics Param starting_value Lower MLE Upper final_gradient 1 ln_H_input 0.0000000 -Inf NaN Inf NaN 2 ln_H_input 0.0000000 -Inf NaN Inf NaN 3 beta1_ft 0.0000000 -Inf NaN Inf NaN 4 beta1_ft 0.0000000 -Inf NaN Inf NaN 5 beta1_ft 0.0000000 -Inf NaN Inf NaN 6 beta1_ft 0.0000000 -Inf NaN Inf NaN 7 beta1_ft 0.0000000 -Inf NaN Inf NaN 8 beta1_ft 0.0000000 -Inf NaN Inf NaN 9 beta1_ft 0.0000000 -Inf NaN Inf NaN 10 beta1_ft 0.0000000 -Inf NaN Inf NaN 11 beta1_ft 0.0000000 -Inf NaN Inf NaN 12 beta1_ft 0.0000000 -Inf NaN Inf NaN 13 beta1_ft 0.0000000 -Inf NaN Inf NaN 14 beta1_ft 0.0000000 -Inf NaN Inf NaN 15 beta1_ft 0.0000000 -Inf NaN Inf NaN 16 beta1_ft 0.0000000 -Inf NaN Inf NaN 17 beta1_ft 0.0000000 -Inf NaN Inf NaN 18 beta1_ft 0.0000000 -Inf NaN Inf NaN 19 beta1_ft 0.0000000 -Inf NaN Inf NaN 20 beta1_ft 0.0000000 -Inf NaN Inf NaN 21 beta1_ft 0.0000000 -Inf NaN Inf NaN 22 beta1_ft 0.0000000 -Inf NaN Inf NaN 23 gamma1_cp 0.0000000 -Inf NaN Inf NaN 24 gamma1_cp 0.0000000 -Inf NaN Inf NaN 25 L_omega1_z 1.0000000 -Inf NaN Inf NaN 26 L_epsilon1_z 1.0000000 -Inf NaN Inf NaN 27 logkappa1 -0.1053605 -Inf NaN Inf NaN 28 beta2_ft 0.0000000 -Inf NaN Inf NaN 29 beta2_ft 0.0000000 -Inf NaN Inf NaN 30 beta2_ft 0.0000000 -Inf NaN Inf NaN 31 beta2_ft 0.0000000 -Inf NaN Inf NaN 32 beta2_ft 0.0000000 -Inf NaN Inf NaN 33 beta2_ft 0.0000000 -Inf NaN Inf NaN 34 beta2_ft 0.0000000 -Inf NaN Inf NaN 35 beta2_ft 0.0000000 -Inf NaN Inf NaN 36 beta2_ft 0.0000000 -Inf NaN Inf NaN 37 beta2_ft 0.0000000 -Inf NaN Inf NaN 38 beta2_ft 0.0000000 -Inf NaN Inf NaN 39 beta2_ft 0.0000000 -Inf NaN Inf NaN 40 beta2_ft 0.0000000 -Inf NaN Inf NaN 41 beta2_ft 0.0000000 -Inf NaN Inf NaN 42 beta2_ft 0.0000000 -Inf NaN Inf NaN 43 beta2_ft 0.0000000 -Inf NaN Inf NaN 44 beta2_ft 0.0000000 -Inf NaN Inf NaN 45 beta2_ft 0.0000000 -Inf NaN Inf NaN 46 beta2_ft 0.0000000 -Inf NaN Inf NaN 47 beta2_ft 0.0000000 -Inf NaN Inf NaN 48 gamma2_cp 0.0000000 -Inf NaN Inf NaN 49 gamma2_cp 0.0000000 -Inf NaN Inf NaN 50 gamma2_cp 0.0000000 -Inf NaN Inf NaN 51 gamma2_cp 0.0000000 -Inf NaN Inf NaN 52 L_omega2_z 1.0000000 -Inf NaN Inf NaN 53 L_epsilon2_z 1.0000000 -Inf NaN Inf NaN 54 logkappa2 -0.1053605 -Inf NaN Inf NaN 55 logSigmaM 1.6094379 -Inf NaN Inf NaN

James-Thorson-NOAA / VAST

NA/NaN gradient evaluation #383