James-Thorson-NOAA / VAST

Spatio-temporal analysis of univariate or multivariate data, e.g., standardizing data for multiple species or stages
http://www.FishStats.org
GNU General Public License v3.0
124 stars 54 forks source link

fit_model estimation error for some species #209

Closed afredston closed 4 years ago

afredston commented 4 years ago

I'm running VAST single-species models on a list of species from the Eastern Bering Sea. I downloaded the 100 most frequently observed taxa using FishData::download_catch_rates, of which 82 were identified to species. Of those 82, VAST fails to fit 28, and at least the first 5 reach the stage of estimating range shifts and then report the same error:

Calculating range shift for stratum #1:

### Making TMB object
make: Nothing to be done for `all'.
Warning: 4 external pointers will be removed
Constructing atomic logspace_add
Constructing atomic logspace_add
List of estimated fixed and random effects:
   Coefficient_name Number_of_coefficients   Type
1      L_epsilon1_z                      1  Fixed
2      L_epsilon2_z                      1  Fixed
3         logkappa1                      1  Fixed
4         logkappa2                      1  Fixed
5         logSigmaM                      1  Fixed
6          beta1_ft                     37 Random
7          beta2_ft                     37 Random
8 Epsiloninput1_sft                   4292 Random
9 Epsiloninput2_sft                   4292 Random

### Estimating parameters
Error in nlminb(start = startpar, objective = fn, gradient = gr, control = nlminb.control,  : 
  gradient function must return a numeric vector of length 5
In addition: Warning message:
In nlminb(start = startpar, objective = fn, gradient = gr, control = nlminb.control,  :
  NA/NaN function evaluation

The list of species that doesn't converge is: c("Bathyraja_parmifera", "Lepidopsetta_polyxystra", "Atheresthes_evermanni", "Leptasterias_polaris", "Pagurus_trigonocheirus", "Labidochirus_splendescens", "Chrysaora_melanaster", "Buccinum_scalariforme", "Pagurus_aleuticus", "Hippoglossoides_robustus", "Styela_rustica", "Pagurus_capillatus", "Pagurus_ochotensis", "Buccinum_angulosum", "Boltenia_ovifera", "Lethasterias_nanimensis", "Pagurus_rathbuni", "Leptasterias_arctica", "Buccinum_polare", "Eunoe_nodosa", "Pagurus_confragosus", "Ophiura_sarsii", "Bathyraja_interrupta", "Pteraster_obscurus", "Boreogadus_saida", "Myoxocephalus_scorpius", "Mactromeris_polynyma", "Eunoe_depressa" )

James-Thorson-NOAA commented 4 years ago

Thanks Alexa for posting here!

I don't have any intuition based on this output, but cold you make a minimal example using the format here (https://github.com/nwfsc-assess/geostatistical_delta-GLMM/wiki/How-to-create-a-minimal-example) for one of those five species and I could take a look and report back on this thread?

afredston commented 4 years ago

Sure! This minimal example still creates the same error for me:

#unlink(paste0(getwd(),"/Kmeans-100.Rdata")) # deletes Kmeans-100 from other region models, if switching regions
library(VAST)
library(TMB)
Version = get_latest_version( package="VAST" )

Method = c("Grid", "Mesh", "Spherical_mesh")[2]
grid_size_km = 50
n_x = 100   

FieldConfig = c("Omega1"=0, "Epsilon1"=1, "Omega2"=0, "Epsilon2"=1) 

RhoConfig = c("Beta1"=0, "Beta2"=0, "Epsilon1"=2, "Epsilon2"=2) 
OverdispersionConfig = c("Eta1"=0, "Eta2"=0)
ObsModel = c(1,3)

Options =  c("SD_site_density"=0, "SD_site_logdensity"=0, "Calculate_Range"=1, "Calculate_evenness"=0, "Calculate_effective_area"=1, "Calculate_Cov_SE"=0, 'Calculate_Synchrony'=0, 'Calculate_Coherence'=0)

strata.limits <- data.frame('STRATA'="All_areas")

Region = "Eastern_Bering_Sea"
Species_set = 100 

DateFile = paste0(getwd(),'/VAST_output/') 
if (!dir.exists(DateFile)){
  dir.create(DateFile)
} else {
  print(paste0(DateFile, " already exists"))
}

Catch_rates <- FishData::download_catch_rates(survey="Eastern_Bering_Sea", species_set=Species_set) 

Extrapolation_List = make_extrapolation_info( Region=Region, strata.limits=strata.limits )

settings = make_settings( "n_x"=n_x, "Region"=Region, purpose="index", bias.correct=FALSE, use_anisotropy=FALSE, "FieldConfig"=FieldConfig, "RhoConfig"=RhoConfig, "OverdispersionConfig"=OverdispersionConfig,"Options"=Options,"ObsModel"=ObsModel  )

DF <- Catch_rates[Catch_rates$Sci=="Bathyraja_parmifera",]
Data_Geostat = data.frame( "spp"=DF[,"Sci"], "Year"=DF[,"Year"], "Catch_KG"=DF[,"Wt"], "AreaSwept_km2"=0.01, "Vessel"=0, "Lat"=DF[,"Lat"], "Lon"=DF[,"Long"] )

# get rid of species observed everywhere to prevent VAST errors 
Data_Geostat$nonzero = ifelse(Data_Geostat$Catch_KG > 0, "Y","N")
Data_Geostat$prop_nonzero = ave(Data_Geostat$nonzero, Data_Geostat$Year, FUN=function(x) mean(x=='Y'))
Data_Geostat <- Data_Geostat[Data_Geostat$prop_nonzero<1,]

Spatial_List = make_spatial_info( grid_size_km=grid_size_km, n_x=n_x, Method=Method, Lon_i=Data_Geostat[,'Lon'], Lat_i=Data_Geostat[,'Lat'], Extrapolation_List=Extrapolation_List, DirPath=DateFile, Save_Results=FALSE )

fit = fit_model("settings"=settings, "Lat_i"=Data_Geostat[,'Lat'],
                "Lon_i"=Data_Geostat[,'Lon'], "t_i"=Data_Geostat[,'Year'],
                "c_i"=rep(0,nrow(Data_Geostat)), "b_i"=Data_Geostat[,'Catch_KG'],
                "a_i"=Data_Geostat[,'AreaSwept_km2'], "v_i"=Data_Geostat[,'Vessel'],
                "getReportCovariance"=FALSE, "getJointPrecision"=TRUE, 
                lower=-Inf, upper=Inf, 
                test_fit = FALSE, 
                fine_scale=FALSE,
                anisotropy=FALSE,
                Use_REML=TRUE)
James-Thorson-NOAA commented 4 years ago

OK, "Bathyraja_parmifera" has 0% encounters for many years, which results in a non-identifable model when intercepts are treated as fixed effects. You're using REML=TRUE, which means the intercepts are optimized during the inner loop -- the beta1 for years with 0% encounters go to -Inf, which then results in a non-invertible inner-loop Hessian. I found this when switching to REML=FALSE and seeing the parameter estimates for beta1s were going towards -Inf (i.e., < -20).

A few suggestions:

  1. Please check for species with 0% encounters in any years and either (1A) discard them, or (1B) specify a temporal structure on intercepts for these species.
  2. Please explore other non-convergences for similar issues, including changing REML=FALSE and checking MLEs for apparent issues.

I note that make_data(.) should be doing an auto-check for simple issues like this, and I'm not sure why it's not throwing an informative error in this instance. I'll try to remember to check this later, and feel free to remind me in a few weeks.

I'm closing the issue for now, but feel free to keep posting here if other species aren't fixable using suggestions 1-2 above.

James-Thorson-NOAA commented 4 years ago

PS: I've just added this as an informative error in make_data(.) in the VAST development branch, to clearly indicate this issue in the future

afredston commented 4 years ago

For future reference, here's how I addressed this issue:

I already had code in the single-species models to drop years when a species was encountered 100% of the time, because VAST can't use those to estimate an encounter rate. So I just added a line to also drop years when a species was encountered 0% of the time:

  # below I'm getting rid of years where the species was observed 100% of the time 
  Data_Geostat$nonzero = ifelse(Data_Geostat$Catch_KG > 0, "Y","N")

  Data_Geostat$prop_nonzero = ave(Data_Geostat$nonzero, Data_Geostat$Year, FUN=function(x) mean(x=='Y'))

  Data_Geostat <- Data_Geostat[Data_Geostat$prop_nonzero<1,]
  Data_Geostat <- Data_Geostat[Data_Geostat$prop_nonzero>0,] # also drop years where species was never observed

However, VAST still estimates ranges in years with no data. At the moment I'm doing a post-hoc comparison to drop years from the VAST output when I know there were no species observations (which is not necessary for all purposes but I think is appropriate for our questions), but if there's a more functional way to tell VAST to just skip those years entirely, let me know!