Convergence issue in Distance, half-normal model large dataset

lenthomas commented 9 years ago

Distance v3.2.2, mrds v2.1.12, R 3.2.0

With this dataset, which contains about 130,000 observations: https://www.dropbox.com/s/lkvylvsmxvm6el0/neartest.csv?dl=0

And this code:

library(Distance)
fincalls <- read.csv("neartest.csv")
whale.trunc <- 3000
halfnorm.calls <- ds(fincalls, truncation = whale.trunc, transect=c("point"), key="hn", adjustment="cos")

We get the error:

Starting AIC adjustment term selection.
Fitting half-normal key function
Key only models do not require monotonicity contraints. Not constraining model for monotonicity.
** Warning: Problems with fitting model. Did not converge**
All models failed to fit!
Error in ds(fincalls, truncation = whale.trunc, transect = c("point"),  : 
 No models could be fitted.

dill commented 9 years ago

hmmm, odd.

The good news is that this works fine in Distance2:

library(Distance2)
fincalls <- read.csv("neartest.csv")
hnc <- ds(fincalls, truncation=3000, transect="point")

giving:

> summary(hnc)
Summary of fitted detection function
Transect type          : point
Number of observations : 125453
Distance range         : 0 - 3000
AIC                    : 1975286

Detection function     : Half-normal

Detection function parameters
            Estimate          SE
(Intercept) 6.900509 0.001409457

Kolmogorov-Smirnov p-value : 0
Cramer-von Mises p-value   : 0.2757482

           Estimate           SE          CV
Average p 0.2167464 0.0005816651 0.002683621

So, taking the scale parameter estimate of 6.900509 from the above and giving that as the starting value to Distance:

hnc <- ds(fincalls, truncation = whale.trunc, transect=c("point"),
          key="hn", adjustment=NULL, initial.values=list(scale=6.900509))

gives a "reasonable" answer:

> summary(hnc)

Summary for distance analysis
Number of observations :  125453
Distance range         :  0  -  3000

Model : Half-normal key function
AIC   : 1975286

Detection function parameters
Scale Coefficients:
            estimate          se
(Intercept) 6.900509 0.001409456

                        Estimate           SE          CV
Average p           2.167464e-01 5.816832e-04 0.002683704
N in covered region 5.788009e+05 2.122367e+03 0.003666834

Summary statistics:
        Region         Area  CoveredArea Effort      n  k       ER    se.ER
1 nearest_2mon 455329872841 455329872841  16104 125453 11 7.790176 1.938827
     cv.ER
1 0.248881

Density:
  Label     Estimate           se        cv          lcl          ucl       df
1 Total 1.271168e-06 3.163881e-07 0.2488955 7.361599e-07 2.194997e-06 10.00233
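
As a quick sanity check on the numbers above, here is a minimal sketch that reproduces Average p, N in covered region and the density by hand. It assumes the reported (Intercept) is the log of the half-normal scale, in the same units as the truncation distance, and uses the closed form for a half-normal key on point transects with no adjustment terms. The variable names are just for illustration.

```r
# Values copied from the summary output above
log_sigma    <- 6.900509      # (Intercept), assumed to be log(scale)
w            <- 3000          # right truncation distance
n            <- 125453        # number of observations
covered_area <- 455329872841  # CoveredArea from the summary statistics

sigma <- exp(log_sigma)

# Average detection probability on a point transect:
#   p = (2 / w^2) * integral_0^w r * exp(-r^2 / (2 sigma^2)) dr
#     = 2 * sigma^2 * (1 - exp(-w^2 / (2 sigma^2))) / w^2
p_avg <- 2 * sigma^2 * (1 - exp(-w^2 / (2 * sigma^2))) / w^2

# Abundance in the covered region and the corresponding density
N_covered <- n / p_avg
D_hat     <- N_covered / covered_area

print(c(p_avg = p_avg, N_covered = N_covered, D_hat = D_hat))
```

If the assumption about the log link holds, these should agree with the reported values to several significant figures, which suggests the fit started from 6.900509 is internally consistent.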

From the fitted detection function plot it looks like maybe there is some room for left truncation and general improvements to the detection function?

[Image: screenshot of the fitted detection function plot, 2015-09-20 at 11:29:44]

Weirdly, the Distance::ds optimisation code dies (fails to converge) when I give it an initial value of 6 (but not 7), even though it gets to around 6.9005. So perhaps there's something weird going on in the likelihood or in the box constraints on the optimisation. In general I'm inclined to drop the box constraints on the parameters, as I don't think they are necessary except in the hazard-rate shape case, but I'm not sure of the wider impact of this change.
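
One way to separate a likelihood problem from an optimiser problem might be to write the likelihood out by hand and look at it directly. The sketch below is only that: a hand-written half-normal point transect likelihood on the log-scale parameter, evaluated on simulated data rather than neartest.csv. It is not the mrds code, and `nll_hn_point`, the sample size and the search interval are all made up for illustration.

```r
# Negative log-likelihood for a half-normal key on point transects,
# parameterised by theta = log(sigma). Illustrative stand-in, not mrds code.
nll_hn_point <- function(theta, r, w) {
  sigma2 <- exp(2 * theta)
  # log of the normalising integral int_0^w r * exp(-r^2 / (2 sigma^2)) dr,
  # which equals sigma^2 * (1 - exp(-w^2 / (2 sigma^2)))
  log_norm <- log(sigma2) + log(-expm1(-w^2 / (2 * sigma2)))
  -sum(log(r) - r^2 / (2 * sigma2) - log_norm)
}

# Simulated data (assumption: true log-scale of 6.9, truncation at 3000)
set.seed(1)
w <- 3000
sigma_true <- exp(6.9)
# Detected radial distances under a half-normal are Rayleigh distributed;
# apply the right truncation by rejection
r <- sigma_true * sqrt(-2 * log(runif(5000)))
r <- r[r <= w]

# Profile the likelihood over a grid that includes the starting values 6 and 7
theta_grid <- seq(5.5, 8, by = 0.1)
nll_grid <- sapply(theta_grid, nll_hn_point, r = r, w = w)

# One-dimensional bounded search; no starting value is needed
fit <- optimize(nll_hn_point, interval = c(4, 10), r = r, w = w)
fit$minimum
```

Plotting `nll_grid` against `theta_grid` shows the shape of the surface between the two starting values. If the same profile on the real data looks smooth with a single minimum near 6.9, that would point towards the optimiser setup (starting values or bounds) rather than the likelihood itself.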

Happy to plough some time into this but unsure what the cost/benefit trade-off would be...

lenthomas commented 9 years ago

Probably better to bank on moving to Distance2, for standard analyses, right?

DistanceDevelopment / distance-bugs

Convergence issue in Distance, half-normal model large dataset #154