DistanceDevelopment / dsm

Density surface modelling for distance sampling.
http://distancesampling.org/R/
GNU General Public License v3.0

Variance results contrast #29

Closed erex closed 7 years ago

erex commented 7 years ago

D7 testing continues, now that @dill has made a correction to the dsm response types for dsm.var.prop().

Compare var.gam() and var.prop() results of model dsm_nb_xy_ms:

var.gam

Summary of uncertainty in a density surface model calculated
 analytically for GAM, with delta method

Approximate asymptotic confidence interval:
    2.5%     Mean    97.5% 
1198.310 1710.336 2441.146 
(Using log-Normal approximation)

Point estimate                 : 1710.336 
CV of detection function       : 0.06670757 
CV from GAM                    : 0.1704 
Total standard error           : 313.0395 
Total coefficient of variation : 0.183

var.prop

Summary of uncertainty in a density surface model calculated
 by variance propagation.

Quantiles of differences between fitted model and variance model
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-1.31000 -0.04789  0.17090  0.01322  0.23880  0.27260 

Approximate asymptotic confidence interval:
        2.5%         Mean        97.5% 
    29.57459   2015.81785 137399.06899 
(Using log-Normal approximation)

Point estimate                 : 2015.818 
Standard error                 : 20412.4 
Coefficient of variation       : 10.1261

I speculate that var.prop is wrong.

D7 log window content

> var.dat<-read.table(file='C:\\Users\\eric\\AppData\\Local\\Temp\\dst39686\\var.dat.r', header=TRUE, sep='\t', comment.char='')
> sink(file='C:\\Users\\eric\\AppData\\Local\\Temp\\dst39686\\res.r',append=T)
> cat('\tResponse Surface/Variance: Variance propagation method\t\n')
> dsm.tmp<-dsm.4
> dsm.tmp$data<-merge(dsm.tmp$data,var.dat)
> dsm.var.13<-dsm.var.prop(dsm.obj = dsm.tmp, pred.data = grid.5, off.set=100000000)
> rm(var.dat, dsm.tmp)
> summary(dsm.var.13, alpha = 1-0.95)

dill commented 7 years ago

I can reproduce this on my machine and am a little concerned...

This may be jetlag-based reasoning, but re-running the DSM with a Tweedie response (family=tw()) yields a much more reasonable estimate:

Summary of uncertainty in a density surface model calculated
 by variance propagation.

Quantiles of differences between fitted model and variance model
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max.
-4.842e-06  2.211e-09  2.425e-08 -9.620e-10  9.030e-08  9.710e-07

Approximate asymptotic confidence interval:
    2.5%     Mean    97.5%
1614.304 2599.384 4185.579
(Using log-Normal approximation)

Point estimate                 : 2599.384
Standard error                 : 641.2283
Coefficient of variation       : 0.2467

I've forgotten if there is some issue here with scale parameters etc for the negative binomial but will think about it more in the morning...

dill commented 7 years ago

Okay, so it looks very likely that this is a negative binomial issue...

library(Distance)
library(dsm)
data(mexdolphins)

hr.model <- ds(distdata, max(distdata$distance),
                     key = "hr", adjustment = NULL)
mod1 <- dsm(count~s(x, y), hr.model, segdata, obsdata, family=nb())

mod1.var <- dsm.var.prop(mod1, preddata, off.set=preddata$area)

Comparing the model that dsm.var.prop refits versus the one it was given, respectively:

> summary(mod1.var$model)

Family: Negative Binomial(684772364.109)
Link function: log

Formula:
count ~ s(x, y) + XX + offset(off.set)

Parametric coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.664e+01  2.331e+00  -7.138 9.45e-13 ***
XX1          0.000e+00  0.000e+00      NA       NA
XX2          2.608e-16  4.809e-01   0.000        1
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
       edf Ref.df Chi.sq p-value
s(x,y)   2      2  0.003   0.998

Rank: 31/32
R-sq.(adj) =  -0.00153   Deviance explained = 2.26%
-REML = 113.47  Scale est. = 24474     n = 387
> summary(mod1.var$dsm.object)

Family: Negative Binomial(0.017)
Link function: log

Formula:
count ~ s(x, y) + offset(off.set)

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -16.7922     0.3923   -42.8   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
         edf Ref.df Chi.sq p-value
s(x,y) 2.555  3.045  1.303    0.72

R-sq.(adj) =  0.00236   Deviance explained = 3.02%
-REML = 382.62  Scale est. = 1         n = 387

Doesn't seem to be a problem for quasi-Poisson or Tweedie...

erex commented 7 years ago

OK; lots of unhappiness on the nb() front

dill commented 7 years ago

I'll look into this when I get a minute, but this week is out, I'm afraid.

erex commented 7 years ago

that's fine. It doesn't appear to be a DistWin issue, so it can rest for now.

dill commented 7 years ago

Okay, though if it does prove to be a mathematical issue, we probably need to disable dsm.var.prop for nb() models.

LHMarshall commented 7 years ago

Not a big job, but the chances that I have to vanish get higher as time passes. I will also be out of e-mail contact for much of next week.

erex commented 7 years ago

Returning to variance comparisons (using Distance 7.1 with dsm 2.2.15 on the sperm whale dataset).

Running two variance computations (var.prop and var.gam) on a dsm with a Tweedie response, the results are:

Summary of uncertainty in a density surface model calculated
 analytically for GAM, with delta method

Approximate asymptotic confidence interval:
     2.5%      Mean     97.5% 
 789.3072 1403.7820 2496.6247 
(Using log-Normal approximation)

Point estimate                 : 1403.782 
CV of detection function       : 0.06670757 
CV from GAM                    : 0.2927 
Total standard error           : 421.4417 
Total coefficient of variation : 0.3002 

and

Summary of uncertainty in a density surface model calculated
 by variance propagation.

Quantiles of differences between fitted model and variance model
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-0.5908000 -0.0242100 -0.0000068 -0.0018560  0.0156800  0.6277000 

Approximate asymptotic confidence interval:
     2.5%      Mean     97.5% 
 765.7713 1509.0108 2973.6208 
(Using log-Normal approximation)

Point estimate                 : 1509.011 
Standard error                 : 538.2913 
Coefficient of variation       : 0.3567 

my concern

Using predict.dsm the point estimate of abundance is 1403.782, matching the value shown by var.gam. The point estimate generated by var.prop is 7.5% larger than the estimate produced in the predict step.

(a) Will this confuse users, and (b) what guidance should we offer regarding which abundance estimate to report?

Of course, 7.5% is well within the confidence intervals, but to some people an extra 100 sperm whales might be a big deal.
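For what it's worth, the comparison described above can be sketched as follows. This is not code from the thread: the object names (`dsm_tw` for a fitted Tweedie dsm, `predgrid` for the prediction grid with an off.set column) are placeholders for whatever model and grid are in use.

```r
library(dsm)

# point estimate of abundance from the prediction step
N_hat <- sum(predict(dsm_tw, predgrid, off.set = predgrid$off.set))

# the two variance procedures on the same model and grid
vg <- dsm.var.gam(dsm_tw, predgrid, off.set = predgrid$off.set)
vp <- dsm.var.prop(dsm_tw, predgrid, off.set = predgrid$off.set)

# var.gam's point estimate should match N_hat exactly; var.prop reports
# the total from its refitted (variance) model, which is where the ~7.5%
# discrepancy discussed above comes from
N_hat
summary(vg)
summary(vp)
```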

dill commented 7 years ago

hmm, tricky.

It looks like there is a fair difference in the estimated coefficients after the varprop re-optimisation (see the quantiles output). I don't have much time to think about this right now, but I would guess that the model might not be giving a great fit. Including more covariates might improve the fit and lead to less coefficient change.

This along with nb are on the list for my varprop week with MVB in May.

dill commented 7 years ago

Looking back into this, I now get:

> mod1.var <- dsm.var.prop(mod1, preddata, off.set=preddata$area)
Error in gam.fit4(x, y, sp, Eb, UrS = UrS, weights = weights, start = start,  :
  inner loop 3; can't correct step size
In addition: Warning messages:
1: In newton(lsp = lsp, X = G$X, y = G$y, Eb = G$Eb, UrS = G$UrS, L = G$L,  :
  Fitting terminated with step failure - check results carefully
2: In newton(lsp = lsp, X = G$X, y = G$y, Eb = G$Eb, UrS = G$UrS, L = G$L,  :
  Fitting terminated with step failure - check results carefully
3: In newton(lsp = lsp, X = G$X, y = G$y, Eb = G$Eb, UrS = G$UrS, L = G$L,  :
  Fitting terminated with step failure - check results carefully
4: In newton(lsp = lsp, X = G$X, y = G$y, Eb = G$Eb, UrS = G$UrS, L = G$L,  :
  Fitting terminated with step failure - check results carefully
5: In newton(lsp = lsp, X = G$X, y = G$y, Eb = G$Eb, UrS = G$UrS, L = G$L,  :
  Fitting terminated with step failure - check results carefully

(this is for mexdolphins)

So I am inclined to think this is a dodgy model anyway.

Re: your earlier comparisons, old versions of dsm did not do the Right Thing, even if the variance estimates were more palatable.

Looking into sperm whales now...

dill commented 7 years ago

Sperm whale code I ran was...

load("~/current/spermwhaledata/R_import/spermwhale.RData")
library(dsm)
library(Distance)

df <- ds(dist, truncation=6000, key="hr")

b <- dsm(count~s(x,y), observation.data=obs, ddf.obj=df, segment.data=segs, family=nb())

vp <- dsm.var.prop(b, predgrid, off.set=predgrid$off.set)
vg <- dsm.var.gam(b, predgrid, off.set=predgrid$off.set)

This yields:

varprop

Summary of uncertainty in a density surface model calculated
 by variance propagation.

Probability of detection in fitted model and variance model
  Original.model Original.model.se Variance.model
1      0.3624567        0.07659373      0.3624567

Approximate asymptotic confidence interval:
        2.5%         Mean        97.5%
    32.12139   2316.12886 167005.64918
(Using log-Normal approximation)

Point estimate                 : 2316.129
Standard error                 : 24974
Coefficient of variation       : 10.7826

vargam

Summary of uncertainty in a density surface model calculated
 analytically for GAM, with delta method

Approximate asymptotic confidence interval:
    2.5%     Mean    97.5%
1474.892 2555.894 4429.202
(Using log-Normal approximation)

Point estimate                 : 2555.894
CV of detection function       : 0.2113123
CV from GAM                    : 0.1929
Total standard error           : 731.3299
Total coefficient of variation : 0.2861

As before. BUT I wonder if the issue is with the detection function: if there is spatial variation due to sea state unaccounted for in the detection function (here we don't use sea state as a covariate), there is some chance this gets sucked into the dispersion parameter of the response (I think, because of how the parameterisation of the smoothing parameter/scale parameter is set up in mgcv)... This might be causing the issue... I will try to attach sea states to the segments and see if I can estimate a more reasonable model...

dill commented 7 years ago

Notably, the original model:

> vp$dsm.object$scale
[1] 1

varprop refit

> vp$model$scale
[1] 24474.38

Will investigate further and report back...

dill commented 7 years ago

Urgh, the sperm whale sea state stuff is a mess, but I did re-run on the SCANS-II minke whale data I had to hand, which has Beaufort as a covariate. With a negative binomial response I get:

> vp
Summary of uncertainty in a density surface model calculated
 by variance propagation.

Probability of detection in fitted model and variance model
  beaufort Original.model Original.model.se Variance.model
1    [0,1]     0.49819141        0.10138330      0.4981934
2    (1,2]     0.19617138        0.06761538      0.1961706
3    (2,3]     0.25502779        0.10442860      0.2550238
4    (3,4]     0.04877835        0.04142136      0.0487800

Approximate asymptotic confidence interval:
        2.5%         Mean        97.5%
    203.8379   40791.3750 8163036.1390
(Using log-Normal approximation)

Detection function CV          : 0.2659586

Point estimate                 : 40791.37
Standard error                 : 1576216
Coefficient of variation       : 38.6409

> var_gam
Summary of uncertainty in a density surface model calculated
 analytically for GAM, with delta method

Approximate asymptotic confidence interval:
     2.5%      Mean     97.5%
 6608.539 12928.818 25293.689
(Using log-Normal approximation)

Point estimate                 : 12928.82
CV of detection function       : 0.2659586
CV from GAM                    : 0.2316
Total standard error           : 4559.833
Total coefficient of variation : 0.3527

which seems unreasonable. Will think about this more tomorrow.

erex commented 7 years ago

will stand by ... NB I'm out of the office tomorrow (23Jun).

dill commented 7 years ago

I am almost done with this but am away until Sat morning (Hobart time), will try to get something to you on this ASAP.

LHMarshall commented 7 years ago

So does that mean I can't submit to CRAN until next week?

lenthomas commented 7 years ago

I think the plan was to wait for the new release of R (due on the 30th) before final tests and release. Given that I'm very behind on my fixes, I think submission to CRAN on, say, Monday (or Tuesday) next week is quite reasonable. I'm not sure how long it will then take the submission to appear on CRAN; presumably we want it to have appeared before we do the release of distance-for-windows?

dill commented 7 years ago

Thought about this a wee bit more...

Refitting using a fixed value of the nb parameter might work better (i.e., maybe it's the magic inside nb()'s parameter-finding routine that is making things wacky):

load("~/current/spermwhaledata/R_import/spermwhale.RData")
library(dsm)
library(Distance)

df <- ds(dist, truncation=6000, key="hr")

b <- dsm(count~s(x,y), observation.data=obs, ddf.obj=df, segment.data=segs, family=nb())

# bad
vp <- dsm.var.prop(b, predgrid, off.set=predgrid$off.set)
# a priori wrong
vg <- dsm.var.gam(b, predgrid, off.set=predgrid$off.set)

# what about refitting with a **fixed** nb par?
theta_est <- b$family$getTheta(TRUE)

# 1. refit nb and see what's going on
b_fix <- dsm(count~s(x,y), observation.data=obs, ddf.obj=df, segment.data=segs, family=nb(theta=theta_est))
# do varprop with fixed nb
vp_fix <- dsm.var.prop(b_fix, predgrid, off.set=predgrid$off.set)

# 2. refit negbin and see what's going on
b_fix2 <- dsm(count~s(x,y), observation.data=obs, ddf.obj=df, segment.data=segs, family=negbin(theta_est))
# do varprop with fixed negbin
vp_fix2 <- dsm.var.prop(b_fix2, predgrid, off.set=predgrid$off.set)

Looks like 1. doesn't work...

> summary(vp_fix$model)

Family: Negative Binomial(3384231303.356)
Link function: log

Formula:
count ~ s(x, y) + XX + offset(off.set)

Parametric coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.930e+01  1.342e+01  -1.438     0.15
XX1          1.068e-14  2.160e-01   0.000     1.00
XX2          1.657e-14  4.284e-01   0.000     1.00

Approximate significance of smooth terms:
       edf Ref.df Chi.sq p-value
s(x,y)   2      2  0.005   0.998

R-sq.(adj) =  0.0163   Deviance explained = 10.2%
-REML = 95.696  Scale est. = 24235     n = 949

but 2. looks more promising?

> summary(vp_fix2$model)

Family: Negative Binomial(3384231303)
Link function: log

Formula:
count ~ s(x, y) + XX + offset(off.set)

Parametric coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.085e+01  4.472e-01  -46.62   <2e-16 ***
XX1         -3.442e-10  3.025e-01    0.00        1
XX2         -5.282e-10  6.000e-01    0.00        1
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
         edf Ref.df Chi.sq p-value
s(x,y) 23.08  25.48  252.9  <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.123   Deviance explained =   35%
-REML = 594.87  Scale est. = 1         n = 949

So I have written a wee bit of code in dsm_varprop (the internal bit that does the work) to refit nb() models with a fixed parameter using negbin() (with a warning). See 206ab87.

One other stupid issue here is that the scale parameter stuff is actually not the issue. The numbers are the same, but on different scales... Seems like I needed to use NextMethod rather than calling summary again to get the extended families stuff to interpret the parameters correctly... see 06f77fd.

Closing this now, but feel free to re-open if this is an unsatisfactory solution!

erex commented 7 years ago

Call me a skeptic, but there's no output showing the estimated variance produced by your revision. Hence, I don't see evidence that dsm_varprop() does the "right" thing.

dill commented 7 years ago

You are a skeptic. But fair enough...

load("~/current/spermwhaledata/R_import/spermwhale.RData")
library(dsm)
library(Distance)

df <- ds(dist, truncation=6000, key="hr")

b <- dsm(count~s(x,y), observation.data=obs, ddf.obj=df, segment.data=segs, family=nb())

vp <- dsm.var.prop(b, predgrid, off.set=predgrid$off.set)
vg <- dsm.var.gam(b, predgrid, off.set=predgrid$off.set)

Produces:

> vp
Summary of uncertainty in a density surface model calculated
 by variance propagation.

Probability of detection in fitted model and variance model
  Original.model Original.model.se Variance.model
1      0.3624567        0.07659373      0.3624567

Approximate asymptotic confidence interval:
    2.5%     Mean    97.5%
1620.784 2517.019 3908.838
(Using log-Normal approximation)

Point estimate                 : 2517.019
Standard error                 : 572.4702
Coefficient of variation       : 0.2274

> vg
Summary of uncertainty in a density surface model calculated
 analytically for GAM, with delta method

Approximate asymptotic confidence interval:
    2.5%     Mean    97.5%
1474.892 2555.894 4429.202
(Using log-Normal approximation)

Point estimate                 : 2555.894
CV of detection function       : 0.2113123
CV from GAM                    : 0.1929
Total standard error           : 731.3299
Total coefficient of variation : 0.2861

The varprop call produces a series of warnings, probably because the negbin parameter is not quite right -- the warning we issue suggests the user consult ?dsm_varprop, where this is noted:

> warnings()
Warning messages:
1: In dsm_varprop(dsm.obj, pred.data[[1]]) :
  Model was fitted using nb() family, refitting with negbin(). See ?dsm_varprop
2: In newton(lsp = lsp, X = G$X, y = G$y, Eb = G$Eb, UrS = G$UrS,  ... :
  Fitting terminated with step failure - check results carefully
[warnings 3 through 18 repeat the same step-failure message]

The help file suggests:

Negative binomial models fitted using the \code{\link{nb}} family will give strange results (overly big variance estimates due to scale parameter issues) so \code{nb} models are automatically refitted with \code{\link{negbin}} (with a warning). It is probably worth refitting these models with \code{negbin} manually (perhaps giving a smallish range of possible values for the negative binomial parameter) to check that convergence was reached.
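To make that help text concrete, the manual refit it recommends might look something like the following. This is a sketch, not from the thread: it assumes the spermwhale objects (df, obs, segs, predgrid) and the nb() fit b from the code earlier in this issue.

```r
# Pull the estimated negative binomial parameter from the nb() fit,
# then refit with negbin() so theta is held fixed rather than estimated
# inside the GAM fitting iterations.
theta_est <- b$family$getTheta(TRUE)

b_nb <- dsm(count ~ s(x, y), observation.data = obs, ddf.obj = df,
            segment.data = segs,
            family = negbin(theta = theta_est))

# inspect convergence and deviance explained before trusting the refit;
# the help text also suggests trying a smallish range of theta values
summary(b_nb)

# then run variance propagation on the fixed-theta model
vp_nb <- dsm.var.prop(b_nb, predgrid, off.set = predgrid$off.set)
```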

Thanks for asking me this though, as I caught 2 typos.

erex commented 7 years ago

That makes me feel better. I'll have to learn how to interpret varprop output. I don't know what the three values here mean:

Probability of detection in fitted model and variance model
  Original.model Original.model.se Variance.model
1      0.3624567        0.07659373      0.3624567

I gather the first and second values are the point estimate of $\hat{P}$ and its estimated SE. Is the third value supposed to be the $SE^2$? Why are the first and third numbers identical?

dill commented 7 years ago

Clarified this in the documentation in 2899ed7. Let me know if this helps.

erex commented 7 years ago

So when $\hat{P}$ under the fitted and "refitted" models are identical, all is right with the world. I guess I would prefer the output be labelled "fitted" and "refitted"; "Variance.model" unsettles me. What do you think?

erex commented 7 years ago

The sperm whale dataset cannot survive the moving block bootstrap; results from 99 replicates:

Summary of bootstrap uncertainty in a density surface model
Detection function uncertainty incorporated using the delta method.

Boxplot coeff     : 1.5 
Replicates        : 99 
Outliers          : 0 
Infinites         : 0 
NAs               : 98 
NaNs              : 0 
Usable replicates : 1 (100%)
Approximate asymptotic bootstrap confidence interval:
    2.5%     Mean    97.5% 
      NA 1710.336       NA 
(Using log-Normal approximation)

Point estimate                 : 1710.336 
CV of detection function       : 0.06670757 
CV from bootstrap              : NA 
Total standard error           : NA 
Total coefficient of variation : NA 

dill commented 7 years ago

@erex can you give the details of the model this happens for?

erex commented 7 years ago

Sperm whale moving block bootstrap

My labelling of the analysis suggests this was produced by

dsm.4 <- dsm(ddf.obj = ddf.2,
             formula = n ~ s(x, y, bs="ts") + s(depth, bs="ts") +
                       s(disttocas, bs="ts") + s(sst, bs="ts") +
                       s(eke, bs="ts") + s(npp, bs="ts"),
             family = nb(link='log'), group = FALSE, engine = 'gam',
             convert.units = 1,
             segment.data = sample.dat.4, observation.data = obs.dat.4)

your "kitchen sink" count covariate model with an nb response distribution; the same model survives the var.gam and var.prop experience.

The profusion of NAs appears to be caused by:

Error in while (mean(ldxx/(ldxx + ldss)) > 0.4) { : 
  missing value where TRUE/FALSE needed
In addition: Warning message:
In sqrt(w) : NaNs produced

I have a déjà vu sense that we've been here before.

dill commented 7 years ago

One of the issues here is I don't think there is a valid transect label in the data (or at least the version I have)...

I can reproduce this if I set the bootstrap sample unit to be the segment.

FWIW, I was not planning on getting workshop participants to do the bootstrap anyway...

erex commented 7 years ago

That's where I remember this arising: no transects in data created by MGET! Clearly participants can't perform the bootstrap if they don't have transects.

dill commented 7 years ago

Right, okay. Let's drop this for now then.
