DistanceDevelopment / Distance

Simple distance sampling analysis
GNU General Public License v3.0
9 stars 8 forks source link

How does `bootdht` know to call `dht2`? #75

Closed erex closed 4 years ago

erex commented 4 years ago

@dill

I ask because this commit fixed the left truncation problem for the duikers, such that dht2 produces the correct point estimate of density.

But is this correct version of dht2 actually used by bootdht? My reading of bootit within bootdht does not invoke dht2

I raise this because running a sufficient number (250) replicates of the duiker analysis produces a mean of the sampling distribution that is 50% larger than the point estimate produced by dht2

image

image

I'm not sure if this difference between point estimate and mean of sampling distribution is just chance, but the sampling distribution mean is similar to the point estimate that was calculated prior to the correction applied to dht2 in the 13July20 commit. Is it coincidence? I don't know.

erex commented 4 years ago

Code to produce point estimates and bootstrap

bootstrap completion time ~8-10hr

## ---- readin, message=FALSE------------------------------------------------------------------------
library(Distance)
data("DuikerCameraTraps")
## ----fit-------------------------------------------------------------------------------------------
conversion <- convert_units("meter", NULL, "square kilometer")
trunc.list <- list(left=2, right=15)
peak.hn <- ds(DuikerCameraTraps, transect = "point", key="hn", adjustment="herm",
              cutpoints = c(seq(2,8,1), 10, 12, 15), truncation = trunc.list)
peak.unicos <- ds(DuikerCameraTraps, transect = "point", key="unif", adjustment = "cos",
                   cutpoints = c(seq(2,8,1), 10, 12, 15), truncation = trunc.list)
peak.hr <- ds(DuikerCameraTraps, transect = "point", key="hr", adjustment = "cos", 
              cutpoints = c(seq(2,8,1), 10, 12, 15), truncation = trunc.list)
## ---- sumtab---------------------------------------------------------------------------------------
knitr::kable(summarize_ds_models(peak.hn, peak.unicos, peak.hr), digits = 3, 
             caption="Model selection for three key functions fitted to duiker peak activity data")
## --------------------------------------------------------------------------------------------------
p_a <- peak.hr$ddf$fitted[1]
w <- 15
rho <- sqrt(p_a * w^2)
## --------------------------------------------------------------------------------------------------
par(mfrow=c(1,2))
plot(peak.hr, main="Peak activity", xlab="Distance (m)",
     showpoints=FALSE, lwd=3, xlim=c(0, 15))
plot(peak.hr, main="Peak activity", xlab="Distance (m)", pdf=TRUE,
     showpoints=FALSE, lwd=3, xlim=c(0, 15))
## ---- echo=FALSE-----------------------------------------------------------------------------------
par(mfrow=c(1,1))
## ---- sampfrac-------------------------------------------------------------------------------------
viewangle <- 42 # degrees
samfrac <- viewangle / 360
conversion <- convert_units("meter", NULL, "square kilometer")
peak.hr.dens <- dht2(peak.hr, flatfile=DuikerCameraTraps, strat_formula = ~1,
                     sample_fraction = samfrac, er_est = "P2", convert_units = conversion)
print(peak.hr.dens, report="density")
## ---- bootstrap, results='hide'--------------------------------------------------------------------
viewangle <- 42 # degrees
samfrac <- viewangle / 360
mysummary <- function(ests, fit){
  return(data.frame(Dhat = ests$individuals$D$Estimate))
}
duiker.boot.hr <- bootdht(model=peak.hr, flatfile=DuikerCameraTraps, resample_transects = TRUE,
                       nboot=250, summary_fun=mysummary, sample_fraction = samfrac,
                       convert.units = conversion)
## ----bootresult------------------------------------------------------------------------------------
print(summary(duiker.boot.hr))
## ---- fig.width=8, fig.cap="Distribution of density estimates from bootstrap replicates."----------
hist(duiker.boot.hr$Dhat[duiker.boot.hr$Dhat<100], breaks = 20, 
     xlab="Estimated density", main="D-hat estimates bootstraps")
abline(v=quantile(duiker.boot.hr$Dhat, probs = c(0.025,0.975)), lwd=2, lty=3)
lenthomas commented 4 years ago

Minor comments about speed and output (not addressing the main point, which I'll leave for @dill!): Yup, it's a big dataset! Could speed up a bit by specifying no adjustments, since none selected in original analysis. Longer term, it would be good to have an option to parallelize the bootstrap - add as an enhancement, @dill? If this is going into a vignette, @erex, you could consider specifying the P3 er.var in the first runs to suppress the warning message (and, @dill, you could potentially remove that warning in future releases since it specifies in the help that P3 is the default for point transects).

dill commented 4 years ago

To answer @erex's question first: bootdht only calls dht not dht2.

You are right, the same bug is in dht as in dht2 see here, there is another mention of the same code snippet here but I believe that this is correct as it's finding the covered area, so needs to have left truncation included.

This is a bit of a horrendous dataset to do any testing on. For future reference, in case it's useful to do simple checks an easy option is to use debug::mtrace to get inside bootdht then mtrace the bootit function within, you can then replace the data with the original data to see if the results match dht2 (which they didn't in this case).

Regarding the warning message, it originates in dht, and is there in case one specifies a non-default option (like say O2) which you can't use with points. I'll look into a better way of presenting this in future and move this to a separate issue for next release.

dill commented 4 years ago

@erex I'm wondering if it is your intention to re-run the above simulation to check the above. If so, perhaps I can write a rough version of the parallelisation code (not to be included in the 1.0.1 release) to speed things up and then run this on bluewhale on a lot of cores. That way we can get this wrapped-up and the releases of mrds and Distance off to CRAN today.

erex commented 4 years ago

@dill Happy to re-run the duiker example just now. I was thinking you might be changing the version number on mrds before I did so allowing me to reference that versioning number in a report back on bootstrap.

Rather than hassle you with a quick-and-dirty bootstrap, If I start now, maybe 100 reps could get done by mid-afternoon. OK?

dill commented 4 years ago

Sorry, forgot to bump the version. Now done.

That sounds fine as a plan.

erex commented 4 years ago

ok: installing 2.2.2.9009 just now, will hit the run button

erex commented 4 years ago

@dill @lenthomas Here's a positive development: the bootstrap with 100 reps only took a bit over an hour (turning off adjustments must have had a large effect). Based upon the 100 reps, the mean and median of the sampling distribution for D-hat is much closer to the point estimate produced by the single call to dht2 see figure below image

recall dht2 reports D-hat = 14.51, with CI= (7.8, 26.9)

dill commented 4 years ago

That's great, thanks Eric. I'll close this in that case.

I'll also add a note to the bootdht manual page to let people know that adjustments may add a significant amount time to their bootstraps.