DistanceDevelopment / Distance

Simple distance sampling analysis
GNU General Public License v3.0
9 stars 8 forks source link

Camera trap distance sampling #64

Closed erex closed 4 years ago

erex commented 4 years ago

There have been two posts to the Distance sampling list requesting either a vignette or training in this subject. I built such a vignette based upon Howe et al.'s 2017 duiker data.

howe-vignette.pdf

I've not shared the analysis with the Distance community because the density estimate produced does not match the estimate published in the paper (19.3 in R and 14.5 in the paper) MEE 2017 distance sampling cameras.pdf

I've not been able to track down the source of the discrepency. Eric shared with me his DistWin project that produced the estimate. I have analysed in R both the data set from Dryad as well as the data extracted from the DistWin project. The Dryad data and the DistWin project data produce identical results in R of 19.3.

I do know where the discrepency does not occur. There is no difference in AIC, hazard rate parameter estimates, P_a or EDR between DistWin and R, so the difference in results is not the result of optimisation issues.

Reproducible example

## ----setup, include=FALSE------------------------------------------------------------
knitr::opts_chunk$set(echo = TRUE)

## ---- message=FALSE------------------------------------------------------------------
library(Distance)

## ----cars----------------------------------------------------------------------------
peak.activity <- read.delim("P:/pCloud Sync/Howe viva/PeakDistances.txt")
peak.activity$Area <- peak.activity$Area / 1e+06 # sq m to sq km
col.order <- c("Region.Label", "Area", "multiplier", "Sample.Label", "Effort", "distance")
peak <- peak.activity[, col.order]

## ----pressure, echo=FALSE------------------------------------------------------------
sum(!is.na(peak$distance))
table(peak$Sample.Label)

## ------------------------------------------------------------------------------------
breakpoints <- c(seq(0,8,1), 10, 12, 15, 21)
hist(peak$distance, breaks=breakpoints, main="Peak activity data set",
     xlab="Radial istance (m)")

## ----fit-----------------------------------------------------------------------------
trunc.list <- list(left=2, right=15)
conversion <- convert_units("meter", NULL, "square kilometer")
peak.hn <- ds(peak, transect = "point", key="hn", adjustment="herm",
              cutpoints = c(seq(2,8,1), 10, 12, 15), truncation = trunc.list,
              convert.units = conversion)
peak.unicos <- ds(peak, transect = "point", key="unif", adjustment = "cos",
                  cutpoints = c(seq(2,8,1), 10, 12, 15), truncation = trunc.list,
              convert.units = conversion)
peak.hr <- ds(peak, transect = "point", key="hr", adjustment = "cos", 
              cutpoints = c(seq(2,8,1), 10, 12, 15), truncation = trunc.list,
              convert.units = conversion)
# summary(peak.hr)

## ------------------------------------------------------------------------------------
knitr::kable(summarize_ds_models(peak.hn, peak.unicos, peak.hr), digits = 3, 
             caption="Model selection for three key functions fitted to duiker peak activity data")

## ------------------------------------------------------------------------------------
p_a <- peak.hr$ddf$fitted[1]
w <- 15
rho <- sqrt(p_a * w^2)

## ------------------------------------------------------------------------------------
par(mfrow=c(1,2))
plot(peak.hr, main="Peak activity", xlab="Distance (m)",
     showpoints=FALSE, lwd=3, xlim=c(0, 15))
plot(peak.hr, main="Peak activity", xlab="Distance (m)", pdf=TRUE,
     showpoints=FALSE, lwd=3, xlim=c(0, 15))
par(mfrow=c(1,1))

## ---- sampfrac-----------------------------------------------------------------------
viewangle <- 42 # degrees
samfrac <- viewangle / 360
conversion <- convert_units("meter", NULL, "square kilometer")
peak.hr.dens <- dht2(peak.hr, flatfile=peak.activity, strat_formula = ~1,
                     sample_fraction = samfrac, er_est = "P2", convert_units = conversion)
print(peak.hr.dens, report="density")

## ---- multiplier---------------------------------------------------------------------
mult.list <- list(creation=data.frame(rate=unique(peak$multiplier),
                                      SE=0))
peak.hr.dens.mult <- dht2(peak.hr, flatfile=peak.activity, strat_formula = ~1,
                     multipliers = mult.list, convert_units = conversion)
print(peak.hr.dens.mult, report="density")

## ---- bootstrap----------------------------------------------------------------------
peak.activity <- read.delim("P:/pCloud Sync/Howe viva/PeakDistances.txt")
peak.activity$Area <- peak.activity$Area / 1e+06 # sq m to sq km
col.order <- c("Region.Label", "Area", "multiplier", "Sample.Label", "Effort", "distance")
peak.activity$Effort <- peak.activity$Effort * peak.activity$multiplier
peak <- peak.activity[, col.order]

peak.hr <- ds(peak, transect = "point", key="hr", adjustment = "cos", 
              cutpoints = c(seq(2,8,1), 10, 12, 15), truncation = trunc.list,
              convert.units = conversion)

mysummary <- function(ests, fit){
  return(data.frame(Dhat = ests$individuals$D$Estimate))
}
duiker.boot.hr <- bootdht(model=peak.hr, flatfile=peak, resample_transects = TRUE,
                       nboot=299, summary_fun=mysummary)
summary(duiker.boot.hr)

## ------------------------------------------------------------------------------------
hist(duiker.boot.hr$Dhat, breaks = 20, 
     main="D-hat estimates bootstraps")

Data file as text

PeakDistances.txt

Not an urgent matter

The user community is curious, but not rabid about this subject. I do have a participant coming to the July workshop who is using camera trapping, so I would like to add this as a supplemental component to Practical 9 that deals with multipliers.

lenthomas commented 4 years ago

I've looked into this, and it seems to be a bug in the area_calc function within the dht2 function (around line 380 of current code). The problem is that with left truncation for point transects, the area sampled is given by effort * pi * ((w^2 - wl^2) * sample_fraction where w is the right truncation distance and wl is the left truncation distance, but what is being calculated is effectively effort * pi * ((w - wl)^2 * sample_fraction (this can be seen by looking at line 376). I recommend the area_calc function is changed so that it accepts both the left and right truncation distances, rather than just the strip half-width, and then it is changed to something like

area_calc <- function(w, wl, effort, transect_type, sample_fraction){
    if(transect_type == "point"){
      return(effort * pi * (w^2 - wl^2) * sample_fraction)
    }else{
      return(effort * 2 * (w-wl) * sample_fraction)
    }
  }

Here is some code that I used after running Eric's code snippet, above, that led me down this path. (The cryptic comment about checking p_a is calculated correctly is explained below - it's an esoteric point, so eel free to ignore!)

#Try to estimate by hand, using equations 1 or 2 from the Howe paper.
# Note that formula ignores left truncation, so I need to check the
# p_a is calculated correctly so that left truncation can be ignored
# (can do that later, if I don't get to the bottom of this sooner).
library(dplyr)
wl <- trunc.list$left
peak_sample <- peak %>%
  group_by(Sample.Label) %>%
  summarize(Effort = first(Effort), multiplier = first(multiplier), 
            nk = sum((distance > wl) & (distance <= w), na.rm = TRUE), 
            .groups = "drop")
peak_sample$ek <- peak_sample$Effort * peak_sample$multiplier
peak_stratum <- peak %>%
  group_by(Region.Label) %>%
  summarize(Area = first(Area), .groups = "drop")
Dhat <- sum(peak_sample$nk) / (pi * w^2 * sum(peak_sample$ek * p_a))
#Convert units to number per km2
Dhat <- Dhat * 1000^2
Dhat #I get 14.51055 (paper gets 14.5)

#Check n, effort and number of samples
sum(peak_sample$nk) #5865 - same as peak.hr.dens
sum(peak_sample$Effort) #12317058 - same as peak.hr.dens
nrow(peak_sample) #21 - same as peak.hr.dens

#Check covered area
pi * (w^2 - wl^2) * sum(peak_sample$ek) / 1000^2 # 997.691 - 
# but peak.hr.dens gets 762.9399
# Bug in the dht2 code -- it is computing
pi * (w - wl)^2 * sum(peak_sample$ek) / 1000^2
# on line 381 of dht2 function, in area_calc function
# (similar computation works for line transects, but not for points)

(Aside on "correct" calculation of p_a. With left tuncation, there are three ways to define p_a:

  1. The average detection probablity between 0 and w ignoring the fact that detection is not possible between 0 and wl;
  2. The average detection probability between 0 and w, including the fact that detection probability between 0 and wl is 0;
  3. The average detection probability between wl and w. Depending on which you do, the formula for Dhat may need to be modified -- the (pi w^2) on the bottom line may need to be replaced with (pi (w - wl)^2). Feel free to ask me if you need more details -- but an incompatible definition of p_a and effort does not seem to be the issue with dht2 here -- it seems to be the problem I pointed out above.)
lenthomas commented 4 years ago

(Addition to the aside -- Tiago and I actually came across this issue before, because we found a difference in the way we had calculated p_a (which was 3. above) and the way Distance for Windows does it (method 2. above). We needed to write a clarification to a previous paper -- so if you want full details, see http://lenthomas.org/papers/MarquesCREEMTR2012.pdf. )

Also, in case it's any use, I made a Distance for Windows project using the above data, to check the variance calculations. (It seems to check out.) I attach it below. CameraTrap.zip

dill commented 4 years ago

I've made the correction you suggest above @lenthomas. I now get a result of 14.7732 for density, using the block marked multiplier in @erex's code. If the paper reports 14.5, is this "close enough the the difference is between implementation of P_a calculation methods 2 and 3 above?

lenthomas commented 4 years ago

Err.. the correct answer is 14.510, so it seems the fix I suggested is not completely correct. I am sure the answer is in the fact this is point transect left truncation. Do you want me to look into this more or will you @dill?

dill commented 4 years ago

I am swamped today so if you can look that would be great.

Could this just be a difference in methods 2 and 3 that you list above?

lenthomas commented 4 years ago

Yes, you were right. Distance for Windows implements method 2, above, which means that left truncation can then be ignored when calculating the covered area. So, it turned out that the correct fix was to replace df_width <- (ddf$ds$aux$width - ddf$ds$aux$left)*convert_units with df_width <- ddf$ds$aux$width*convert_units and leave area_calc alone. I made this change, and also put a comment in area_calc to explain why left truncation can be ignored. I pushed the fix to the 1.0.1-candidate branch - hope I did the right thing there! (I note it says I'm "anonymous" so I clearly have some configuration wrong.) I suggest we add this analysis as one of our test projects in test_that - we may need to get permission from Eric Howe if the data are not public already.

erex commented 4 years ago

Public availability of Eric's data shouldn't be a problem https://datadryad.org/stash/data_paper/doi:10.5061/dryad.b4c70?

dill commented 4 years ago

Thanks @lenthomas.

Agree that it would be good to have this as a test. @erex, is your plan to have your above vignette available on the examples site and if so would it be worth bundling the duiker data so that it's in a convenient format for that analysis (as well as for tests) and ship it with Distance as we do with other example datasets? If so, is the appropriate format that which is detailed in the vignette you posted?

erex commented 4 years ago

Yes, the posting to the list that started this issue six weeks ago requested a vignette.

I was wondering if there were any plans to produce vignettes or training courses on this new method, or if a R Markdown file for the duiker analysis described in Howe et al was available?

Obvious place to put the vignette would be in https://examples.distancesampling.org. I had written the vignette also available at the beginning of this issue

It would be lovely from the vignette perspective if the data set could be read directly by read.delim() from Dryad, but I remember that being difficult for some reason.

Seems the format works fine.

Note the comment I made near the end of the vignette regarding some incompatibility performing bootstraps in the presence of multipliers. Guess that forms the basis of a new issue.

dill commented 4 years ago

Agree that it would be neat to read from Dryad for the vignette, but that wouldn't be ideal for the test (of course I can include the dataset in Distance and you can ignore it and load the data from Dryad anyway but I'm going to include it, it may as well be in a useful format).

I suggest that I add the data and test for the 1.0.1 release and close this issue. Also suggest we punt the bootstrap issue (which if you can raise that in a new issue would be great) to 1.0.2 given time constraints (or poush 1.0.1 release date back).

dill commented 4 years ago

I've added an outline script to convert this data to the dsdata conversion scripts. I've also added a rudimentary manual page here which might benefit from additional information (currently scraped info from the Dryad page, but knowing almost nothing about camera trapping I may be missing important details).

dill commented 4 years ago

(I'll merge this data into Distance once you confirm it's in reasonable order.)

erex commented 4 years ago

Thanks Dave. Give me a day or two to look through the manual page and perhaps make an addition or two.

lenthomas commented 4 years ago

Regarding the bootstrap, .@dill I see that adding in multipliers with uncertainty is an issue that should be reserved for a future release, as one then needs to sample from some distribution for the multiplier. However, the case where there is no uncertainty should be straightforward. In this current case, one way (and I think the better way) to enter the fact that the camera has a limited field of view is through the sample_fraction argument to dht2. Is this currently passed on into the bootstrapping code? If so, then it all works out of the box, so long as one goes that route rather than via a multiplier. What do you think? I can check it out if you're not sure.

dill commented 4 years ago

I've added some code to include sample_fraction as an argument to bootdht but I'm wondering if that is the best term for that in this context. Perhaps we can have both multipliers and sample_fraction as in dht2 but for now when the $se element of the multipliers is not zero we throw an error?

dill commented 4 years ago

(Not that code has not been committed to git yet)

lenthomas commented 4 years ago

I think that sample_fraction is a fine term in this case because it refers to the proportion of the circle that is in the field of view of the camera, which is exactly the kind of instance that sample_fraction was intended for. (One-sided transects, cue counts where only a portion of the circle is in view, etc.) But I'm also happy if you want to include multipliers as well and throw an error in the bootstrap if the $se is not zero, as you suggest. Thanks! :)

dill commented 4 years ago

Okay I'll leave this as sample_fraction for this release and flag an issue for multipliers for 1.0.2.

dill commented 4 years ago

This is now easily test-able as I've merged everything into the master branch.

Note that running unit tests on this dataset is problematic as there are so many observations. Will look into fitting to a subset of the data.

dill commented 4 years ago

@erex / @lenthomas are you satisfied here and can I close this issue?

lenthomas commented 4 years ago

I confirm that the analysis gives (within tolerance) the same density and SE as Distance for Windows (the latter gives 14.510 and 4.3836, respectively). I have not tested the bootstrap. I see that the duiker example dataset in dsdata has a column "multiplier" for the sample_fraction, but I suppose that is becuase this is the way it was given in the Dryad, so okay by me. I expect the vignette will clarify how to get the same result as the paper (since I think at present it has to be treated as a sample_fraction and not a multiplier). So, I'm satisfied so long as you're confident the bootstrap works.

erex commented 4 years ago

sorry,haven't been keeping an eye upon this issue. will have a look tomorrow

erex commented 4 years ago

I've re-run the vignette I wrote back in May, duplicating Howe's analysis published in MEE (reading data from file rather than using in-built data set). I concur with Len, Distance 1.0.1.9002 produces point and interval estimates that mimic those in the MEE paper for peak activity data set.

This analysis does include a short bootstrap (49 reps, elapsed time ~2 hours) without a multiplier, accomplished by altering the original effort data. Median of that small number of bootstraps (21) is far from point estimate of density (14.5), but that's likely because of the huge encounter rate variance.

I've not tried to produce the naive bootstrap with the constant multiplier.

howe-vignette-Distance1.0.1.9002.pdf

dill commented 4 years ago

Thanks, I think there is a separate issue to deal with the speed of bootstrapping but perhaps we can close this for now and come back to that speed issue as part of optimization improvements?

On 24/07/2020 10:45, erex wrote:

I've re-run the vignette I wrote back in May, duplicating Howe's analysis published in MEE (reading data from file rather than using in-built data set). I concur with Len, Distance 1.0.1.9002 produces point and interval estimates that mimic those in the MEE paper for peak activity data set.

This analysis does include a short bootstrap (49 reps, elapsed time ~2 hours) without a multiplier, accomplished by altering the original effort data. Median of that small number of bootstraps (21) is far from point estimate of density (14.5), but that's likely because of the huge encounter rate variance.

I've not tried to produce the naive bootstrap with the constant multiplier.

howe-vignette-Distance1.0.1.9002.pdf https://github.com/DistanceDevelopment/Distance/files/4971216/howe-vignette-Distance1.0.1.9002.pdf

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/DistanceDevelopment/Distance/issues/64#issuecomment-663455557, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAAPIOGUSPJTUXWUZ7NKSTR5FJ2JANCNFSM4NPST2DA.

erex commented 4 years ago

just a minute; I'm checking whether bootdht can handle sample_fraction in a fitted model object.

dill commented 4 years ago

okay, no problem.

On 24/07/2020 10:51, erex wrote:

just a minute; I'm checking whether |bootdht| can handle |sample_fraction| in a fitted model object.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/DistanceDevelopment/Distance/issues/64#issuecomment-663460198, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAAPIMRB2WTKPLSJEM23FLR5FKSVANCNFSM4NPST2DA.

erex commented 4 years ago

the answer to sample_fraction appears to be "No"

> boot.samfrac <- bootdht(model=peak.hr.dens, flatfile=peak, resample_transects = TRUE, nboot=5, summary_fun=mysummary)
Error in bootdht(model = peak.hr.dens, flatfile = peak, resample_transects = TRUE,  : 
  Only models fitted by Distance::ds() may be used

I presume I'll get the same answer when using multiplier

erex commented 4 years ago

as I thought

> boot.samfrac <- bootdht(model=peak.hr.dens.mult, flatfile=peak, resample_transects = TRUE, nboot=5, summary_fun=mysummary)
Error in bootdht(model = peak.hr.dens.mult, flatfile = peak, resample_transects = TRUE,  : 
  Only models fitted by Distance::ds() may be used
erex commented 4 years ago

suppose current bootdht documentation might need to mention these limitations

dill commented 4 years ago

Sorry I've lost track of what different model objects are here... is peak.hr.dens and peak.hr.dens.mult as in your initial post and are dht2() return objects? Or are they returns from ds()?

erex commented 4 years ago

sorry; peak.hr.dens and peak.hr.dens.mult are indeed coming out of dht2

dill commented 4 years ago

in that case the error is correct, as model= needs to be a fitted model from ds() (or ddf()). You need to supply the sample_fraction= argument to bootdht(), not at the modelling stage.

erex commented 4 years ago

oops, let me try again

erex commented 4 years ago

This looks promising for sample_fraction argument in bootdht

> samfrac <- viewangle / 360
> test.samfra <- bootdht(model=peak.hr, flatfile=peak, resample_transects = TRUE, sample_fraction = samfrac, nboot = 5, summary_fun = mysummary)
Performing 5 bootstraps
  |======================================================================                 |  80%Error : 
gosolnp-->Could not find a feasible starting point...exiting

  |=======================================================================================| 100%
Warning message:
In dht(model, region.table, sample.table, obs.table, options = list(group = dht.group,  :
  Point transect encounter rate variance can only use estimators P2 or P3, switching to P3.
> summary(test.samfra)
Bootstrap results

Boostraps          : 5 
Successes          : 5 
Failures           : 0 

     median  mean se   lcl  ucl   cv
Dhat  16.02 22.23 10 13.89 35.1 0.62
dill commented 4 years ago

Seeing you've opened #70, can this issue be closed?

On 24/07/2020 11:32, erex wrote:

This looks promising for |sample_fraction| argument in |bootdht|

|> samfrac <- viewangle / 360 > test.samfra <- bootdht(model=peak.hr, flatfile=peak, resample_transects = TRUE, sample_fraction = samfrac, nboot = 5, summary_fun = mysummary) Performing 5 bootstraps |====================================================================== | 80%Error : gosolnp-->Could not find a feasible starting point...exiting |=======================================================================================| 100% Warning message: In dht(model, region.table, sample.table, obs.table, options = list(group = dht.group, : Point transect encounter rate variance can only use estimators P2 or P3, switching to P3. > summary(test.samfra) Bootstrap results Boostraps : 5 Successes : 5 Failures : 0 median mean se lcl ucl cv Dhat 16.02 22.23 10 13.89 35.1 0.62 |

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/DistanceDevelopment/Distance/issues/64#issuecomment-663475028, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAAPIJGY5GFZ6GPXW75C43R5FPMRANCNFSM4NPST2DA.

erex commented 4 years ago

yes, happy for yo to close #64