DistanceDevelopment / Distance

Simple distance sampling analysis
GNU General Public License v3.0
9 stars 8 forks source link

Fitted point transect detection function with g(x)>1 #135

Closed lenthomas closed 1 year ago

lenthomas commented 1 year ago

Analysis of the camera trapping example with a half-normal detection function, one cosine adjustment, produces a fitted detection function with g(x)>1 for 0 > x > 3.5 or so:

DuikerCameraTraps <- read.csv(file="https://datadryad.org/stash/downloads/file_stream/73221", 
                              header=TRUE, sep="\t")

library(Distance)
conversion <- convert_units("meter", NULL, "square kilometer")
trunc.list <- list(left=2, right=15)
mybreaks <- c(seq(2, 8, 1), 10, 12, 15)

hn1 <- ds(DuikerCameraTraps, transect = "point", key="hn", adjustment = "cos",
          nadj=1,
          cutpoints = mybreaks, truncation = trunc.list)
summary(hn1)
plot(hn1, showpoints = FALSE)
Screenshot 2022-10-09 215414 Screenshot 2022-10-09 215520

This behaviour is in contrast with DistWin for the same analysis (DuikerDaytime sample project, analysis 8):

Screenshot 2022-10-09 215644 Screenshot 2022-10-09 215712 Screenshot 2022-10-09 215735

AIC is lower for the R analysis, but having a fitted detection function with g(x)>1 doesn't make sense. I don't know if DistWin checks for this at the constraint points, and so avoids it, but it would be good to find some way to avoid this behaviour in Distance`mrds`.

dill commented 1 year ago

See this reply but it seems that taking into account the left trunction is actually bad in this situation, so I've reverted to using the interval 0 to width as that's consistent with Distance for Windows.

Fixed in mrds 9445ce4.

lenthomas commented 1 year ago

This means the detection function analysis now coincides with what DistWin does, but that the abundance doesn't. There are fundamentally two different ways to handle left-truncated data - see http://lenthomas.org/papers/MarquesCREEMTR2012.pdf for an explanation. In the terms of that paper, you are now using p2 for the average detection function (which is what DistWin does), but a1 for the sampled area - which is inconsistent. I can see this because if I calculate

10284/((pi*15^2-pi*2^2)*31483179*0.3254428)*40370000

I get 58.36136, which is what you get, when DistWin gets 57.32 (well, actually it rounds, but you get the idea), from

10284/(pi*15^2*31483179*0.3254428)*40370000

I noticed too that the mrds cv.ER of 0.2746196 is different from what DistWin gets of "26.83" -- this may possibly be the same issue (or may be a difference in encounter rate variance).

So, can you switch the dht (and I assume dht2) code to using a2 for the area surveyed under left truncation (assuming that all times average p is calculated across the Distance and mrds packages it uses p2 and not p2)?

erex commented 1 year ago

This discussion about left truncation had a strong sense of deja vu. We covered this ground back in 2020 with issue #64; how did it reappear?

dill commented 1 year ago

Sorry I think we might be talking at cross purposes here. When I say "taking into account left truncation" I just mean in the interval used to bound the grid used for the monotonicity constraints. Not any part of the likelihood calculation.

lenthomas commented 1 year ago

10284/((pi15^2-pi2^2)314831790.3254428)*40370000

In discussing this with @dill today, I realized I used the wrong p here: I should have used the value fo 0.332966 that ds produced. In that case we get

> 10284/((pi*15^2-pi*2^2)*31483179*0.332966)*40370000
[1] 57.04272

which is very close to the 57.32 that DistWin gets.

So, in summary, both the p and area surveyed are calculated differently under left truncation by DistWin and ds/ddf/dht.

DistWin calculates p as the average detection probability between 0 and w, including the fact that detection probability between 0 and wl is 0. The area surveyed is then the area of the circle between 0 and w times the number of points.

ds etc calculates p as the avaerage detection probability between wl and w. The area surveyed is then the area of the annulus betwen wl and w.

After discussion, I agree that taking monotonicity constraints between 0 and w rather than between wl and w is fine.

lenthomas commented 1 year ago

I noticed too that the mrds cv.ER of 0.2746196 is different from what DistWin gets of "26.83" -- this may possibly be the same issue (or may be a difference in encounter rate variance).

Last thing: I still don't know why the ev.ER is different between the two. Any thoughts here?

lenthomas commented 1 year ago

Hmm... sorry @dill, but I double checked the estimation of D under left and right truncation, and I am re-convinced that either the p is calculated between 0 and w or the way dht2 is calculating density/abundance is wrong. Here is example code illustrating the issue. I have not checked what dht does, nor have I checked how the p is calculated.

library(Distance)
DuikerCameraTraps <- read.csv(file="https://datadryad.org/stash/downloads/file_stream/73221", 
                              header=TRUE, sep="\t")
DuikerCameraTraps$Area <- DuikerCameraTraps$Area / (1000*1000)
trunc.list <- list(left=2, right=15)
mybreaks <- c(seq(2, 8, 1), 10, 12, 15)
conversion <- convert_units("meter", NULL, "square kilometer")

#Fit detection function model
uni3 <- ds(DuikerCameraTraps, transect = "point", key="unif", adjustment = "cos",
           nadj=3,
           cutpoints = mybreaks, truncation = trunc.list, 
           initial_values = NULL, debug_level = 0)

#Calculate density
dh2.uni3 <- dht2(ddf=uni3, flatfile=DuikerCameraTraps, strat_formula=~1,
                     convert_units=conversion)
print(dh2.uni3, report="density")
#Gives density of 1.4053

#Manual calculation of density
p<-uni3$ddf$fitted[1]

#Correct calculation if the p is average detection function between right and left truncation points
with(dh2.uni3, n / (Effort * pi * (trunc.list$right^2 - trunc.list$left^2) * p)) * 1000 * 1000
#Gives 1.430783 - not what dht2 gets

#Incorrect calculation
with(dh2.uni3, n / (Effort * pi * trunc.list$right^2 * p)) * 1000 * 1000
#Gives 1.405347
dill commented 1 year ago

On the ER CV difference, Distance for Windows uses the P2 estimator by default but mrds uses P3. Specifying er_var="P2" yields an ER CV of 0.268276.

lenthomas commented 1 year ago

@dill, looks like you found and fixed a bug that was leading to incorrect density estimates? I tried to test it using previous code, but I'm afraid I'm getting an error message:

library(Distance)
DuikerCameraTraps <- read.csv(file="https://datadryad.org/stash/downloads/file_stream/73221", 
                              header=TRUE, sep="\t")
DuikerCameraTraps$Area <- DuikerCameraTraps$Area / (1000*1000)
trunc.list <- list(left=2, right=15)
mybreaks <- c(seq(2, 8, 1), 10, 12, 15)
conversion <- convert_units("meter", NULL, "square kilometer")

#Fit detection function model
uni3 <- ds(DuikerCameraTraps, transect = "point", key="unif", adjustment = "cos",
           nadj=3,
           cutpoints = mybreaks, truncation = trunc.list, 
           initial_values = NULL, debug_level = 0)

#Calculate density
dh2.uni3 <- dht2(ddf=uni3, flatfile=DuikerCameraTraps, strat_formula=~1,
                     convert_units=conversion)

gives

Error in dht2(ddf = uni3, flatfile = DuikerCameraTraps, strat_formula = ~1,  : 
  Column(s): object not in `flatfile`

This takes a while to run, but a simpler model gives the same error - e.g.,

hn <- ds(DuikerCameraTraps, transect = "point", key="hn", adjustment = NULL,
          cutpoints = mybreaks, truncation = trunc.list)
summary(hn)
dh2.hn <- dht2(ddf=hn, flatfile=DuikerCameraTraps, strat_formula=~1,
                 convert_units=conversion)
lenthomas commented 1 year ago

On the ER CV difference, Distance for Windows uses the P2 estimator by default but mrds uses P3. Specifying er_var="P2" yields an ER CV of 0.268276.

Thanks - I'll test this (and the density estimate) once the above problem is resolved.

dill commented 1 year ago

dht2 now requires that object be a column in the data (as a byproduct of the multiddf work for Environment Canada), see NEWS.

lenthomas commented 1 year ago
  1. This is going to break the code of everyone currently using dht2 with a flatfile. Isn't it possible to create an object column in cases where it is not passed in, numbering the objects 1:n?
  2. (a) If the above is not possible, and given that it will break everyone's code, is it possible to add a more informative error message? - for example pointing users at the help for dht2 (b) I notice that the help for flatfile does not specify that object is needed -- I guess this is because it's not needed for dht but it now is for dht2. This makes me wonder again about 1, above.
dill commented 1 year ago

Basically this is because for multiddf we need to match the individual detection function detections to parts of the flatfile. object is the only unique identifier for that, so we need to specify it in the data. We can't guarantee the ordering of the flatfile wrt the data contained in the ddf object, so the object column is necessary to do this.

I've tried to clear-up the error message but if you have a suggestion please let me know.

lenthomas commented 1 year ago

The vast majority of users will not be using multiddf, so is it possible to require the object field for those that are using it, but not for those who are not?

lenthomas commented 1 year ago

10284/((pi_15^2-pi_2^2)_31483179_0.3254428)*40370000

In discussing this with @dill today, I realized I used the wrong p here: I should have used the value fo 0.332966 that ds produced. In that case we get

> 10284/((pi*15^2-pi*2^2)*31483179*0.332966)*40370000
[1] 57.04272

which is very close to the 57.32 that DistWin gets.

So, in summary, both the p and area surveyed are calculated differently under left truncation by DistWin and ds/ddf/dht.

DistWin calculates p as the average detection probability between 0 and w, including the fact that detection probability between 0 and wl is 0. The area surveyed is then the area of the circle between 0 and w times the number of points.

ds etc calculates p as the avaerage detection probability between wl and w. The area surveyed is then the area of the annulus betwen wl and w.

After discussion, I agree that taking monotonicity constraints between 0 and w rather than between wl and w is fine.

We will document the difference in the way left truncation is dealt with in an mrds help page - see mrds issue 64. Will link to this from DistWin help and also from ds, dht and ddf.

lenthomas commented 1 year ago

We agreed to switch to P2 as the default point transect variance estimator in dht2 and dht - see #138 and mrds issue 65.

lenthomas commented 1 year ago

The vast majority of users will not be using multiddf, so is it possible to require the object field for those that are using it, but not for those who are not?

After discussion, we agreed to keep the object field in dht2's requirement for a flatfile - see #139.