Closed lenthomas closed 1 year ago
See this reply but it seems that taking into account the left trunction is actually bad in this situation, so I've reverted to using the interval 0 to width
as that's consistent with Distance for Windows.
Fixed in mrds
9445ce4.
This means the detection function analysis now coincides with what DistWin does, but that the abundance doesn't. There are fundamentally two different ways to handle left-truncated data - see http://lenthomas.org/papers/MarquesCREEMTR2012.pdf for an explanation. In the terms of that paper, you are now using p2 for the average detection function (which is what DistWin does), but a1 for the sampled area - which is inconsistent. I can see this because if I calculate
10284/((pi*15^2-pi*2^2)*31483179*0.3254428)*40370000
I get 58.36136, which is what you get, when DistWin gets 57.32 (well, actually it rounds, but you get the idea), from
10284/(pi*15^2*31483179*0.3254428)*40370000
I noticed too that the mrds
cv.ER of 0.2746196 is different from what DistWin gets of "26.83" -- this may possibly be the same issue (or may be a difference in encounter rate variance).
So, can you switch the dht (and I assume dht2) code to using a2 for the area surveyed under left truncation (assuming that all times average p is calculated across the Distance and mrds packages it uses p2 and not p2)?
This discussion about left truncation had a strong sense of deja vu. We covered this ground back in 2020 with issue #64; how did it reappear?
Sorry I think we might be talking at cross purposes here. When I say "taking into account left truncation" I just mean in the interval used to bound the grid used for the monotonicity constraints. Not any part of the likelihood calculation.
10284/((pi15^2-pi2^2)314831790.3254428)*40370000
In discussing this with @dill today, I realized I used the wrong p
here: I should have used the value fo 0.332966
that ds
produced. In that case we get
> 10284/((pi*15^2-pi*2^2)*31483179*0.332966)*40370000
[1] 57.04272
which is very close to the 57.32 that DistWin
gets.
So, in summary, both the p
and area surveyed are calculated differently under left truncation by DistWin
and ds
/ddf
/dht
.
DistWin
calculates p as the average detection probability between 0 and w, including the fact that detection probability between 0 and wl is 0. The area surveyed is then the area of the circle between 0 and w times the number of points.
ds
etc calculates p as the avaerage detection probability between wl and w. The area surveyed is then the area of the annulus betwen wl and w.
After discussion, I agree that taking monotonicity constraints between 0 and w rather than between wl and w is fine.
I noticed too that the
mrds
cv.ER of 0.2746196 is different from what DistWin gets of "26.83" -- this may possibly be the same issue (or may be a difference in encounter rate variance).
Last thing: I still don't know why the ev.ER is different between the two. Any thoughts here?
Hmm... sorry @dill, but I double checked the estimation of D under left and right truncation, and I am re-convinced that either the p
is calculated between 0 and w or the way dht2
is calculating density/abundance is wrong. Here is example code illustrating the issue. I have not checked what dht
does, nor have I checked how the p
is calculated.
library(Distance)
DuikerCameraTraps <- read.csv(file="https://datadryad.org/stash/downloads/file_stream/73221",
header=TRUE, sep="\t")
DuikerCameraTraps$Area <- DuikerCameraTraps$Area / (1000*1000)
trunc.list <- list(left=2, right=15)
mybreaks <- c(seq(2, 8, 1), 10, 12, 15)
conversion <- convert_units("meter", NULL, "square kilometer")
#Fit detection function model
uni3 <- ds(DuikerCameraTraps, transect = "point", key="unif", adjustment = "cos",
nadj=3,
cutpoints = mybreaks, truncation = trunc.list,
initial_values = NULL, debug_level = 0)
#Calculate density
dh2.uni3 <- dht2(ddf=uni3, flatfile=DuikerCameraTraps, strat_formula=~1,
convert_units=conversion)
print(dh2.uni3, report="density")
#Gives density of 1.4053
#Manual calculation of density
p<-uni3$ddf$fitted[1]
#Correct calculation if the p is average detection function between right and left truncation points
with(dh2.uni3, n / (Effort * pi * (trunc.list$right^2 - trunc.list$left^2) * p)) * 1000 * 1000
#Gives 1.430783 - not what dht2 gets
#Incorrect calculation
with(dh2.uni3, n / (Effort * pi * trunc.list$right^2 * p)) * 1000 * 1000
#Gives 1.405347
On the ER CV difference, Distance for Windows uses the P2 estimator by default but mrds
uses P3. Specifying
er_var="P2"
yields an ER CV of 0.268276.
@dill, looks like you found and fixed a bug that was leading to incorrect density estimates? I tried to test it using previous code, but I'm afraid I'm getting an error message:
library(Distance)
DuikerCameraTraps <- read.csv(file="https://datadryad.org/stash/downloads/file_stream/73221",
header=TRUE, sep="\t")
DuikerCameraTraps$Area <- DuikerCameraTraps$Area / (1000*1000)
trunc.list <- list(left=2, right=15)
mybreaks <- c(seq(2, 8, 1), 10, 12, 15)
conversion <- convert_units("meter", NULL, "square kilometer")
#Fit detection function model
uni3 <- ds(DuikerCameraTraps, transect = "point", key="unif", adjustment = "cos",
nadj=3,
cutpoints = mybreaks, truncation = trunc.list,
initial_values = NULL, debug_level = 0)
#Calculate density
dh2.uni3 <- dht2(ddf=uni3, flatfile=DuikerCameraTraps, strat_formula=~1,
convert_units=conversion)
gives
Error in dht2(ddf = uni3, flatfile = DuikerCameraTraps, strat_formula = ~1, :
Column(s): object not in `flatfile`
This takes a while to run, but a simpler model gives the same error - e.g.,
hn <- ds(DuikerCameraTraps, transect = "point", key="hn", adjustment = NULL,
cutpoints = mybreaks, truncation = trunc.list)
summary(hn)
dh2.hn <- dht2(ddf=hn, flatfile=DuikerCameraTraps, strat_formula=~1,
convert_units=conversion)
On the ER CV difference, Distance for Windows uses the P2 estimator by default but
mrds
uses P3. Specifyinger_var="P2"
yields an ER CV of 0.268276.
Thanks - I'll test this (and the density estimate) once the above problem is resolved.
dht2
now requires that object
be a column in the data (as a byproduct of the multiddf work for Environment Canada), see NEWS
.
dht2
with a flatfile. Isn't it possible to create an object
column in cases where it is not passed in, numbering the objects 1:n? Basically this is because for multiddf we need to match the individual detection function detections to parts of the flatfile. object
is the only unique identifier for that, so we need to specify it in the data. We can't guarantee the ordering of the flatfile wrt the data contained in the ddf
object, so the object
column is necessary to do this.
I've tried to clear-up the error message but if you have a suggestion please let me know.
The vast majority of users will not be using multiddf, so is it possible to require the object
field for those that are using it, but not for those who are not?
10284/((pi_15^2-pi_2^2)_31483179_0.3254428)*40370000
In discussing this with @dill today, I realized I used the wrong
p
here: I should have used the value fo0.332966
thatds
produced. In that case we get> 10284/((pi*15^2-pi*2^2)*31483179*0.332966)*40370000 [1] 57.04272
which is very close to the 57.32 that
DistWin
gets.So, in summary, both the
p
and area surveyed are calculated differently under left truncation byDistWin
andds
/ddf
/dht
.
DistWin
calculates p as the average detection probability between 0 and w, including the fact that detection probability between 0 and wl is 0. The area surveyed is then the area of the circle between 0 and w times the number of points.
ds
etc calculates p as the avaerage detection probability between wl and w. The area surveyed is then the area of the annulus betwen wl and w.After discussion, I agree that taking monotonicity constraints between 0 and w rather than between wl and w is fine.
We will document the difference in the way left truncation is dealt with in an mrds
help page - see mrds issue 64. Will link to this from DistWin help and also from ds
, dht
and ddf
.
We agreed to switch to P2 as the default point transect variance estimator in dht2
and dht
- see #138 and mrds issue 65.
The vast majority of users will not be using multiddf, so is it possible to require the
object
field for those that are using it, but not for those who are not?
After discussion, we agreed to keep the object
field in dht2
's requirement for a flatfile - see #139.
Analysis of the camera trapping example with a half-normal detection function, one cosine adjustment, produces a fitted detection function with g(x)>1 for 0 > x > 3.5 or so:
This behaviour is in contrast with DistWin for the same analysis (DuikerDaytime sample project, analysis 8):
AIC is lower for the R analysis, but having a fitted detection function with g(x)>1 doesn't make sense. I don't know if DistWin checks for this at the constraint points, and so avoids it, but it would be good to find some way to avoid this behaviour in
Distance
`mrds`.