Closed erex closed 4 years ago
I've looked into this, and it seems to be a bug in the area_calc function within the dht2 function (around line 380 of current code). The problem is that with left truncation for point transects, the area sampled is given by
effort * pi * ((w^2 - wl^2) * sample_fraction
where w
is the right truncation distance and wl
is the left truncation distance, but what is being calculated is effectively
effort * pi * ((w - wl)^2 * sample_fraction
(this can be seen by looking at line 376). I recommend the area_calc function is changed so that it accepts both the left and right truncation distances, rather than just the strip half-width, and then it is changed to something like
area_calc <- function(w, wl, effort, transect_type, sample_fraction){
if(transect_type == "point"){
return(effort * pi * (w^2 - wl^2) * sample_fraction)
}else{
return(effort * 2 * (w-wl) * sample_fraction)
}
}
Here is some code that I used after running Eric's code snippet, above, that led me down this path. (The cryptic comment about checking p_a is calculated correctly is explained below - it's an esoteric point, so eel free to ignore!)
#Try to estimate by hand, using equations 1 or 2 from the Howe paper.
# Note that formula ignores left truncation, so I need to check the
# p_a is calculated correctly so that left truncation can be ignored
# (can do that later, if I don't get to the bottom of this sooner).
library(dplyr)
wl <- trunc.list$left
peak_sample <- peak %>%
group_by(Sample.Label) %>%
summarize(Effort = first(Effort), multiplier = first(multiplier),
nk = sum((distance > wl) & (distance <= w), na.rm = TRUE),
.groups = "drop")
peak_sample$ek <- peak_sample$Effort * peak_sample$multiplier
peak_stratum <- peak %>%
group_by(Region.Label) %>%
summarize(Area = first(Area), .groups = "drop")
Dhat <- sum(peak_sample$nk) / (pi * w^2 * sum(peak_sample$ek * p_a))
#Convert units to number per km2
Dhat <- Dhat * 1000^2
Dhat #I get 14.51055 (paper gets 14.5)
#Check n, effort and number of samples
sum(peak_sample$nk) #5865 - same as peak.hr.dens
sum(peak_sample$Effort) #12317058 - same as peak.hr.dens
nrow(peak_sample) #21 - same as peak.hr.dens
#Check covered area
pi * (w^2 - wl^2) * sum(peak_sample$ek) / 1000^2 # 997.691 -
# but peak.hr.dens gets 762.9399
# Bug in the dht2 code -- it is computing
pi * (w - wl)^2 * sum(peak_sample$ek) / 1000^2
# on line 381 of dht2 function, in area_calc function
# (similar computation works for line transects, but not for points)
(Aside on "correct" calculation of p_a. With left tuncation, there are three ways to define p_a:
dht2
here -- it seems to be the problem I pointed out above.)(Addition to the aside -- Tiago and I actually came across this issue before, because we found a difference in the way we had calculated p_a (which was 3. above) and the way Distance for Windows does it (method 2. above). We needed to write a clarification to a previous paper -- so if you want full details, see http://lenthomas.org/papers/MarquesCREEMTR2012.pdf. )
Also, in case it's any use, I made a Distance for Windows project using the above data, to check the variance calculations. (It seems to check out.) I attach it below. CameraTrap.zip
I've made the correction you suggest above @lenthomas. I now get a result of 14.7732 for density, using the block marked multiplier
in @erex's code. If the paper reports 14.5, is this "close enough the the difference is between implementation of P_a calculation methods 2 and 3 above?
Err.. the correct answer is 14.510, so it seems the fix I suggested is not completely correct. I am sure the answer is in the fact this is point transect left truncation. Do you want me to look into this more or will you @dill?
I am swamped today so if you can look that would be great.
Could this just be a difference in methods 2 and 3 that you list above?
Yes, you were right. Distance for Windows implements method 2, above, which means that left truncation can then be ignored when calculating the covered area. So, it turned out that the correct fix was to replace
df_width <- (ddf$ds$aux$width - ddf$ds$aux$left)*convert_units
with
df_width <- ddf$ds$aux$width*convert_units
and leave area_calc
alone. I made this change, and also put a comment in area_calc
to explain why left truncation can be ignored. I pushed the fix to the 1.0.1-candidate
branch - hope I did the right thing there! (I note it says I'm "anonymous" so I clearly have some configuration wrong.)
I suggest we add this analysis as one of our test projects in test_that
- we may need to get permission from Eric Howe if the data are not public already.
Public availability of Eric's data shouldn't be a problem https://datadryad.org/stash/data_paper/doi:10.5061/dryad.b4c70?
Thanks @lenthomas.
Agree that it would be good to have this as a test. @erex, is your plan to have your above vignette available on the examples site and if so would it be worth bundling the duiker data so that it's in a convenient format for that analysis (as well as for tests) and ship it with Distance
as we do with other example datasets? If so, is the appropriate format that which is detailed in the vignette you posted?
Yes, the posting to the list that started this issue six weeks ago requested a vignette.
I was wondering if there were any plans to produce vignettes or training courses on this new method, or if a R Markdown file for the duiker analysis described in Howe et al was available?
Obvious place to put the vignette would be in https://examples.distancesampling.org. I had written the vignette also available at the beginning of this issue
It would be lovely from the vignette perspective if the data set could be read directly by read.delim()
from Dryad, but I remember that being difficult for some reason.
Seems the format works fine.
Note the comment I made near the end of the vignette regarding some incompatibility performing bootstraps in the presence of multipliers. Guess that forms the basis of a new issue.
Agree that it would be neat to read from Dryad for the vignette, but that wouldn't be ideal for the test (of course I can include the dataset in Distance
and you can ignore it and load the data from Dryad anyway but I'm going to include it, it may as well be in a useful format).
I suggest that I add the data and test for the 1.0.1 release and close this issue. Also suggest we punt the bootstrap issue (which if you can raise that in a new issue would be great) to 1.0.2 given time constraints (or poush 1.0.1 release date back).
I've added an outline script to convert this data to the dsdata
conversion scripts. I've also added a rudimentary manual page here which might benefit from additional information (currently scraped info from the Dryad page, but knowing almost nothing about camera trapping I may be missing important details).
(I'll merge this data into Distance
once you confirm it's in reasonable order.)
Thanks Dave. Give me a day or two to look through the manual page and perhaps make an addition or two.
Regarding the bootstrap, .@dill I see that adding in multipliers with uncertainty is an issue that should be reserved for a future release, as one then needs to sample from some distribution for the multiplier. However, the case where there is no uncertainty should be straightforward. In this current case, one way (and I think the better way) to enter the fact that the camera has a limited field of view is through the sample_fraction argument to dht2. Is this currently passed on into the bootstrapping code? If so, then it all works out of the box, so long as one goes that route rather than via a multiplier. What do you think? I can check it out if you're not sure.
I've added some code to include sample_fraction
as an argument to bootdht
but I'm wondering if that is the best term for that in this context. Perhaps we can have both multipliers
and sample_fraction
as in dht2
but for now when the $se
element of the multipliers is not zero we throw an error?
(Not that code has not been committed to git yet)
I think that sample_fraction is a fine term in this case because it refers to the proportion of the circle that is in the field of view of the camera, which is exactly the kind of instance that sample_fraction was intended for. (One-sided transects, cue counts where only a portion of the circle is in view, etc.) But I'm also happy if you want to include multipliers as well and throw an error in the bootstrap if the $se
is not zero, as you suggest. Thanks! :)
Okay I'll leave this as sample_fraction
for this release and flag an issue for multipliers
for 1.0.2.
This is now easily test-able as I've merged everything into the master branch.
Note that running unit tests on this dataset is problematic as there are so many observations. Will look into fitting to a subset of the data.
@erex / @lenthomas are you satisfied here and can I close this issue?
I confirm that the analysis gives (within tolerance) the same density and SE as Distance for Windows (the latter gives 14.510 and 4.3836, respectively). I have not tested the bootstrap. I see that the duiker example dataset in dsdata has a column "multiplier" for the sample_fraction, but I suppose that is becuase this is the way it was given in the Dryad, so okay by me. I expect the vignette will clarify how to get the same result as the paper (since I think at present it has to be treated as a sample_fraction and not a multiplier). So, I'm satisfied so long as you're confident the bootstrap works.
sorry,haven't been keeping an eye upon this issue. will have a look tomorrow
I've re-run the vignette I wrote back in May, duplicating Howe's analysis published in MEE (reading data from file rather than using in-built data set). I concur with Len, Distance 1.0.1.9002 produces point and interval estimates that mimic those in the MEE paper for peak activity data set.
This analysis does include a short bootstrap (49 reps, elapsed time ~2 hours) without a multiplier, accomplished by altering the original effort data. Median of that small number of bootstraps (21) is far from point estimate of density (14.5), but that's likely because of the huge encounter rate variance.
I've not tried to produce the naive bootstrap with the constant multiplier.
Thanks, I think there is a separate issue to deal with the speed of bootstrapping but perhaps we can close this for now and come back to that speed issue as part of optimization improvements?
On 24/07/2020 10:45, erex wrote:
I've re-run the vignette I wrote back in May, duplicating Howe's analysis published in MEE (reading data from file rather than using in-built data set). I concur with Len, Distance 1.0.1.9002 produces point and interval estimates that mimic those in the MEE paper for peak activity data set.
This analysis does include a short bootstrap (49 reps, elapsed time ~2 hours) without a multiplier, accomplished by altering the original effort data. Median of that small number of bootstraps (21) is far from point estimate of density (14.5), but that's likely because of the huge encounter rate variance.
I've not tried to produce the naive bootstrap with the constant multiplier.
howe-vignette-Distance1.0.1.9002.pdf https://github.com/DistanceDevelopment/Distance/files/4971216/howe-vignette-Distance1.0.1.9002.pdf
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/DistanceDevelopment/Distance/issues/64#issuecomment-663455557, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAAPIOGUSPJTUXWUZ7NKSTR5FJ2JANCNFSM4NPST2DA.
just a minute; I'm checking whether bootdht
can handle sample_fraction
in a fitted model object.
okay, no problem.
On 24/07/2020 10:51, erex wrote:
just a minute; I'm checking whether |bootdht| can handle |sample_fraction| in a fitted model object.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/DistanceDevelopment/Distance/issues/64#issuecomment-663460198, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAAPIMRB2WTKPLSJEM23FLR5FKSVANCNFSM4NPST2DA.
the answer to sample_fraction
appears to be "No"
> boot.samfrac <- bootdht(model=peak.hr.dens, flatfile=peak, resample_transects = TRUE, nboot=5, summary_fun=mysummary)
Error in bootdht(model = peak.hr.dens, flatfile = peak, resample_transects = TRUE, :
Only models fitted by Distance::ds() may be used
I presume I'll get the same answer when using multiplier
as I thought
> boot.samfrac <- bootdht(model=peak.hr.dens.mult, flatfile=peak, resample_transects = TRUE, nboot=5, summary_fun=mysummary)
Error in bootdht(model = peak.hr.dens.mult, flatfile = peak, resample_transects = TRUE, :
Only models fitted by Distance::ds() may be used
suppose current bootdht
documentation might need to mention these limitations
Sorry I've lost track of what different model objects are here... is peak.hr.dens
and peak.hr.dens.mult
as in your initial post and are dht2()
return objects? Or are they returns from ds()
?
sorry; peak.hr.dens
and peak.hr.dens.mult
are indeed coming out of dht2
in that case the error is correct, as model=
needs to be a fitted model from ds()
(or ddf()
). You need to supply the sample_fraction=
argument to bootdht()
, not at the modelling stage.
oops, let me try again
This looks promising for sample_fraction
argument in bootdht
> samfrac <- viewangle / 360
> test.samfra <- bootdht(model=peak.hr, flatfile=peak, resample_transects = TRUE, sample_fraction = samfrac, nboot = 5, summary_fun = mysummary)
Performing 5 bootstraps
|====================================================================== | 80%Error :
gosolnp-->Could not find a feasible starting point...exiting
|=======================================================================================| 100%
Warning message:
In dht(model, region.table, sample.table, obs.table, options = list(group = dht.group, :
Point transect encounter rate variance can only use estimators P2 or P3, switching to P3.
> summary(test.samfra)
Bootstrap results
Boostraps : 5
Successes : 5
Failures : 0
median mean se lcl ucl cv
Dhat 16.02 22.23 10 13.89 35.1 0.62
Seeing you've opened #70, can this issue be closed?
On 24/07/2020 11:32, erex wrote:
This looks promising for |sample_fraction| argument in |bootdht|
|> samfrac <- viewangle / 360 > test.samfra <- bootdht(model=peak.hr, flatfile=peak, resample_transects = TRUE, sample_fraction = samfrac, nboot = 5, summary_fun = mysummary) Performing 5 bootstraps |====================================================================== | 80%Error : gosolnp-->Could not find a feasible starting point...exiting |=======================================================================================| 100% Warning message: In dht(model, region.table, sample.table, obs.table, options = list(group = dht.group, : Point transect encounter rate variance can only use estimators P2 or P3, switching to P3. > summary(test.samfra) Bootstrap results Boostraps : 5 Successes : 5 Failures : 0 median mean se lcl ucl cv Dhat 16.02 22.23 10 13.89 35.1 0.62 |
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/DistanceDevelopment/Distance/issues/64#issuecomment-663475028, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAAPIJGY5GFZ6GPXW75C43R5FPMRANCNFSM4NPST2DA.
yes, happy for yo to close #64
There have been two posts to the Distance sampling list requesting either a vignette or training in this subject. I built such a vignette based upon Howe et al.'s 2017 duiker data.
howe-vignette.pdf
I've not shared the analysis with the Distance community because the density estimate produced does not match the estimate published in the paper (19.3 in R and 14.5 in the paper) MEE 2017 distance sampling cameras.pdf
I've not been able to track down the source of the discrepency. Eric shared with me his DistWin project that produced the estimate. I have analysed in R both the data set from Dryad as well as the data extracted from the DistWin project. The Dryad data and the DistWin project data produce identical results in R of 19.3.
I do know where the discrepency does not occur. There is no difference in AIC, hazard rate parameter estimates, P_a or EDR between DistWin and R, so the difference in results is not the result of optimisation issues.
Reproducible example
Data file as text
PeakDistances.txt
Not an urgent matter
The user community is curious, but not rabid about this subject. I do have a participant coming to the July workshop who is using camera trapping, so I would like to add this as a supplemental component to Practical 9 that deals with multipliers.