Variance components with mrds

erex commented 2 years ago

Similar to mrds issue 52, another user on the list wanted to compute variance components from an mrds analysis.

A simple example analysis of the golf tee data is as follows:

```r
library(mrds)
data(book.tee.data)
region <- book.tee.data$book.tee.region
egdata <- book.tee.data$book.tee.dataframe
samples <- book.tee.data$book.tee.samples
obs <- book.tee.data$book.tee.obs

# fit an independent observer model with full independence
result.io.fi <- ddf(mrmodel=~glm(~distance), data=egdata, method="io.fi",
                    meta.data=list(width = 4))

out <- dht(result.io.fi, region, samples, obs)

# row 3 of each table is the "Total" row
cv.er <- out$individuals$summary[3,"cv.ER"]
cv.p <- summary(result.io.fi)$average.p.se / summary(result.io.fi)$average.p
cv.N <- out$individuals$N[3, "cv"]
# percentage of the squared CV of abundance attributed to each component
prop.er <- cv.er^2 / cv.N^2 * 100
prop.p <- as.numeric(cv.p)^2 / cv.N^2 * 100

print(paste(round(prop.p,2), round(prop.er,2)))
#> [1] "22.92 77.08"
sum(c(prop.er, prop.p))
#> [1] 100
```

This seems reasonable; however there are two points:

- The golf tees apparently occur in groups, so the abundance of individuals should carry variability associated with expected group size, but I don't see that uncertainty incorporated into the estimate of individual abundance. If uncertainty in group size were included in the individual abundance estimate, then the encounter rate and detection function components should not sum to 100%.
```
Abundance and density estimates from distance sampling
Variance       : R2, N/L

Summary statistics

  Region Area CoveredArea Effort   n  k        ER      se.ER      cv.ER
1      1 1040        1040    130  88  6 0.6769231 0.05388253 0.07959919
2      2  640         640     80  74  5 0.9250000 0.09254697 0.10005078
3  Total 1680        1680    210 162 11 0.7714286 0.04853450 0.06291509

Summary for clusters

Abundance:
  Label  Estimate        se         cv       lcl      ucl        df
1     1 101.08850  8.990598 0.08893789  82.29298 124.1769  7.777609
2     2  85.00624  9.149165 0.10762934  64.86058 111.4091  5.353435
3 Total 186.09473 13.841528 0.07437894 159.09696 217.6739 16.953769

Density:
  Label   Estimate          se         cv        lcl       ucl        df
1     1 0.09720048 0.008644806 0.08893789 0.07912786 0.1194008  7.777609
2     2 0.13282224 0.014295571 0.10762934 0.10134465 0.1740768  5.353435
3 Total 0.11077067 0.008239005 0.07437894 0.09470057 0.1295678 16.953769

Summary for individuals

Abundance:
  Label Estimate       se         cv      lcl      ucl       df
1     1 317.0503 17.32922 0.05465765 283.0544 355.1292 21.50050
2     2 256.1674 41.23037 0.16095087 167.6183 391.4952  4.53376
3 Total 573.2177 47.49638 0.08285924 473.5028 693.9316  7.91414

Density:
  Label  Estimate         se         cv       lcl       ucl       df
1     1 0.3048560 0.01666272 0.05465765 0.2721677 0.3414704 21.50050
2     2 0.4002616 0.06442246 0.16095087 0.2619036 0.6117112  4.53376
3 Total 0.3412010 0.02827166 0.08285924 0.2818469 0.4130545  7.91414

Expected cluster size
  Region Expected.S se.Expected.S cv.Expected.S
1      1   3.136364     0.1840659    0.05868767
2      2   3.013514     0.2102132    0.06975686
3  Total   3.080247     0.1356528    0.04403958
```

- Secondly, the encounter rate variance can be calculated in two ways, Buckland et al. (2001) or Innes et al. (2002), selected through the `varflag` option passed to `dht()`:

```
out2 <- dht(result.io.fi, region, samples, obs, options=list(varflag=1))
out2

Abundance and density estimates from distance sampling
Variance       : R2, n/L

Summary statistics

  Region Area CoveredArea Effort   n  k        ER      se.ER      cv.ER
1      1 1040        1040    130  88  6 0.6769231 0.05388253 0.07959919
2      2  640         640     80  74  5 0.9250000 0.09254697 0.10005078
3  Total 1680        1680    210 162 11 0.7714286 0.04853450 0.06291509

Summary for clusters

Abundance:
  Label  Estimate        se         cv       lcl      ucl        df
1     1 101.08850  8.990598 0.08893789  82.29298 124.1769  7.777609
2     2  85.00624  9.149165 0.10762934  64.86058 111.4091  5.353435
3 Total 186.09473 13.841528 0.07437894 159.09696 217.6739 16.953769

Density:
  Label   Estimate          se         cv        lcl       ucl        df
1     1 0.09720048 0.008644806 0.08893789 0.07912786 0.1194008  7.777609
2     2 0.13282224 0.014295571 0.10762934 0.10134465 0.1740768  5.353435
3 Total 0.11077067 0.008239005 0.07437894 0.09470057 0.1295678 16.953769

Summary for individuals

Abundance:
  Label Estimate       se         cv      lcl      ucl        df
1     1 317.0503 33.92116 0.10698985 252.8760 397.5105 16.002029
2     2 256.1674 31.53158 0.12308972 194.1962 337.9148  9.094684
3 Total 573.2177 48.99537 0.08547428 481.5418 682.3469 29.883829

Density:
  Label  Estimate         se         cv       lcl       ucl        df
1     1 0.3048560 0.03261650 0.10698985 0.2431500 0.3822216 16.002029
2     2 0.4002616 0.04926809 0.12308972 0.3034315 0.5279919  9.094684
3 Total 0.3412010 0.02916391 0.08547428 0.2866320 0.4061589 29.883829

Expected cluster size
  Region Expected.S
1      1   3.136364
2      2   3.013514
3  Total   3.080247
```


Of note, the encounter rate standard error (and CV) reported by the two calls is identical, regardless of the `varflag` setting. The change appears in the SE of abundance/density of individuals (not groups), and the uncertainty in expected cluster size disappears from the output.

Perhaps these results are as expected, but I don't know enough to judge whether they are as they should be.

lenthomas commented 2 years ago

We expect a difference in the variances of abundance and density at the individual level between the Buckland and Innes estimators. Also, the encounter rate variance is reported for information only, and is calculated the same way in both. We'll look into the details of variance estimation as part of documenting differences between DistWin and mrds.

One thing we noticed is that the variance of expected cluster size is not reported for the Buckland et al. estimator, although the code shows that it is used (as the variance of mean cluster size), so we will consider adding it to the output.
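
For reference, the delta-method approximation usually given for the Buckland et al. (2001) approach adds the squared CVs of the three components, treating them as independent. This is a sketch of the textbook form, not a transcription of the mrds code; $\hat{P}_a$ denotes the average detection probability and $\hat{E}[s]$ the expected cluster size (both symbols are notation assumed here):

$$
\left[\mathrm{cv}(\hat{N}_{\mathrm{ind}})\right]^2 \approx \left[\mathrm{cv}(n/L)\right]^2 + \left[\mathrm{cv}(\hat{P}_a)\right]^2 + \left[\mathrm{cv}(\hat{E}[s])\right]^2
$$

Under this approximation the encounter rate and detection function terms account for all of the variance only at the cluster level; at the individual level the cluster size term takes up the remainder.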

In addition, one option to consider is to add a variance components calculation into dht.
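
As a rough illustration of what such a calculation might look like, the sketch below computes percentage contributions from CVs, assuming the components are independent so that their squared CVs add. `variance_components()` is a hypothetical helper written for this example; it is not part of mrds and not the implementation in the PR. The input values are copied from the "Total" rows of the output posted above.

```r
# Hypothetical helper (not part of mrds): percentage contribution of each
# component to the squared CV of an abundance estimate, assuming the
# components are independent so that their squared CVs add (delta method).
variance_components <- function(cv.components, cv.total = NULL) {
  stopifnot(is.numeric(cv.components), !is.null(names(cv.components)))
  # If no total CV is supplied, use the delta-method sum of squared CVs
  if (is.null(cv.total)) {
    cv.total <- sqrt(sum(cv.components^2))
  }
  pct <- 100 * cv.components^2 / cv.total^2
  # Whatever the listed components do not account for is reported separately
  c(pct, unexplained = 100 - sum(pct))
}

# CVs copied from the "Total" rows of the output posted above
cv.er <- 0.06291509          # encounter rate
cv.N.clusters <- 0.07437894  # abundance of clusters

# Cluster level: share due to encounter rate, plus the remainder, which under
# this approximation would be the detection function's share
clusters <- variance_components(c(er = cv.er), cv.total = cv.N.clusters)
print(round(clusters, 2))
```

Reporting an explicit `unexplained` entry, rather than forcing the listed components to sum to 100, makes it visible when a component such as expected cluster size has been left out of the calculation.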

dill commented 1 year ago

See PR here

DistanceDevelopment / mrds

Variance components with mrds #63