DistanceDevelopment / mrds

R package for mark-recapture-distance-sampling analysis

GNU General Public License v3.0


using int.range in ddf() generates an error #16

Closed megancatonferguson closed 7 years ago

megancatonferguson commented 7 years ago

I am using R version 3.3.2 with mrds version 2.1.17. I am having trouble using int.range as a matrix in order to specify w separately for each observation. I made sure that the "distance" value for each observation was between the corresponding values of left and width in the int.range matrix.

Here is a sample of the code I used:

      int.range <- matrix(rep(0, (nrow(Bm.x95)*2)), nrow=nrow(Bm.x95))
      colnames(int.range) <- c("left", "width")

      idx <- which(Bm.x95$VisSOPkm == 0.5)
      int.range[idx,2] <- 1.25

      idx <- which(Bm.x95$VisSOPkm == 1.5)
      int.range[idx,2] <- 2.25

      idx <- which(Bm.x95$VisSOPkm == 2.5)
      int.range[idx,2] <- 3.25

      idx <- which(Bm.x95$VisSOPkm == 4)
      int.range[idx,2] <- 5.25

      idx <- which(Bm.x95$VisSOPkm == 7.5)
      int.range[idx,2] <- 10.25

      idx <- which(Bm.x95$VisSOPkm == 20)
      int.range[idx,2] <- 20

      Bmi.ddfdwdist.trnc5pct.hr.super.catsize  <-
            ddf(dsmodel = ~mcds(key = "hr", formula = ~super.catsize),
                data = Bm.x95, method = "ds", meta.data=list("int.range"=int.range),
                control=list("optimx.maxit"=10000, "refit"=TRUE, "nrefits"=40, debug=TRUE))
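
As a side note, the same matrix can be built from a lookup vector and checked before it is passed to ddf(). This is only a sketch: `obs` is a made-up stand-in for Bm.x95 (the real data are not shown in this thread), and the check mirrors the statement above that every distance lies between `left` and `width`.

```r
# Hypothetical stand-in for Bm.x95: visibility class and perpendicular distance
obs <- data.frame(
  VisSOPkm = c(0.5, 1.5, 2.5, 4, 7.5, 20),
  distance = c(0.3, 1.1, 2.0, 0.8, 2.6, 1.7)
)

# Width assigned to each visibility class, copied from the code above
width_by_vis <- c("0.5" = 1.25, "1.5" = 2.25, "2.5" = 3.25,
                  "4" = 5.25, "7.5" = 10.25, "20" = 20)

# One row per observation; left is 0 everywhere, as in the original matrix
int.range <- cbind(
  left  = 0,
  width = unname(width_by_vis[as.character(obs$VisSOPkm)])
)

# Sanity checks: row count matches, no unmatched class, distances inside range
stopifnot(
  nrow(int.range) == nrow(obs),
  !anyNA(int.range),
  all(obs$distance >= int.range[, "left"]),
  all(obs$distance <= int.range[, "width"])
)
```

An unmatched visibility class shows up as an NA in the `width` column, which the `anyNA()` check catches before fitting.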

Here is a copy of the error that I received:

    Warning in process.data(data, meta.data, check = FALSE) :
      no truncation distance specified; using largest observed distance
    Error in detfct.fit.opt(ddfobj, optim.options, bounds, misc.options) :
      No convergence.
    In addition: There were 26 warnings (use warnings() to see them)
    > warnings()
    Warning messages:
    1: In detfct(distance, ddfobj, select, index, width, standardize,  ... :
      longer object length is not a multiple of shorter object length

I tried increasing the limits in the control list, but that didn't seem to solve the problem.

I tried troubleshooting the problem myself, using getAnywhere() to work my way through all of the functions, but couldn't figure out what was causing the error. The error arises in ddf.ds at this command: lt <- detfct.fit(ddfobj, optim.options, bounds, misc.options)

In detfct.fit, the error arises at this command: lt <- detfct.fit.opt(ddfobj, optim.options, bounds, misc.options)

In flpt.lnl, the following command also seems relevant: p1 <- distpdf(x$distance[!x$binned], ddfobj = ddfobj, select = !x$binned, width = width, standardize = FALSE, point = misc.options$point, left = left)

In fx, here's where the warning about incompatible objects arises: return(detfct(distance, ddfobj, select, index, width, standardize, stdint)/(width - left))
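
For readers unfamiliar with that warning, here is a small base-R illustration of how it can arise in an expression of the form `a / (width - left)`. It is not the mrds code: the vectors, their sizes, and the idea that only some rows are selected are assumptions made for the example.

```r
# Hypothetical sizes: 5 observations in total, 3 of them selected
width  <- c(1.25, 2.25, 3.25, 5.25, 10.25)  # one width per observation
left   <- rep(0, 5)
select <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
g      <- c(0.9, 0.6, 0.4)  # stand-in for values computed on the selected rows only

# Dividing a length-3 vector by a length-5 vector recycles the shorter one;
# because 5 is not a multiple of 3, R raises a warning. Capture its message.
msg <- tryCatch(g / (width - left), warning = function(w) conditionMessage(w))
print(msg)

# Subsetting the ranges with the same selection keeps the lengths aligned
aligned <- g / (width[select] - left[select])
length(aligned)
```

If something similar happens inside the fitting code, the recycled values would silently pair distances with the wrong widths, which could plausibly contribute to the convergence failure.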

I'm happy to share my data if that would help troubleshoot. Thanks!

megancatonferguson commented 7 years ago

Thank you! I'll give it a try and keep you posted.

On Wed, May 10, 2017 at 2:18 PM, DL Miller notifications@github.com wrote:

Fixed via f630aee.

Jason: please do let me know if that doesn't solve your issue!

-- Megan C. Ferguson Cetacean Assessment and Ecology Program Marine Mammal Laboratory Alaska Fisheries Science Center National Marine Fisheries Service NOAA 7600 Sand Point Way NE Seattle, WA 98115-6349 USA Tel: 1.206.526.6274 Fax: 1.206.526.6615 Megan.Ferguson@noaa.gov

dill commented 7 years ago

Please ignore that previous comment, it was meant for another thread! Onto this bug now...

dill commented 7 years ago

Megan, can you try out using the fix in the intrange-fix branch. You can install that version using the following code:

library(devtools)
install_github("DistanceDevelopment/mrds", ref="intrange-fix")

Not 100% sure that the results will be reasonable, so please let me know if they are not!

megancatonferguson commented 7 years ago

I should have time this afternoon to give it a try. If not today, then tomorrow. Thanks!

megancatonferguson commented 7 years ago

Hi Dave,

The int.range fix worked fine and the results seem reasonable. The only glitch I ran into was that it wouldn't plot the resulting detection function...but I'm not sure that I should really expect it to be able to plot a detection function with variable integration parameters. Here's the plotting error that I got:

    Error in int.range[selected, ] : (subscript) logical subscript too long
    In addition: Warning message:
    In plot.ds(Bmi.dx.trnc5pct.hr) :
      Point values can be misleading for g(x) when the range varies

In case you're interested, and just to pass the info on to Jason and Rob because we chatted about this yesterday, I built 3 comparative ddf models for bowheads and belugas (each species separately):

Scenario 1. Omit sightings collected when the visibility perpendicular to the transect was < 1.5 km and build the ddf without specifying values for int.range, so w was constant.

Scenario 2. Keep all sightings, regardless of perpendicular visibility; build the ddf using int.range to allow w to vary across observations.

Scenario 3. Keep all sightings, regardless of perpendicular visibility; build the ddf without specifying values for int.range, so w was constant.

The results were the same for both species (see attached). Scenario 2 (all sightings, variable int.range) produced the smallest abundance estimates and the largest ESWs. Scenario 1 (limit sightings by perpendicular visibility filter, assume constant w) resulted in intermediate abundance and ESW estimates. Scenario 3 (all sightings, assume constant w) resulted in the largest abundance estimates and the smallest ESWs. If I'm thinking about this correctly, those results are exactly what we should expect. Scenarios 1 and 3 "think" they're missing sightings farther out, but they really just need to be corrected for how far the observers can see; they produce smaller ESWs, which inflate Nhat.
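
The link between average p and Nhat in the covered region can be checked with a few lines of R. The numbers are the bowhead values from the summaries attached to this comment; the column names are just labels for this sketch.

```r
# Bowhead values copied from the attached summaries (Scenarios 1 to 3)
scen <- data.frame(
  scenario  = c(1, 2, 3),
  n         = c(853, 872, 872),
  average_p = c(0.3693237, 0.4326661, 0.359157)
)

# Abundance in the covered region is the number of observations divided by
# the average detection probability, so a smaller p gives a larger Nhat
scen$Nhat <- scen$n / scen$average_p

# Sort from smallest to largest abundance estimate
scen[order(scen$Nhat), ]
```

The ordering comes out as Scenario 2, then 1, then 3, and the computed values agree with the "N in covered region" rows of the summaries.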

Does this make sense to you?

If it's possible at some point to fix the plotting function, that would be fabulous! I understand that you have a lot of other stuff you're working on, so this isn't totally critical.

Thanks for your help!

Megan

    ################
    # BOWHEAD WHALES
    ################

    # Model limited to sightings made when visibility >= 1.5 km.
    # Assumed constant w for all sightings.

    summary(Bm.trnc5pct.hr.catsize)

    Summary for ds object
    Number of observations :  853
    Distance range         :  0  -  2.683248
    AIC                    :  1070.684

    Detection function:
     Hazard-rate key function

    Detection function parameters
    Scale coefficient(s):
                  estimate        se
    (Intercept) -0.5727164 0.0855839
    catsize2     0.4711711 0.1159855

    Shape coefficient(s):
                estimate         se
    (Intercept) 0.628276 0.07368229

                            Estimate           SE         CV
    Average p              0.3693237   0.01721304 0.04660691
    N in covered region 2309.6269281 124.88828494 0.05407293

    x <- summary(Bm.trnc5pct.hr.catsize)
    x$average.p * x$width
    [1] 0.9909872

    # Model that included all sightings, regardless of how far the
    # observers could see. The int.range matrix was used to
    # specify w individually for each observation.

    summary(Bmi.dx.trnc5pct.hr.catsize)

    Summary for ds object
    Number of observations :  872
    Distance range         :  0  -  2.645691
    AIC                    :  3422.104

    Detection function:
     Hazard-rate key function

    Detection function parameters
    Scale coefficient(s):
                  estimate         se
    (Intercept) -0.1970628 0.04986634
    catsize2     0.3573042 0.06562772

    Shape coefficient(s):
                estimate        se
    (Intercept) 1.174815 0.0649592

                            Estimate          SE         CV
    Average p              0.4326661  0.01289582 0.02980548
    N in covered region 2015.4108663 79.44205726 0.03941730

    x <- summary(Bmi.dx.trnc5pct.hr.catsize)
    x$average.p * x$width
    [1] 1.144701

    # Model that included all sightings, regardless of how far the
    # observers could see. Assumed constant w for all sightings.

    summary(Bm.dx.trnc5pct.hr.catsize)

    Summary for ds object
    Number of observations :  872
    Distance range         :  0  -  2.645691
    AIC                    :  1067.31

    Detection function:
     Hazard-rate key function

    Detection function parameters
    Scale coefficient(s):
                  estimate         se
    (Intercept) -0.6442597 0.08874811
    catsize2     0.4886460 0.12024751

    Shape coefficient(s):
                 estimate         se
    (Intercept) 0.5863391 0.07244761

                           Estimate           SE         CV
    Average p              0.359157   0.01714758 0.04774395
    N in covered region 2427.907363 133.57315492 0.05501575

    x <- summary(Bm.dx.trnc5pct.hr.catsize)
    x$average.p * x$width
    [1] 0.9502186

    #########
    # BELUGAS
    #########

    # Model limited to sightings made when visibility >= 1.5 km.
    # Assumed constant w for all sightings.

    summary(Dl.trnc5pct.hr.Long100.catIcePct)

    Summary for ds object
    Number of observations :  1902
    Distance range         :  0  -  1.124083
    AIC                    :  -320.1117

    Detection function:
     Hazard-rate key function

    Detection function parameters
    Scale coefficient(s):
                  estimate         se
    (Intercept)  2.8464299 0.67634984
    Long100     -2.3733024 0.45271660
    catIcePct   -0.2951664 0.06950523

    Shape coefficient(s):
                 estimate         se
    (Intercept) 0.8089415 0.06959278

                            Estimate           SE         CV
    Average p              0.5385192   0.01435677 0.02665972
    N in covered region 3531.9078020 109.46839400 0.03099413

    x <- summary(Dl.trnc5pct.hr.Long100.catIcePct)
    x$average.p * x$width
    # [1] 0.6053401

    # Model that included all sightings, regardless of how far the
    # observers could see. The int.range matrix was used to
    # specify w individually for each observation.

    summary(Dli.dx.trnc5pct.hr.Long100.catIcePct)

    Summary for ds object
    Number of observations :  1941
    Distance range         :  0  -  1.116542
    AIC                    :  -262.1092

    Detection function:
     Hazard-rate key function

    Detection function parameters
    Scale coefficient(s):
                  estimate         se
    (Intercept)  0.8562704 0.15789544
    Long100     -0.6171182 0.10525221
    catIcePct   -0.1674711 0.01465252

    Shape coefficient(s):
                estimate         se
    (Intercept)  2.96231 0.08237556

                            Estimate           SE          CV
    Average p              0.8221087  0.006646059 0.008084161
    N in covered region 2361.0016941 29.897652944 0.012663122

    x <- summary(Dli.dx.trnc5pct.hr.Long100.catIcePct)
    x$average.p * x$width
    # [1] 0.9179187

    # Model that included all sightings, regardless of how far the
    # observers could see. Assumed constant w for all sightings.

    summary(Dl.dx.trnc5pct.hr.Long100.catIcePct)

    Summary for ds object
    Number of observations :  1941
    Distance range         :  0  -  1.116542
    AIC                    :  -347.4174

    Detection function:
     Hazard-rate key function

    Detection function parameters
    Scale coefficient(s):
                 estimate         se
    (Intercept)  2.755224 0.67485959
    Long100     -2.317189 0.45159758
    catIcePct   -0.294295 0.06919755

    Shape coefficient(s):
                 estimate         se
    (Intercept) 0.8044717 0.06902751

                            Estimate           SE         CV
    Average p              0.5393988   0.01423387 0.02638840
    N in covered region 3598.4508175 110.35430676 0.03066717

    x <- summary(Dl.dx.trnc5pct.hr.Long100.catIcePct)
    x$average.p * x$width
    # [1] 0.6022613

megancatonferguson commented 7 years ago

Thanks for reporting back Megan! This is very useful info.

As far as I know, the plotting for this kind of thing is fiddly and I don't have time at the moment to make the modifications to plot.ds() (as really I should do a more serious re-write at the same time) but I'll note this for when I do get some time.

Your results make sense to me. It's a relief that the right thing is happening!

Thanks again for taking the time to report and test this and I'll try to get to the plotting issue soon.

dill commented 7 years ago

hm, somehow github thinks that the e-mail I just sent is from Megan, but nevermind...

megancatonferguson commented 7 years ago

Hi Dave,

I just got back to looking at this some more and I have some results that I can't explain. I'll use the same scenario numbers as my previous message:

Scenario 1. Omit sightings collected when the visibility perpendicular to the transect was < 1.5 km and build the ddf without specifying values for int.range, so w was constant.

Scenario 2. Keep all sightings, regardless of perpendicular visibility; build the ddf using int.range to allow w to vary across observations. w allowed to exceed the right-truncation distance.

Scenario 3. Keep all sightings, regardless of perpendicular visibility; build the ddf without specifying values for int.range, so w was constant.

Scenario 4 is new. Keep all sightings, regardless of perpendicular visibility; build the ddf using int.range to allow w to vary across observations. w <= right-truncation distance.
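
The difference between the Scenario 2 and Scenario 4 matrices can be sketched as follows. The visibility values and the names `vis_w` and `w_trunc` are hypothetical; the truncation distance is the one reported in the bowhead summaries earlier in this thread.

```r
# Hypothetical per-observation visibility limits (km)
vis_w <- c(1.25, 2.25, 3.25, 5.25, 10.25, 20)

# Right-truncation distance reported for the bowhead all-sightings models
w_trunc <- 2.645691

# Scenario 2: w taken directly from visibility, so it may exceed w_trunc
int.range.s2 <- cbind(left = 0, width = vis_w)

# Scenario 4: w capped at the right-truncation distance
int.range.s4 <- cbind(left = 0, width = pmin(vis_w, w_trunc))

# Rows where the two scenarios differ
which(int.range.s2[, "width"] != int.range.s4[, "width"])
```

Only observations whose visibility limit exceeds the truncation distance change between the two scenarios, which is worth keeping in mind when comparing their ESWs.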

When I initially tested the new version of mrds with an int.range matrix that specified one row per observation (Scenario 2), I allowed w in int.range to equal the maximum perpendicular distance visible; in other words, w could be greater than the right-truncation distance. The "Distance range" in the ddf model summary stated that the max distance equaled the right-truncation distance; however, the ESW results for Scenario 2 made sense relative to Scenarios 1 & 3.

In all cases, I've been computing ESW as: ESW = average.p * width
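
As a worked check of that formula, the bowhead size-covariate values for Scenarios 1 to 3 can be reproduced from the average p and distance range printed in the earlier summaries. The vector names are just labels for this sketch, and using the upper end of the printed distance range as the width is an assumption; whether a single width is meaningful when w varies by observation is part of the open question.

```r
# Average p and upper end of the distance range from the earlier summaries
average_p <- c(s1 = 0.3693237, s2 = 0.4326661, s3 = 0.359157)
width     <- c(s1 = 2.683248,  s2 = 2.645691,  s3 = 2.645691)

# ESW = average.p * width, as computed throughout this thread
esw <- average_p * width
round(esw, 4)
```

These agree with the ESWs reported for the size-covariate models to within rounding of the printed inputs.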

I tried testing an int.range matrix for a new scenario (Scenario 4) in which w was set to the observation-specific visibility distance if it was less than the right-truncation distance, and the max allowable w equaled the right-truncation distance. The results for the null hazard-rate model under Scenario 4 made sense to me:

    ESW for Scenario 1 = 0.985569
    ESW for Scenario 2 = 1.105375
    ESW for Scenario 3 = 0.9471885
    ESW for Scenario 4 = 1.100923

However, when I added in covariates, the ESW for Scenario 4 was considerably less than that for analogous models built under the other three scenarios. Here are the ESWs for the models that include only a categorical vector for size as a covariate:

    ESW for Scenario 1 = 0.9909872
    ESW for Scenario 2 = 1.144701
    ESW for Scenario 3 = 0.9502186
    ESW for Scenario 4 = 0.7949639

Is there anything specific to the mcds portion of the algorithm that might cause a hiccup when specifying the int.range matrix to have observation-specific values for w? If you'd like me to submit this to GitHub so that it's part of the official comment log, I'm happy to do that.

megancatonferguson commented 7 years ago

I'm a bit unsure about what using the averaged probabilities of detection is masking here. The "average average p" is calculated by dividing the number of observed groups by the H-T estimate of abundance in the covered area (n/Nhat) so that could be masking all kinds of stuff.
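
To make the n/Nhat calculation concrete, here is a minimal sketch with made-up per-group detection probabilities (the vector `p_i` is hypothetical, not output from any of the models above).

```r
# Hypothetical fitted detection probabilities, one per observed group
p_i <- c(0.2, 0.2, 0.5, 0.5, 0.5, 0.9)
n   <- length(p_i)

# Horvitz-Thompson-like estimate of abundance in the covered region
Nhat <- sum(1 / p_i)

# "Average p" as described above: observed groups divided by Nhat
avg_p <- n / Nhat

# Compare with the plain arithmetic mean of the probabilities
c(avg_p = avg_p, mean_p = mean(p_i))
```

Because n/Nhat is the harmonic mean of the individual probabilities, a few groups with low p pull it down much more than they pull down the arithmetic mean, so two models with quite different per-group probabilities can still report similar averages.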

What would be useful is probably looking at the prob. of detection for each group size for the different models. It would also be interesting to look at how many of each group size are included in each model, since we're effectively comparing different datasets with each result.
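
One way to lay out that comparison, sketched with invented numbers: `d` is a hypothetical data frame holding, for each model, the size class and fitted detection probability of every observation. How those probabilities are extracted from each fitted model is left out here.

```r
# Hypothetical per-observation data for two models
d <- data.frame(
  model   = rep(c("scenario2", "scenario4"), each = 4),
  catsize = factor(c(1, 1, 1, 2,   2, 1, 2, 2)),
  p_hat   = c(0.30, 0.30, 0.30, 0.45,   0.50, 0.25, 0.50, 0.50)
)

# How many observations of each size class go into each model
counts <- table(d$model, d$catsize)
counts

# Mean fitted detection probability by model and size class
by_size <- aggregate(p_hat ~ model + catsize, data = d, FUN = mean)
by_size
```

Laying the counts next to the per-class probabilities makes it easier to see whether a change in average p comes from the detection function itself or from a different mix of group sizes.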

Hope this helps!

On 05/06/2017 20:36, Megan Ferguson - NOAA Federal wrote:

Hi Dave,

I just got back to looking at this some more and I have some results that I can't explain. I'll use the same scenario numbers as my previous message:

Scenario 1. Omit sightings collected when the visibility perpendicular to the transect was < 1.5 km and build the ddf without specifying values for int.range, so w was constant.

Scenario 2. Keep all sightings, regardless of perpendicular visibility; build the ddf using int.range to allow w to vary across observations. w allowed to exceed the right-truncation distance.

Scenario 3. Keep all sightings, regardless of perpendicular visibility; build the ddf without specifying values for int.range, so w was constant.

Scenario 4 is new. Keep all sightings, regardless of perpendicular visibility; build the ddf using int.range to allow w to vary across observations. w <= right-truncation distance.

When I initially tested the new version of mrds with an int.range matrix that specified one row per observation (Scenario 2), I allowed w in int.range to equal the maximum perpendicular distance visible; in other words, w could be greater than the right-truncation distance. The "Distance range" in the ddf model summary stated that the max distance equaled the right-truncation distance; however, the ESW results for Scenario 2 made sense relative to Scenarios 1 & 3.

In all cases, I've been computing ESW as: ESW = average.p * width

I tried testing an int.range matrix for a new scenario (Scenario 4) in which w was set to the observation-specific visibility distance if it was less than the right-truncation distance, and the max allowable w equaled the right-truncation distance. The results for the null hazard-rate model under Scenario 4 made sense to me:

ESW for Scenario 1 = 0.985569 ESW for Scenario 2 = 1.105375 ESW for Scenario 3 = 0.9471885 ESW for Scenario 4 = 1.100923

However, when I added in covariates, the ESW for Scenario 4 was considerably less than that for analogous models built under the other three scenarios. Here are the ESWs for the models that include only a categorical vector for size as a covariate:

ESW for Scenario 1 = 0.9909872 ESW for Scenario 2 = 1.144701 ESW for Scenario 3 = 0.9502186 ESW for Scenario 4 = 0.7949639

Is there anything specific to the mcds portion of the algorithm that might cause a hiccup when specifying the int.range matrix to have observation-specific values for w? If you'd like me to submit this to GitHub so that it's part of the official comment log, I'm happy to do that.

On Sun, May 21, 2017 at 6:58 AM, David Lawrence Miller <dave@ninepointeightone.net mailto:dave@ninepointeightone.net> wrote:
Thanks for reporting back Megan! This is very useful info.

As far as I know, the plotting for this kind of thing is fiddly and
I don't have time at the moment to make the modifications to
plot.ds() (as really I should do a more serious re-write at the same
time) but I'll note this for when I do get some time.

Your results make sense to me. This is a relief that the right thing
is happening!

Thanks again for taking the time to report and test this and I'll
try to get to the plotting issue soon.

On 19/05/2017 21:39, Megan Ferguson - NOAA Federal wrote:

    Hi Dave,

    The int.range fix worked fine and the results seem reasonable. 
    The only
    glitch I ran into was that it wouldn't plot the resulting detection
    function...but I'm not sure that I should really expect it to be
    able to
    plot a detection function with variable integration parameters. 
    Here's
    the plotting error that I got:

    Error in int.range[selected, ] : (subscript) logical subscript
    too long
    In addition: Warning message:
    In plot.ds(Bmi.dx.trnc5pct.hr <http://Bmi.dx.trnc5pct.hr>
    <http://Bmi.dx.trnc5pct.hr>) :

       Point values can be misleading for g(x) when the range varies

    In case you're interested, and just to pass the info on to Jason and Rob because we chatted about this yesterday, I built 3 comparative ddf models for bowheads and belugas (each species separately):

    Scenario 1. Omit sightings collected when the visibility perpendicular to the transect was < 1.5 km and build the ddf without specifying values for int.range, so w was constant.

    Scenario 2. Keep all sightings, regardless of perpendicular visibility; build the ddf using int.range to allow w to vary across observations.

    Scenario 3. Keep all sightings, regardless of perpendicular visibility; build the ddf without specifying values for int.range, so w was constant.

    The results were the same for both species (see attached). Scenario 2 (all sightings, variable int.range) produced the smallest abundance estimates and the largest ESWs. Scenario 1 (limit sightings by perpendicular visibility filter, assume constant w) resulted in intermediate abundance and ESW estimates. Scenario 3 (all sightings, assume constant w) resulted in the largest abundance estimates and the smallest ESWs. If I'm thinking about this correctly, those results are exactly what we should expect. Scenarios 1 and 3 "think" they're missing sightings farther out, but they really just need to be corrected for how far the observers can see; they produce smaller ESWs, which inflate Nhat.

    Does this make sense to you?

    If it's possible at some point to fix the plotting function, that would be fabulous! I understand that you have a lot of other stuff you're working on, so this isn't totally critical.

    Thanks for your help!

    Megan

    On Thu, May 11, 2017 at 7:14 AM, DL Miller <notifications@github.com> wrote:

         Megan, can you try out using the fix in the intrange-fix branch. You can install that version using the following code:

         library(devtools)
         install_github("DistanceDevelopment/mrds", ref="intrange-fix")

         Not 100% sure that the results will be reasonable, so please let me know if they are not!

    --
    Megan C. Ferguson
    Cetacean Assessment and Ecology Program
    Marine Mammal Laboratory
    Alaska Fisheries Science Center
    National Marine Fisheries Service
    NOAA
    7600 Sand Point Way NE
    Seattle, WA 98115-6349
    USA
    Tel: 1.206.526.6274
    Fax: 1.206.526.6615
    Megan.Ferguson@noaa.gov

megancatonferguson commented 7 years ago

Thanks, Dave. I think your leads will prove fruitful. I think it's just due to confusion on my part about what average.p and the predict() functions give me by default. I think I can figure this out on my own now. I won't waste your time here, but I will report back once I have something substantive to say!

On Thu, Jun 8, 2017 at 5:23 AM, David Lawrence Miller <dave@ninepointeightone.net> wrote:

I'm a bit unsure about what using the averaged probabilities of detection is masking here. The "average average p" is calculated by dividing the number of observed groups by the H-T estimate of abundance in the covered area (n/Nhat) so that could be masking all kinds of stuff.

What would be useful is probably looking at the prob. of detection for each group size for the different models. It would also be interesting to look at how many of each group size are included in each model, since we're effectively comparing different datasets with each result.
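As a rough illustration of the quantity described here (a base-R sketch, not mrds output): with per-observation detection probabilities p_i, the Horvitz-Thompson estimate of abundance in the covered area is the sum of 1/p_i, and the single averaged p is n divided by that sum. The probabilities and group-size classes below are made up for the sketch.

```r
# Hypothetical per-observation detection probabilities and group-size classes
# (illustrative values only; not taken from the bowhead/beluga data).
obs <- data.frame(
  size.class = c("small", "small", "small", "medium", "medium", "large"),
  p          = c(0.20, 0.25, 0.20, 0.40, 0.50, 0.80)
)

n     <- nrow(obs)          # number of observed groups
Nhat  <- sum(1 / obs$p)     # Horvitz-Thompson estimate in the covered area
avg.p <- n / Nhat           # the averaged p, i.e. n / Nhat

# Per-class view: how many groups fall in each class, and their mean p.
by.class <- aggregate(p ~ size.class, data = obs,
                      FUN = function(x) c(n = length(x), mean.p = mean(x)))

print(c(n = n, Nhat = Nhat, avg.p = avg.p))
print(by.class)
```

Because n / Nhat is a harmonic-type average, it is pulled toward the smallest probabilities, so two models with different mixes of group sizes can report quite different averaged values even when the per-class probabilities are similar.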

Hope this helps!

On 05/06/2017 20:36, Megan Ferguson - NOAA Federal wrote:
Hi Dave,

I just got back to looking at this some more and I have some results that I can't explain. I'll use the same scenario numbers as my previous message:

Scenario 1. Omit sightings collected when the visibility perpendicular to the transect was < 1.5 km and build the ddf without specifying values for int.range, so w was constant.

Scenario 2. Keep all sightings, regardless of perpendicular visibility; build the ddf using int.range to allow w to vary across observations. w allowed to exceed the right-truncation distance.

Scenario 3. Keep all sightings, regardless of perpendicular visibility; build the ddf without specifying values for int.range, so w was constant.

Scenario 4 is new. Keep all sightings, regardless of perpendicular visibility; build the ddf using int.range to allow w to vary across observations. w <= right-truncation distance.
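One way the two variable-width scenarios might differ in code is sketched below. `vis.km` (per-observation visibility distance) and `trunc.km` (right-truncation distance) are hypothetical names and values, not taken from the original data; the `left`/`width` column layout follows the int.range matrix posted at the top of this issue.

```r
# Hypothetical inputs: per-observation perpendicular visibility (km) and a
# nominal right-truncation distance (km). Values are illustrative only.
vis.km   <- c(1.25, 2.25, 3.25, 5.25, 10.25, 20)
trunc.km <- 4

# Scenario 2: w equals the visibility distance, even beyond truncation.
int.range.s2 <- cbind(left = 0, width = vis.km)

# Scenario 4: w is capped at the right-truncation distance.
int.range.s4 <- cbind(left = 0, width = pmin(vis.km, trunc.km))

print(int.range.s4)
```

Either matrix has one row per observation, which is the shape passed through `meta.data=list("int.range"=...)` in the original `ddf()` call.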

When I initially tested the new version of mrds with an int.range matrix that specified one row per observation (Scenario 2), I allowed w in int.range to equal the maximum perpendicular distance visible; in other words, w could be greater than the right-truncation distance. The "Distance range" in the ddf model summary stated that the max distance equaled the right-truncation distance; however, the ESW results for Scenario 2 made sense relative to Scenarios 1 & 3.

In all cases, I've been computing ESW as: ESW = average.p * width

I tried testing an int.range matrix for a new scenario (Scenario 4) in which w was set to the observation-specific visibility distance if it was less than the right-truncation distance, and the max allowable w equaled the right-truncation distance. The results for the null hazard-rate model under Scenario 4 made sense to me:

ESW for Scenario 1 = 0.985569
ESW for Scenario 2 = 1.105375
ESW for Scenario 3 = 0.9471885
ESW for Scenario 4 = 1.100923

However, when I added in covariates, the ESW for Scenario 4 was considerably less than that for analogous models built under the other three scenarios. Here are the ESWs for the models that include only a categorical vector for size as a covariate:

ESW for Scenario 1 = 0.9909872
ESW for Scenario 2 = 1.144701
ESW for Scenario 3 = 0.9502186
ESW for Scenario 4 = 0.7949639

Is there anything specific to the mcds portion of the algorithm that might cause a hiccup when specifying the int.range matrix to have observation-specific values for w? If you'd like me to submit this to GitHub so that it's part of the official comment log, I'm happy to do that.

On Sun, May 21, 2017 at 6:58 AM, David Lawrence Miller <dave@ninepointeightone.net> wrote:

Thanks for reporting back Megan! This is very useful info.

As far as I know, the plotting for this kind of thing is fiddly and I don't have time at the moment to make the modifications to plot.ds() (as really I should do a more serious re-write at the same time) but I'll note this for when I do get some time.

Your results make sense to me. This is a relief that the right thing is happening!

Thanks again for taking the time to report and test this and I'll try to get to the plotting issue soon.


megancatonferguson commented 7 years ago

Mystery solved.

This section from the predict.ddf helpfile was the key:

For line transects, the effective strip half-width (esw=TRUE) is the integral of the fitted detection function over either 0 to W or the specified int.range. The predicted detection probability is the average probability which is simply the integral divided by the distance range.
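To make the helpfile definition concrete, here is a small base-R sketch (not mrds code) that integrates a hazard-rate detection function over two different ranges. The scale and shape values are assumptions chosen for illustration, not fitted parameters.

```r
# Hazard-rate detection function g(x) = 1 - exp(-(x / scale)^(-shape)).
# scale and shape are illustrative assumptions, not fitted values.
g.hr <- function(x, scale = 1, shape = 3) {
  1 - exp(-(x / scale)^(-shape))
}

# ESW over [left, width] is the integral of g; average p is ESW / range.
esw.and.p <- function(left, width, ...) {
  esw <- integrate(g.hr, lower = left, upper = width, ...)$value
  c(esw = esw, p = esw / (width - left))
}

narrow <- esw.and.p(0, 1.25)   # observation with limited visibility
wide   <- esw.and.p(0, 5.25)   # observation with good visibility

print(rbind(narrow, wide))
```

For the same detection function, the wider range gives a larger ESW but a smaller average p, so the product of average p and width depends on which width is used for each observation.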

When int.range varies by observation, computing esw as average.p * width is the same as using predict.ddf with the default settings. In other words, this gives me the estimated esw based on the actual observed visibility range, which varied by observation. If I want to fit the model using int.range to account for variable visibility range, but then get an estimate of esw assuming that the hypothetical visibility range were at least equal to the max right-truncation distance, I need to use predict.ddf as follows:

predict(ddf.object, compute=TRUE, int.range=constant.int.range.mtx)

where constant.int.range.mtx is an (nobs x 2) matrix, where nobs is the number of observations used to fit ddf.object, and the matrix has a constant value for width. When I set the width in constant.int.range.mtx to the nominal right-truncation distance, I get reasonable results for p and esw.
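A minimal sketch of building such a matrix follows. `nobs` and `trunc.dist` are placeholder values; the `predict()` call is the one quoted above and is left commented out because it needs a fitted ddf object.

```r
# Placeholder values: number of observations used in the fit and the
# nominal right-truncation distance (same units as the distances).
nobs       <- 5
trunc.dist <- 4

# (nobs x 2) matrix with the same left and width for every observation.
constant.int.range.mtx <- matrix(c(0, trunc.dist), nrow = nobs, ncol = 2,
                                 byrow = TRUE,
                                 dimnames = list(NULL, c("left", "width")))
print(constant.int.range.mtx)

# With a fitted model (here called ddf.object, as in the comment above):
# pred <- predict(ddf.object, compute = TRUE,
#                 int.range = constant.int.range.mtx)
```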

Thanks for your help, Dave!


dill commented 7 years ago

That's great, thanks Megan!

I'll close this issue and then we can direct folks here in the future if they have a similar issue.

At some point I need to reform how mrds does this and document it better, as it's clearly not very transparent!


megancatonferguson commented 7 years ago

The solution was clear in the predict.ddf helpfile. The summary for the ddf object was a little misleading because it still listed the range as if it were constant, ending at the maximum sighting distance. It was a good learning experience for me!

On Sat, Jun 10, 2017 at 10:05 PM, DL Miller notifications@github.com wrote:

That's great, thanks Megan!

I'll close this issue and then we can direct folks here in the future if they have a similar issue.

At some point I need to reform how mrds does this and document it better, as it's clearly not very transparent!

On 10/06/2017 07:04, megancatonferguson wrote:

Mystery solved.

This section from the predict.ddf helpfile was the key:

For line transects, the effective strip half-width (esw=TRUE) is the integral of the fitted detection function over either 0 to W or the specified int.range. The predicted detection probability is the average probability which is simply the integral divided by the distance range.

When int.range varies by observation, computing esw as average.p * width is the same as using predict.ddf with the default settings. In other words, this gives me the estimated esw based on the actual observed visibility range, which varied by observation. If I want to fit the model using int.range to account for variable visibility range, but then get an estimate of esw assuming that the hypothetical visibility range were at least equal to the max right-truncation distance, I need to use predict.ddf as follows:

predict(ddf.object, compute=TRUE, int.range=constant.int.range.mtx)

where constant.int.range.mtx is an (nobs x 2) matrix, where nobs is the number of observations used to fit ddf.object, and the matrix has a constant value for width. When I set the width in constant.int.range.mtx to the nominal right-truncation distance, I get reasonable results for p and esw.
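As a minimal sketch of the matrix described above: the helper name `make_constant_int_range`, the number of observations, and the 5.25 km width are all illustrative assumptions, not values from the fitted model. The `predict()` call is left commented out because it needs a fitted ddf object.

```r
# Hypothetical helper: build an (nobs x 2) int.range matrix with a constant width.
make_constant_int_range <- function(nobs, width, left = 0) {
  m <- matrix(c(rep(left, nobs), rep(width, nobs)), nrow = nobs, ncol = 2)
  colnames(m) <- c("left", "width")
  m
}

# Illustrative values only: 5 observations, nominal right-truncation at 5.25 km.
constant.int.range.mtx <- make_constant_int_range(nobs = 5, width = 5.25)

# With a fitted model (not created here), the call quoted above would be:
# pred <- predict(ddf.object, compute = TRUE, int.range = constant.int.range.mtx)
```

In practice nobs would be the number of observations used to fit the model, so that the matrix has one row per observation.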

Thanks for your help, Dave!

On Fri, Jun 9, 2017 at 10:27 AM, Megan Ferguson - NOAA Federal <megan.ferguson@noaa.gov> wrote:

Thanks, Dave. I think your leads will prove fruitful. I think it's just due to confusion on my part about what average.p and the predict() functions give me by default. I think I can figure this out on my own now. I won't waste your time here, but I will report back once I have something substantive to say!

On Thu, Jun 8, 2017 at 5:23 AM, David Lawrence Miller <dave@ninepointeightone.net> wrote:

I'm a bit unsure about what using the averaged probabilities of detection is masking here. The "average average p" is calculated by dividing the number of observed groups by the H-T estimate of abundance in the covered area (n/Nhat) so that could be masking all kinds of stuff.
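To make the n/Nhat calculation concrete, here is a small sketch with made-up per-group detection probabilities (the vector `p` is an assumption for illustration, not output from any model in this thread).

```r
# Illustrative per-group detection probabilities (made-up numbers).
p <- c(0.9, 0.6, 0.3, 0.3)

n <- length(p)          # number of observed groups
Nhat <- sum(1 / p)      # Horvitz-Thompson estimate of groups in the covered area
average.p <- n / Nhat   # the "average average p"

# average.p is a harmonic mean of p, so it is pulled towards the groups
# with the lowest detection probability and sits below the arithmetic mean.
c(average.p = average.p, mean.p = mean(p))
```

Because the summary is a single number, two models with quite different per-group probabilities can end up with similar values, which is one way it can hide what is going on.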

What would be useful is probably looking at the prob. of detection for each group size for the different models. It would also be interesting to look at how many of each group size are included in each model, since we're effectively comparing different datasets with each result.

Hope this helps!

On 05/06/2017 20:36, Megan Ferguson - NOAA Federal wrote:

Hi Dave,

I just got back to looking at this some more and I have some results that I can't explain. I'll use the same scenario numbers as my previous message:

Scenario 1. Omit sightings collected when the visibility perpendicular to the transect was < 1.5 km and build the ddf without specifying values for int.range, so w was constant.

Scenario 2. Keep all sightings, regardless of perpendicular visibility; build the ddf using int.range to allow w to vary across observations. w allowed to exceed the right-truncation distance.

Scenario 3. Keep all sightings, regardless of perpendicular visibility; build the ddf without specifying values for int.range, so w was constant.

Scenario 4 is new. Keep all sightings, regardless of perpendicular visibility; build the ddf using int.range to allow w to vary across observations. w <= right-truncation distance.
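The difference between Scenarios 2 and 4 comes down to how the width column of int.range is built. A rough sketch, assuming made-up visibility values and a made-up truncation distance (`visibility` and `trunc.dist` are hypothetical names):

```r
# Illustrative values: observation-specific visibility (km) and a nominal
# right-truncation distance. Neither comes from the thread's data.
visibility <- c(1.25, 2.25, 3.25, 5.25, 10.25)
trunc.dist <- 3

# Scenario 2: w is the visibility itself, so it may exceed the truncation distance.
int.range.s2 <- cbind(left = 0, width = visibility)

# Scenario 4: w is the visibility where that is smaller,
# otherwise the truncation distance.
int.range.s4 <- cbind(left = 0, width = pmin(visibility, trunc.dist))
```

Either matrix would then be passed as meta.data=list(int.range=...) in the ddf() call, as in the original report.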

When I initially tested the new version of mrds with an int.range matrix that specified one row per observation (Scenario 2), I allowed w in int.range to equal the maximum perpendicular distance visible; in other words, w could be greater than the right-truncation distance. The "Distance range" in the ddf model summary stated that the max distance equaled the right-truncation distance; however, the ESW results for Scenario 2 made sense relative to Scenarios 1 & 3.

In all cases, I've been computing ESW as: ESW = average.p * width

I tried testing an int.range matrix for a new scenario (Scenario 4) in which w was set to the observation-specific visibility distance if it was less than the right-truncation distance, and the max allowable w equaled the right-truncation distance. The results for the null hazard-rate model under Scenario 4 made sense to me:

ESW for Scenario 1 = 0.985569
ESW for Scenario 2 = 1.105375
ESW for Scenario 3 = 0.9471885
ESW for Scenario 4 = 1.100923

However, when I added in covariates, the ESW for Scenario 4 was considerably less than that for analogous models built under the other three scenarios. Here are the ESWs for the models that include only a categorical vector for size as a covariate:

ESW for Scenario 1 = 0.9909872
ESW for Scenario 2 = 1.144701
ESW for Scenario 3 = 0.9502186
ESW for Scenario 4 = 0.7949639

Is there anything specific to the mcds portion of the algorithm that might cause a hiccup when specifying the int.range matrix to have observation-specific values for w? If you'd like me to submit this to GitHub so that it's part of the official comment log, I'm happy to do that.

On Sun, May 21, 2017 at 6:58 AM, David Lawrence Miller <dave@ninepointeightone.net> wrote:

Thanks for reporting back Megan! This is very useful info.

As far as I know, the plotting for this kind of thing is fiddly and I don't have time at the moment to make the modifications to plot.ds() (as really I should do a more serious re-write at the same time) but I'll note this for when I do get some time.

Your results make sense to me. This is a relief that the right thing is happening!

Thanks again for taking the time to report and test this and I'll try to get to the plotting issue soon.

On 19/05/2017 21:39, Megan Ferguson - NOAA Federal wrote:

Hi Dave,

The int.range fix worked fine and the results seem reasonable. The only glitch I ran into was that it wouldn't plot the resulting detection function...but I'm not sure that I should really expect it to be able to plot a detection function with variable integration parameters. Here's the plotting error that I got:

    Error in int.range[selected, ] : (subscript) logical subscript too long
    In addition: Warning message:
    In plot.ds(Bmi.dx.trnc5pct.hr) :
      Point values can be misleading for g(x) when the range varies

In case you're interested, and just to pass the info on to Jason and Rob because we chatted about this yesterday, I built 3 comparative ddf models for bowheads and belugas (each species separately):

Scenario 1. Omit sightings collected when the visibility perpendicular to the transect was < 1.5 km and build the ddf without specifying values for int.range, so w was constant.

Scenario 2. Keep all sightings, regardless of perpendicular visibility; build the ddf using int.range to allow w to vary across observations.

Scenario 3. Keep all sightings, regardless of perpendicular visibility; build the ddf without specifying values for int.range, so w was constant.

The results were the same for both species (see attached). Scenario 2 (all sightings, variable int.range) produced the smallest abundance estimates and the largest ESWs. Scenario 1 (limit sightings by perpendicular visibility filter, assume constant w) resulted in intermediate abundance and ESW estimates. Scenario 3 (all sightings, assume constant w) resulted in the largest abundance estimates and the smallest ESWs. If I'm thinking about this correctly, those results are exactly what we should expect. Scenarios 1 and 3 "think" they're missing sightings farther out, but they really just need to be corrected for how far the observers can see; they produce smaller ESWs, which inflate Nhat.
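The link between a smaller ESW and a larger abundance estimate can be sketched with the standard line-transect density estimator D = n / (2 * L * esw). The two ESW values below are the null-model figures reported in this thread for Scenarios 2 and 3 (both keep all sightings); the number of sightings `n` and the effort `L` are made-up assumptions, and the constant-ESW formula is a simplification of what mrds does when widths vary.

```r
# Made-up survey totals, identical for both scenarios (assumption).
n <- 100    # number of sightings
L <- 500    # total transect length, km

# Null-model ESW values quoted elsewhere in this thread.
esw.s2 <- 1.105375    # Scenario 2: variable int.range
esw.s3 <- 0.9471885   # Scenario 3: constant w

# Standard line-transect density estimator (groups per square km).
D.s2 <- n / (2 * L * esw.s2)
D.s3 <- n / (2 * L * esw.s3)

# With n and L fixed, the density ratio is the inverse of the ESW ratio.
D.s3 / D.s2
```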

Does this make sense to you?

If it's possible at some point to fix the plotting function, that would be fabulous! I understand that you have a lot of other stuff you're working on, so this isn't totally critical.

Thanks for your help!

Megan

On Thu, May 11, 2017 at 7:14 AM, DL Miller <notifications@github.com> wrote:

Megan, can you try out the fix in the intrange-fix branch? You can install that version using the following code:

    library(devtools)
    install_github("DistanceDevelopment/mrds", ref="intrange-fix")

Not 100% sure that the results will be reasonable, so please let me know if they are not!


dill commented 7 years ago

I've added an extra bit of output to remind the user of this in b3a6228 so hopefully that will be helpful.


megancatonferguson commented 7 years ago

I just thought I'd pass along two more points of clarity that I've recently had on this issue of interpreting predictions when int.range is a matrix:

1. I think that Section 6.7, "Distance sampling surveys when the observed area is incompletely covered," in Buckland et al. (2001) Intro to Distance Sampling is particularly relevant in explaining the math behind the problem. Eqn 6.42 was particularly helpful to me in understanding what is happening.
2. A practical relationship in the results from predict.ds() between the predicted esw and the predicted average probability for observation i is as follows:

esw[i]/p[i] = int.range[i,2]

where the first column of int.range is defined to be the left-truncation distance ("left"), and the second column is the right-truncation distance ("width"), as specified in the ddf helpfile. Testing that relationship on predictions from a ddf.ds object that I created using variable widths across observations confirmed that the predict.ds function is incorporating the variable widths into the esw and p computations rather than assuming a constant width.
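The relationship can be illustrated without mrds by applying the helpfile definition directly: esw is the integral of the detection function over the range, and p is that integral divided by the range length. The half-normal form, its scale, and the widths below are assumptions for illustration; this mirrors the definition and is not a call into predict.ds.

```r
# Half-normal detection function with an assumed scale parameter (illustrative).
g <- function(x, sigma = 1) exp(-x^2 / (2 * sigma^2))

# Observation-specific integration ranges, in the layout described above.
int.range <- cbind(left = 0, width = c(1.25, 2.25, 3.25))

# esw[i]: integral of g over [left, width] for observation i.
esw <- apply(int.range, 1, function(r) integrate(g, r[["left"]], r[["width"]])$value)

# p[i]: the same integral divided by the length of the range.
p <- esw / (int.range[, "width"] - int.range[, "left"])

# With left = 0 the ratio recovers the width column.
all.equal(unname(esw / p), unname(int.range[, "width"]))
```

Running the same ratio on real predict() output is the check described above: if the ratios match the width column of int.range, the variable widths were used.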

On Sat, Jun 10, 2017 at 10:05 PM, DL Miller notifications@github.com wrote:

That's great, thanks Megan!

I'll close this issue and then we can direct folks here in the future if they have a similar issue.

At some point I need to reform how mrds does this and document it better, as it's clearly not very transparent!

On 10/06/2017 07:04, megancatonferguson wrote:

Mystery solved.

This section from the predict.ddf helpfile was the key:

For line transects, the effective strip half-width (esw=TRUE) is the integral of the fitted detection function over either 0 to W or the specified int.range. The predicted detection probability is the average probability which is simply the integral divided by the distance range.

When int.range varies by observation, computing esw as average.p * width is the same as using predict.ddf with the default settings. In other words, this gives me the estimated esw based on the actual observed visibility range, which varied by observation. If I want to fit the model using int.range to account for variable visibility range, but then get an estimate of esw assuming that the hypothetical visibility range were at least equal to the max right-truncation distance, I need to use predict.ddf as follows:

predict(ddf.object, compute=TRUE, int.range=constant.int.range.mtx)

where constant.int.range.mtx is an (nobs x 2) matrix, where nobs is the number of observations used to fit ddf.object, and the matrix has a constant value for width. When I set the width in constant.int.range.mtx to the nominal right-truncation distance, I get reasonable results for p and esw.

Thanks for your help, Dave!

On Fri, Jun 9, 2017 at 10:27 AM, Megan Ferguson - NOAA Federal < megan.ferguson@noaa.gov> wrote:

Thanks, Dave. I think your leads will prove fruitful. I think it's just due to confusion on my part about what average.p and the predict() functions give me by default. I think I can figure this out on my own now. I won't waste your time here, but I will report back once I have something substantive to say!

On Thu, Jun 8, 2017 at 5:23 AM, David Lawrence Miller < dave@ninepointeightone.net> wrote:

I'm a bit unsure about what using the averaged probabilities of detection is masking here. The "average average p" is calculated by dividing the number of observed groups by the H-T estimate of abundance in the covered area (n/Nhat) so that could be masking all kinds of stuff.

What would be useful is probably looking at the prob. of detection for each group size for the different models. It would also be interesting to look at how many of each group size are included in each model, since we're effectively comparing different datasets with each result.

Hope this helps!

On 05/06/2017 20:36, Megan Ferguson - NOAA Federal wrote:

Hi Dave,

I just got back to looking at this some more and I have some results that I can't explain. I'll use the same scenario numbers as my previous message:

Scenario 1. Omit sightings collected when the visibility perpendicular to the transect was < 1.5 km and build the ddf without specifying values for int.range, so w was constant.

Scenario 2. Keep all sightings, regardless of perpendicular visibility; build the ddf using int.range to allow w to vary across observations. w allowed to exceed the right-truncation distance.

Scenario 3. Keep all sightings, regardless of perpendicular visibility; build the ddf without specifying values for int.range, so w was constant.

Scenario 4 is new. Keep all sightings, regardless of perpendicular visibility; build the ddf using int.range to allow w to vary across observations. w <= right-truncation distance.

When I initially tested the new version of mrds with an int.range matrix that specified one row per observation (Scenario 2), I allowed w in int.range to equal the maximum perpendicular distance visible; in other words, w could be greater than the right-truncation distance. The "Distance range" in the ddf model summary stated that the max distance equaled the right-truncation distance; however, the ESW results for Scenario 2 made sense relative to Scenarios 1 & 3.

In all cases, I've been computing ESW as: ESW = average.p * width

I tried testing an int.range matrix for a new scenario (Scenario 4) in which w was set to the observation-specific visibility distance if it was less than the right-truncation distance, and the max allowable w equaled the right-truncation distance. The results for the null hazard-rate model under Scenario 4 made sense to me:

ESW for Scenario 1 = 0.985569 ESW for Scenario 2 = 1.105375 ESW for Scenario 3 = 0.9471885 ESW for Scenario 4 = 1.100923

However, when I added in covariates, the ESW for Scenario 4 was considerably less than that for analogous models built under the other three scenarios. Here are the ESWs for the models that include only a categorical vector for size as a covariate:

ESW for Scenario 1 = 0.9909872 ESW for Scenario 2 = 1.144701 ESW for Scenario 3 = 0.9502186 ESW for Scenario 4 = 0.7949639

Is there anything specific to the mcds portion of the algorithm that might cause a hiccup when specifying the int.range matrix to have observation-specific values for w? If you'd like me to submit this to GitHub so that it's part of the official comment log, I'm happy to do that.

On Sun, May 21, 2017 at 6:58 AM, David Lawrence Miller < dave@ninepointeightone.net mailto:dave@ninepointeightone.net> wrote:

Thanks for reporting back Megan! This is very useful info.

As far as I know, the plotting for this kind of thing is fiddly and I don't have time at the moment to make the modifications to plot.ds() (as really I should do a more serious re-write at the same time) but I'll note this for when I do get some time.

Your results make sense to me. This is a relief that the right thing is happening!

Thanks again for taking the time to report and test this and I'll try to get to the plotting issue soon.

On 19/05/2017 21:39, Megan Ferguson - NOAA Federal wrote:

Hi Dave,

The int.range fix worked fine and the results seem reasonable. The only glitch I ran into was that it wouldn't plot the resulting detection function...but I'm not sure that I should really expect it to be able to plot a detection function with variable integration parameters. Here's the plotting error that I got:

Error in int.range[selected, ] : (subscript) logical subscript too long In addition: Warning message: In plot.ds(Bmi.dx.trnc5pct.hr http://Bmi.dx.trnc5pct.hr http://Bmi.dx.trnc5pct.hr) :

Point values can be misleading for g(x) when the range varies

In case you're interested, and just to pass the info on to Jason and Rob because we chatted about this yesterday, I built 3 comparative ddf models for bowheads and belugas (each species separately):

Scenario 1. Omit sightings collected when the visibility perpendicular to the transect was < 1.5 km and build the ddf without specifying values for int.range, so w was constant.

Scenario 2. Keep all sightings, regardless of perpendicular visibility; build the ddf using int.range to allow w to vary across observations.

Scenario 3. Keep all sightings, regardless of perpendicular visibility; build the ddf without specifying values for int.range, so w was constant.

The results were the same for both species (see attached). Scenario 2 (all sightings, variable int.range) produced the smallest abundance estimates and the largest ESWs. Scenario 1 (limit sightings by perpendicular visibility filter, assume constant w) resulted in intermediate abundance and ESW estimates. Scenario 3 (all sightings, assume constant w) resulted in the largest abundance estimates and the smallest ESWs. If I'm thinking about this correctly, those results are exactly what we should expect. Scenarios 1 and 3 "think" they're missing sightings farther out, but they really just need to be corrected for how far the observers can see; they produce smaller ESWs, which inflate Nhat.

Does this make sense to you?

If it's possible at some point to fix the plotting function, that would be fabulous! I understand that you have a lot of other stuff you're working on, so this isn't totally critical.

Thanks for your help!

Megan

On Thu, May 11, 2017 at 7:14 AM, DL Miller <notifications@github.com> wrote:

Megan, can you try out the fix in the `intrange-fix` branch? You can install that version using the following code:

    library(devtools)
    install_github("DistanceDevelopment/mrds", ref="intrange-fix")

Not 100% sure that the results will be reasonable, so please let me know if they are not!


--
Megan C. Ferguson
Cetacean Assessment and Ecology Program
Marine Mammal Laboratory
Alaska Fisheries Science Center
National Marine Fisheries Service, NOAA
7600 Sand Point Way NE
Seattle, WA 98115-6349 USA
Tel: 1.206.526.6274
Fax: 1.206.526.6615
Megan.Ferguson@noaa.gov
