DistanceDevelopment / Distance

Simple distance sampling analysis
GNU General Public License v3.0

`bootdht` and data binned in the field (containing `distbegin` and `distend`) #147

Closed: erex closed this issue 9 months ago

erex commented 1 year ago

A comment to the list (the last in this thread) reported problems bootstrapping data binned in the field. We don't have data of this type in the "tame" class of data shipped with Distance, so I manufactured one from the minke data set.

```r
library(Distance)
data(minke)

# invent group size for each detection
# minke$size <- dsims:::rztpois(99, 2)

# convert exact distances into bins
vals <- seq(0, 2, 0.2)
minke$bin <- cut(minke$distance, breaks = vals, right = FALSE, labels = FALSE)
minke$distbegin <- vals[minke$bin]
minke$distend <- vals[minke$bin + 1]
# remove exact distances
minke$distance <- NULL

# fit to the binned data, then try to bootstrap
mod1 <- ds(minke)
bootout <- bootdht(mod1, flatfile = minke, nboot = 50)
bootout
#> data frame with 0 columns and 0 rows
```

The bootstrap runs instantaneously and returns an empty data frame. I think that implies bootstrapping cannot currently be done for data of this type. I classify this as a bug.

One would think this bug would have come to the surface with CTDS, where data are also binned, but checking the vignette, it seems the data are recorded as exact distances (there is a column in the data labelled distance) and binned during analysis. @lenthomas

lenthomas commented 1 year ago

Yes, looks like a bug.

I wonder if using the cutpoints argument (what you refer to as binning during analysis) might be an effective workaround for the problems that both Vedika and Vaughn reported? distbegin and distend are really only necessary when you have cutpoints that vary per observation, for example when data are collected in bins from an airplane that is varying in altitude, so that the perpendicular distance of the cutpoints varies with altitude. Did you @erex or @LHMarshall check this with either user? (I include you, @LHMarshall, because I see you responded to Vedika asking for a copy of the data.)
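To illustrate why per-observation bin limits arise in the aerial case, here is a minimal base R sketch. It assumes, purely for illustration, that detections are recorded in fixed angle bands measured from the vertical, so the perpendicular distance of a band edge is altitude * tan(angle). The angles, altitudes, the `edge_distance` helper and every column name other than distbegin and distend are made up.

```r
# Hypothetical band edges, in degrees from the vertical (0 = straight down).
# The bands are fixed relative to the aircraft, not to the ground.
angle_edges <- c(0, 30, 45, 60)

# Perpendicular distance of a band edge for a given altitude (illustrative helper)
edge_distance <- function(altitude, angle_deg) {
  altitude * tan(angle_deg * pi / 180)
}

# Made-up detections: altitude (m) at the time of detection and the band recorded
obs <- data.frame(object = 1:3,
                  altitude = c(150, 200, 150),
                  band = c(1, 1, 3))

# The same band maps to different distance limits at different altitudes,
# so the limits have to be stored per observation
obs$distbegin <- edge_distance(obs$altitude, angle_edges[obs$band])
obs$distend <- edge_distance(obs$altitude, angle_edges[obs$band + 1])
obs
```

Objects 1 and 2 fall in the same band but get different distend values because the altitude differs, which is the situation a single cutpoints vector cannot describe.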

In the meantime, I'll classify this as a bug and assign Laura as the new chief bug-swatter.

erex commented 1 year ago

Sent this to Vaughn yesterday; he has not yet reported back (other commitments).

I've not tried this remedy, but I had a thought. Could you take the file of binned distances (with distbegin and distend fields), compute the midpoint of each bin, (distbegin+distend)/2, and create a new field called distance from that midpoint? Then delete the distbegin and distend fields.

At the time of analysis (with ds) use the cutpoints argument to re-establish the distance bins. That way, I'm guessing, bootdht will be satisfied because it finds a distance field in the data.

I recognise it is a bunch of messing about, and the bootdht function should be able to work with data binned in the field. But it might be a temporary work-around for the moment. Just a thought.
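For anyone wanting to try this, a minimal sketch of the workaround follows. It is an illustration, not a tested fix. The helper name `bins_to_midpoints` and the toy data frame are made up, and the Distance part rebuilds the binned minke data from the first comment using the same 0.2 cutpoints. That part runs only if the Distance package is installed.

```r
# Hypothetical helper: replace field-recorded bin limits with bin midpoints
bins_to_midpoints <- function(dat) {
  dat$distance <- (dat$distbegin + dat$distend) / 2
  # drop the bin limits so that ds() can rebuild them from cutpoints
  dat$distbegin <- NULL
  dat$distend <- NULL
  dat
}

# Toy check on made-up bins
toy <- data.frame(object = 1:3,
                  distbegin = c(0, 0.2, 1.8),
                  distend = c(0.2, 0.4, 2.0))
toy_mid <- bins_to_midpoints(toy)

if (requireNamespace("Distance", quietly = TRUE)) {
  library(Distance)
  data(minke)

  # recreate the "binned in the field" data from the first comment
  vals <- seq(0, 2, 0.2)
  bin <- cut(minke$distance, breaks = vals, right = FALSE, labels = FALSE)
  minke$distbegin <- vals[bin]
  minke$distend <- vals[bin + 1]
  minke$distance <- NULL

  # workaround: midpoints as distances, bins re-established at analysis time
  minke_mid <- bins_to_midpoints(minke)
  mod2 <- ds(minke_mid, cutpoints = vals)
  bootout2 <- bootdht(mod2, flatfile = minke_mid, nboot = 50)
  print(bootout2)
}
```

Each midpoint falls strictly inside its original bin, so the cutpoints argument puts every detection back in the bin it was recorded in.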

erex commented 1 year ago

Response from Vaughn to my suggested workaround:

Hey Eric,

Yes, the bootstrap workaround, done by creating a distance column and removing the distbegin and distend columns, works without error, although the model gives slightly different estimates with the created distances than with the distance bins.

erex commented 1 year ago

Further from Vaughn:

Thanks! Actually, I missed the part where you said to add the cutpoints with the created distance column. When I include the cutpoints, the estimates are the same as when using the distbegin and distend columns, so that is an ideal workaround. Will there be any future work to get bootdht to work with distbegin and distend columns?

To which I responded that it is on the to-do list.