DistanceDevelopment / Distance

Simple distance sampling analysis
GNU General Public License v3.0

Stratum names which come after "Total" alphabetically cause NA's in bootstrap dht output #158

Closed erex closed 9 months ago

erex commented 1 year ago

Comment by Laura: the anomaly in the effort associated with transect PPWS24-2016 arose because the transect was included in the data twice: once with observations and once without, as though it had been surveyed but nothing was seen. The NA values for the YEAR2016 stratum turned out to be a separate issue to do with alphabetical ordering.

Burrowing down into a bootdht anomaly raised by a user and reported in Distance issue #157, I viewed the data frame generated in the bootstrap resample by bootdht_resample_data. The survey has a stratified design; the problem seems to arise for the first stratum, but not for the remaining strata.

Here is a bit of the data frame; note the effort associated with transect PPWS24-2016:

image

For unknown reasons, the data frame contains 143 records for transect PPWS24-2016, even though there were only 2 detections on that transect. An error of non-unique effort associated with this transect is trapped, and the estimate of abundance for the stratum (2016) is set to NA.
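A check along these lines can flag transects that carry more than one effort value before the data reach the bootstrap. This is only a minimal sketch: the column names follow the flatfile convention mentioned in this thread (Region.Label, Sample.Label, Effort), while the effort and distance values are invented for illustration and are not taken from the user's dataset.

```r
# Toy data in flatfile layout; all numeric values are hypothetical.
flat <- data.frame(
  Region.Label = c("YEAR2016", "YEAR2016", "YEAR2016", "YEAR2018"),
  Sample.Label = c("PPWS24-2016", "PPWS24-2016", "PPWS23-2016", "PPWS24-2018"),
  Effort       = c(2.0, 3.5, 1.8, 2.2),
  distance     = c(12, NA, 30, 8)
)

# Reduce to the distinct stratum/transect/effort combinations.
combos <- unique(flat[, c("Region.Label", "Sample.Label", "Effort")])

# Count how many distinct Effort values each transect has within its stratum.
n_effort <- aggregate(Effort ~ Region.Label + Sample.Label,
                      data = combos, FUN = length)

# Any transect with a count above 1 has non-unique effort.
bad <- n_effort[n_effort$Effort > 1, ]
print(bad)
```

In this toy example only PPWS24-2016 is flagged, because it appears with two different effort values.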

My hypothesis is that the true bug lies in bootdht_resample_data, the function that draws the sample. Indeed, bootdht_resample_data creates a superabundance of records for transect PPWS24-2016, each with a different length, which does not happen for other transects.

The dataset causing this problem belongs to the user and can be provided upon request.

Milou reports success when taking numbers out of Region.Labels:

In addition to changing the Sample.Labels to unique values within each Region.Label, I also replaced the Region.Labels YEAR2016, YEAR2018, etc. with A, B, C, D in the original flatfile CSVs. This seemed to do the trick. I tried it on other combinations of data as well (other species and other PA/year combinations), and it all worked and seems to produce sensible results. For example, for BSD in PPWS:

                           Dhat2016  Dhat2018  Dhat2020  Dhat2022
Mean from bootstrap            1.71      0.95      2.33      3.11
Median from bootstrap          1.69      0.94      2.31      3.09
Point Estimate from model      1.80      0.94      2.31      3.08

LHMarshall commented 1 year ago

@lenthomas @erex OK, I found the bug. It wasn't the numbers in the stratum names; it was that the starting letter "Y" of YEAR comes after the "T" of Total alphabetically. At some point the Total value gets put above the stratum values, and this is unexpected at line 65 of the bootit function (see bootdht_bootit.R).

image
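The ordering effect can be reproduced with a toy table. This is only an illustration of the general pattern: the data frame, the abundance values, and the "drop the last row" step are assumptions made for the example, not the package's code at line 65 of bootdht_bootit.R.

```r
# Hypothetical estimates table: four strata plus a "Total" row.
est <- data.frame(
  Label = c("YEAR2016", "YEAR2018", "YEAR2020", "YEAR2022", "Total"),
  N     = c(10, 20, 30, 40, 100)
)

# An alphabetical sort moves "Total" to the top, because "T" precedes "Y".
# method = "radix" gives a locale-independent ordering.
est_sorted <- est[order(est$Label, method = "radix"), ]
print(est_sorted$Label)

# Code that assumes "Total" is the last row now drops a stratum instead.
by_position <- est_sorted[-nrow(est_sorted), ]
print(by_position$Label)

# With labels that sort before "Total", the same assumption holds.
ok_labels <- sort(c("A", "B", "C", "D", "Total"), method = "radix")
print(ok_labels)
```

Here the positional step keeps "Total" and discards YEAR2022, so any later lookup by stratum comes up short for one stratum.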

As this user has found, a workaround is to use stratum names that start with letters before "T" in the alphabet. Once the stratum names have been modified, the bootstrap results are consistent with the model estimates from the initial model fit.
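The relabelling can be done in R rather than by editing the CSV by hand. The sketch below assumes a flatfile data frame with a Region.Label column; the toy rows and the object names (flat, lookup) are made up for the example.

```r
# Toy flatfile with year-based stratum names (hypothetical rows).
flat <- data.frame(
  Region.Label = c("YEAR2016", "YEAR2018", "YEAR2020", "YEAR2022", "YEAR2016"),
  Sample.Label = c("T1", "T1", "T1", "T1", "T2")
)

# Map each original stratum name to a letter, in sorted order.
old <- sort(unique(as.character(flat$Region.Label)))

# Only A to S (19 letters) start with a letter that precedes "T".
stopifnot(length(old) <= 19)
lookup <- setNames(LETTERS[seq_along(old)], old)
print(lookup)

# Replace the labels; keep the lookup so results can be mapped back later.
flat$Region.Label <- unname(lookup[as.character(flat$Region.Label)])
print(flat)
```

Keeping the lookup vector makes it straightforward to translate A, B, C, D back to the survey years when reporting results.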

erex commented 1 year ago

Good that you found this idiosyncrasy. Do we need to document that users cannot have stratum names that begin with letters that follow "T" in the alphabet, or can the code be modified to prevent this problem?
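One possible direction for a code change is to pick out the Total row by name rather than by position. The sketch below uses a toy table and hypothetical object names; it is not the package's implementation and it has not been tested against bootdht.

```r
# Hypothetical estimates table in which sorting has put "Total" first.
est <- data.frame(
  Label = c("Total", "YEAR2016", "YEAR2018"),
  N     = c(30, 10, 20)
)

# Identify the Total row by its label, wherever it sits.
is_total <- as.character(est$Label) == "Total"

# Strata first, Total last, regardless of alphabetical order.
est_fixed <- rbind(est[!is_total, ], est[is_total, ])
rownames(est_fixed) <- NULL
print(est_fixed)
```

Selecting by label would still misbehave if a user named a stratum "Total", so that case would need its own check.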