DistanceDevelopment / Distance

Simple distance sampling analysis
GNU General Public License v3.0

Distance analysing binned data using arguments instead of distend / distbegin when distance is not in dataset #144

Open LHMarshall opened 1 year ago

LHMarshall commented 1 year ago

When both distbegin / distend are supplied in the dataset along with the arguments cutpoints and width in the function call, you get a warning message saying that distbegin / distend are being ignored. In this case there was no distance column, so it was unclear what the detection function was being fitted to.

# There is no distance column
> dat[1,]
  Region.Label Area Sample.Label Species Effort distbegin distend size object
1       forest    1        FP123    YBBU      3        20      30    1      1

> # Don't know why this works as there is no column distance so shouldn't work
> x1<-ds(data = dat, transect = "point", formula=~1, key = "hn", 
+       adjustment = NULL, truncation =list(left=0,right=30), 
+       cutpoints = c(0,5,10,15,20,30), convert_units = conversion.factor)
data already has distend and distbegin columns, removing them and appling binning as specified by cutpoints.
Fitting half-normal key function
AIC= 167.049

Also, the results are inconsistent between the following two models, which should be identical:

> # Try to achieve analysis with truncation of 30
> dat2 <- dat
> # Make a distance column
> dat2$distance <- (dat$distbegin+dat$distend)/2
> # Re-cut data as per bins
> x3<-ds(data = dat2, transect = "point", formula=~1, key = "hn", 
+        adjustment = NULL, truncation =list(left=0,right=30), 
+        cutpoints = c(0,5,10,15,20,30), convert_units = conversion.factor)
data already has distend and distbegin columns, removing them and appling binning as specified by cutpoints.
Fitting half-normal key function
AIC= 167.049
> plot(x3)
> # Now try same analysis with using distbegin / distend
> # Need to subset data
> dat3 <- dat[dat$distend <= 30,]
> View(dat3)
> x4<-ds(data = dat3, transect = "point", formula=~1, key = "hn", 
+        adjustment = NULL, convert_units = conversion.factor)
Columns "distbegin" and "distend" in data: performing a binned analysis...
Fitting half-normal key function
AIC= 159.573
> plot(x4)

x3 model plot: image

x4 model plot (note the strange additional point at distance 5): image

lenthomas commented 10 months ago

My suggestion is that we check input data and not allow users to have a distance column and distbegin + distend in the same data frame they pass in to ds or ddf (in mrds). Once we check for and eliminate this, we will solve a bunch of problems. This may also help solve issue #147.
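A minimal sketch of what such an input check might look like, in base R. The function name `check_distance_columns` and the error wording are hypothetical; this is not code from Distance or mrds, only an illustration of rejecting a data frame that has both a distance column and distbegin + distend.

```r
# Hypothetical helper (not part of Distance or mrds): refuse data that
# carries both exact distances and bin limits.
check_distance_columns <- function(data) {
  has_distance <- "distance" %in% names(data)
  has_bins <- all(c("distbegin", "distend") %in% names(data))

  if (has_distance && has_bins) {
    stop("data has both 'distance' and 'distbegin'/'distend' columns; ",
         "supply only one of them", call. = FALSE)
  }

  # Return the data unchanged so the check can sit at the top of a function
  invisible(data)
}

# Binned data only: passes silently
binned <- data.frame(distbegin = c(0, 20), distend = c(5, 30))
check_distance_columns(binned)

# Both kinds of column: stops with an error
mixed <- data.frame(distance = c(2.5, 25), distbegin = c(0, 20), distend = c(5, 30))
result <- tryCatch(check_distance_columns(mixed),
                   error = function(e) conditionMessage(e))
```

Whether this should be an error or a warning is a design choice; an error is used here because the suggestion was to not allow both.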

LHMarshall commented 10 months ago

Checking the data for distance and distbegin and distend columns doesn't do anything to fix this scenario, as there was no distance column in the data to start with. Early on in the ds function, if there is no distance column, it is created using the distend and distbegin columns. It was this column that was then used with the specified cutpoints to make new distbegin and distend columns in the data.
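To make the sequence described above concrete, here is a rough base R sketch. It is not the actual ds code: the helper `rebin` is hypothetical, and using bin midpoints for the stand-in distance is an assumption (it matches how dat2$distance was built in the example earlier in this issue).

```r
# Toy data in the same shape as the issue: bin limits only, no distance column
dat <- data.frame(distbegin = c(0, 5, 10, 15, 20),
                  distend   = c(5, 10, 15, 20, 30))

# Step 1: a stand-in distance column is made from the bins
# (midpoints are an assumption for this sketch)
dat$distance <- (dat$distbegin + dat$distend) / 2

# Step 2: the stand-in distances are cut using the supplied cutpoints to
# give new distbegin / distend columns (hypothetical helper)
rebin <- function(distance, cutpoints) {
  bin <- findInterval(distance, cutpoints, rightmost.closed = TRUE)
  data.frame(distbegin = cutpoints[bin], distend = cutpoints[bin + 1])
}

# With the cutpoints that generated the data, the bins come back unchanged
same <- rebin(dat$distance, c(0, 5, 10, 15, 20, 30))

# With different cutpoints, the observation recorded in 20-30 (midpoint 25)
# is placed in 25-30, which is narrower than what was actually recorded
other <- rebin(dat$distance, c(0, 10, 20, 25, 30))
other[5, ]
```

The second call shows how a stand-in distance can end up being treated as if it were an exact distance.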

lenthomas commented 10 months ago

OK good point. My suggestion then is that we should not be adding a distance column to the dataset. No doubt it's being done so some other code works - but I think (without looking into the details) we're better to change that other code so that it's robust to not having a distance column. Having a fake distance column puts us in danger that it will be analyzed somewhere as an exact distance when it is not. I appreciate this will be more work and so puts this issue down the priority list. (Still think we should check for distance and distend/distbegin when users pass in data frames and not allow both, as well as the above.)

lenthomas commented 10 months ago

Just to note that Laura mentioned for this particular dataset, the distbegin and distend are related to the same underlying set of cutpoints for all observations. This is not a case where there are different distance intervals for each observation - although clearly our code needs to be robust to that. Given that all the bins are the same in this dataset, I don't know why the third code example produces a different result.
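One way to tell the two situations apart might be a check like the one below. It is only a sketch: `bins_share_cutpoints` is a hypothetical helper, not a Distance function, and it assumes bin limits are stored as exactly matching numbers (no floating-point noise).

```r
# Hypothetical helper: TRUE when every (distbegin, distend) pair is a pair
# of neighbouring values in a single sorted set of cutpoints
bins_share_cutpoints <- function(distbegin, distend) {
  # Candidate cutpoints are all the distinct bin limits seen in the data
  cutpoints <- sort(unique(c(distbegin, distend)))

  # Position of each bin limit within the candidate cutpoints
  i_begin <- match(distbegin, cutpoints)
  i_end <- match(distend, cutpoints)

  # Each bin must run from one cutpoint to the very next one
  all(i_end - i_begin == 1)
}

# All observations use the same underlying bins, as in this dataset
fixed_bins <- bins_share_cutpoints(c(0, 5, 10, 15, 20), c(5, 10, 15, 20, 30))

# Overlapping intervals (0-5 and 0-10) cannot come from one set of cutpoints
mixed_bins <- bins_share_cutpoints(c(0, 0), c(5, 10))
```

A bin with no observations in it would go undetected by this check, so it can only confirm that the observed bins are consistent with one set of cutpoints.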

lenthomas commented 10 months ago

One short-term thing to do here is to add documentation under distbegin and distend to discourage users from using this when they have a fixed set of cutpoints that apply to the whole survey.

LHMarshall commented 9 months ago

I have updated the documentation but as the next step is a big fix I have moved this to the next release milestone.