DistanceDevelopment / Distance

Simple distance sampling analysis

May need to scale histogram of observed distances when there are unequal bins #110

Open lenthomas opened 3 years ago

lenthomas commented 3 years ago

Distance 1.0.4.9002, mrds 2.2.5.9000, R 4.1.1.

When the data are binned, and there are unequal bin widths, plotting the histogram without detection function/pdf (plot.ds using option which = 1) gives count frequency on the y-axis:

library(Distance)
data("wren_snapshot")
bin.cutpoints.100m <- bin.cutpoints <- c(0, 10, 20, 30, 40, 60, 80, 100)
conversion.factor <- convert_units("meter", NULL, "hectare")
wrensnap.hn.t100 <- ds(data=wren_snapshot, key="hn", adjustment=NULL, 
                       transect="point", cutpoints=bin.cutpoints.100m,
                       convert.units=conversion.factor)
plot(wrensnap.hn.t100, which = 1, pdf = TRUE)

[image: histogram of observed distances, count frequency on the y-axis]

However, it is not clear to me that this is the correct thing to do, as when we plot with the detection function superimposed, the y-axis is scaled to account for the bin width:

plot(wrensnap.hn.t100, which = 2, pdf = TRUE)

[image: histogram with the fitted pdf superimposed, y-axis scaled for bin width]

In this circumstance, base R gives a warning message:

# x is a vector of observed distances, all within [0, 100]
hist(x, breaks = c(0, 10, 20, 30, 40, 60, 80, 100), freq = TRUE)

Gives

Warning message:
In plot.histogram(r, freq = freq1, col = col, border = border, angle = angle,  :
  the AREAS in the plot are wrong -- rather use 'freq = FALSE'

If you instead use the default, which is freq = FALSE when the breaks are unequal, it plots the correct density. So perhaps we should either issue a warning when people choose which = 1 and the bin widths are not all the same, or change the plot to show density rather than count on the y-axis?
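To make the scaling concrete, here is a minimal sketch on simulated distances (an assumption; this is not the wren data and does not touch plot.ds). It shows that the density base R reports is the count divided by the total count times the bin width, which is why the wider bins end up shorter than their raw counts suggest.

```r
# Simulated distances (an assumption; not the wren data), all within [0, 100]
set.seed(1)
x <- runif(200, min = 0, max = 100)
breaks <- c(0, 10, 20, 30, 40, 60, 80, 100)

# plot = FALSE returns the histogram object without drawing anything
h <- hist(x, breaks = breaks, plot = FALSE)

widths <- diff(h$breaks)
# Density = count / (total count * bin width), so the bar areas sum to 1
dens <- h$counts / (sum(h$counts) * widths)

all.equal(dens, h$density)  # manual scaling compared with hist()'s density
sum(dens * widths)          # total area under the bars
```

With equal bin widths the two y-axes differ only by a constant factor, so the shape of the histogram is the same either way; with unequal widths the shape itself changes.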

dill commented 3 years ago

I've thought previously we should just remove which=1 plotting. I've rarely seen it used in the wild.

There are a few reasons the bins look different between these two plots. One is probably down to the hist warning you mention, but there is also the scaling between the area under the detection function and the area of the histogram (see the fairly heinous internals of plot.ds for details).

Including the warning seems fine, though I think people might read which=1 effectively as a bar chart and expect bins with more observations to be taller (than they would be after accounting for uneven bin size).
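For reference, a rough sketch of what such a warning check might look like. Both function names are hypothetical and are not part of Distance or mrds. The sketch assumes the cutpoints are available as a numeric vector at plotting time, and the message wording is only a placeholder.

```r
# Hypothetical helper (not part of Distance or mrds): TRUE when bin widths differ
has_unequal_bins <- function(cutpoints, tol = 1e-8) {
  widths <- diff(cutpoints)
  any(abs(widths - widths[1]) > tol)
}

# Hypothetical check that a plotting function could run when which = 1
check_histogram_bins <- function(cutpoints, which = 1) {
  unequal <- has_unequal_bins(cutpoints)
  if (1 %in% which && unequal) {
    warning("Bin widths are unequal: bar heights show counts, ",
            "so bar areas are not proportional to frequency.",
            call. = FALSE)
  }
  invisible(unequal)
}

check_histogram_bins(c(0, 10, 20, 30, 40, 60, 80, 100))  # unequal widths: warns
check_histogram_bins(seq(0, 100, by = 20))               # equal widths: silent
```

A warning of this kind leaves the bar-chart reading of which=1 intact while still flagging that the areas are not comparable across bins.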