DistanceDevelopment / dsims

New simulation R library

Bias in simulations that shouldn't have bias #52

Closed LHMarshall closed 2 years ago

LHMarshall commented 2 years ago

The following simulation should not have yielded any bias, yet it indicated ~20% bias in the abundance of clusters, with coverage of the 95% confidence intervals at only 92%. The density surface for the simulation was flat, the average cluster size was the same across strata, and cluster size was not set to affect detectability. All zigzags were generated inside minimum bounding rectangles and therefore gave close to uniform coverage.

Summary for Clusters

Summary Statistics

         mean.Cover.Area mean.Effort    mean.n no.zero.n    mean.k      mean.ER   mean.se.ER   sd.mean.ER
Burrum N       206687196    172239.3  5.235849         0 10.245283 3.039516e-05 1.241966e-05 1.018906e-05
Burrum S       182533569    152111.3  4.349057         1  9.377358 2.858745e-05 1.274966e-05 1.198714e-05
Elliot N       204901705    170751.4  4.886792         0 10.500000 2.860776e-05 1.260556e-05 1.070544e-05
Elliot S       194389676    161991.4  4.707547         1  9.783019 2.907037e-05 1.245351e-05 1.058874e-05
Urangan        181803726    151503.1  4.415094         1 10.679245 2.914996e-05 1.317969e-05 1.267864e-05
Total          970315873    808596.6 23.594340         0 50.584906 2.918001e-05 5.867051e-06 3.922425e-06

     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Estimates of Abundance (N)

         Truth mean.Estimate percent.bias  RMSE CI.coverage.prob mean.se sd.of.means
Burrum N    31         39.13        26.22 17.36             0.93   17.51       15.41
Burrum S    28         32.90        17.52 16.81             0.95   15.69       16.16
Elliot N    31         36.43        17.53 15.69             0.95   17.39       14.79
Elliot S    30         35.35        17.84 15.39             0.92   16.27       14.50
Urangan     30         35.60        18.65 17.43             0.95   17.29       16.59
Total      150        179.41        19.61 50.04             0.92   47.56       40.68

It is suspected that this bias was introduced at the stage where the results are stored from the dht object (perhaps in the store.dht.results function). A loop set up to repeatedly generate survey data from this same simulation object and then fit a model (using the analyse.data function, i.e. the same call made internally) did not display this bias (see the histogram and mean of the estimates below; the truth was 150 clusters). The strata appear in a different order in the dht tables than in the shapefile, and while this was coded for, this may be where the issue lies.

ests <- numeric(0)

for (i in 1:200) {
  # generate a new survey realisation from the same simulation object
  survey <- run.survey(sim.MBR)
  # fit the same model that the simulation calls internally
  result <- analyse.data(sim.MBR@ds.analysis, survey)
  # store the Total row (row 6: 5 strata + Total) of the cluster abundance estimates
  ests[i] <- result$model$dht$clusters$N$Estimate[6]
}

hist(ests)
abline(v = 150, col = 2, lty = 2, lwd = 2)  # truth: 150 clusters
> mean(ests)
[1] 150.1444

[Figure: EstsHistogram - histogram of the cluster abundance estimates]

LHMarshall commented 2 years ago

@erex I wonder if this same issue might have been affecting the simulations you were running... will let you know when I get to the bottom of it.

LHMarshall commented 2 years ago

@erex @lenthomas I failed to find a bug as such in my code... the only thing I concluded that dsims was doing differently from the loop above was excluding datasets with <20 detections. When I lower the required number of detections to 5, so that a detection function is fitted and the results are stored, I get <3% bias (100 reps) in the cluster estimates. The estimates are still ~11% biased for individuals, but it appears that expected cluster size is being overestimated... something else to look into, ah yes, see #53!

Summary for Clusters

Summary Statistics

         mean.Cover.Area mean.Effort mean.n no.zero.n mean.k      mean.ER   mean.se.ER   sd.mean.ER
Burrum N       205917319    171597.8   4.47         0  10.33 2.603428e-05 1.159519e-05 1.030851e-05
Burrum S       182600758    152167.3   3.67         0   9.37 2.413360e-05 1.165267e-05 1.133656e-05
Elliot N       204845585    170704.7   4.22         0  10.50 2.470011e-05 1.154519e-05 9.677831e-06
Elliot S       194671285    162226.1   4.03         1   9.98 2.485939e-05 1.151286e-05 1.220235e-05
Urangan        181003765    150836.5   3.70         2  10.51 2.453519e-05 1.162053e-05 1.189551e-05
Total          969038713    807532.3  20.09         0  50.69 2.488212e-05 5.366829e-06 5.071229e-06

     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Estimates of Abundance (N)

         Truth mean.Estimate percent.bias  RMSE CI.coverage.prob mean.se sd.of.means
Burrum N    31         33.88         9.28 15.30             0.98   16.56       15.10
Burrum S    28         27.88        -0.43 14.64             0.99   14.57       14.71
Elliot N    31         31.97         3.12 14.01             0.97   16.18       14.05
Elliot S    30         30.09         0.31 15.41             0.97   15.14       15.49
Urangan     30         30.25         0.83 15.65             0.96   15.45       15.73
Total      150        154.06         2.71 41.94             0.98   44.32       41.96
LHMarshall commented 2 years ago

@erex @lenthomas hmm, how best to proceed? I propose keeping the default at 20 but letting the user adapt the minimum number of detections. The warning that alerts the user to the problem will also now state that bias may be introduced. I will expand the help to explain that, while we recommend 60-80 detections for detection function fitting and set the default minimum to 20, people wishing to reduce this threshold may do so using the minimum.n argument, and that leaving the threshold high when many iterations are being excluded due to low numbers of detections will likely introduce bias.
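
Purely as an illustration of the proposed behaviour (check.min.n is a hypothetical helper written for this comment, not the actual dsims code): warn rather than silently skip, with a user-adjustable minimum.n defaulting to 20.

# Hypothetical sketch only -- illustrates the proposed default/override and the
# wording of the bias warning; not the dsims implementation.
check.min.n <- function(n.detections, minimum.n = 20) {
  if (n.detections < minimum.n) {
    warning("Only ", n.detections, " detections (minimum.n = ", minimum.n,
            "). Excluding repetitions with few detections can introduce ",
            "positive bias; reduce minimum.n if you accept the fitting risk.",
            call. = FALSE)
  }
  invisible(n.detections >= minimum.n)
}

check.min.n(12)                  # warns under the default threshold of 20
check.min.n(12, minimum.n = 5)   # passes when the user lowers the threshold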

erex commented 2 years ago

I don't know the details of how the number-of-detections condition is determined. Does the condition depend upon the model being fitted? In the situation above with strata, it looks like the number of detections at the stratum level is ~4, but at the global level it reaches the threshold of 20.

If a stratum-specific detection function model is being used, the threshold should apply at the stratum level. Is the code clever enough to do this?

LHMarshall commented 2 years ago

@erex the number of detections is just that, the number of detected individuals or clusters.

The detection function is always a global detection function in dsims at present. The user could, if they wished, use Region.Label as a covariate. If there were very few or even no detections for a stratum, I suspect this would cause an error if Region.Label were a covariate, and that error would be reported at the end of the simulation.

You say that a threshold should apply at the stratum level... on what basis? Would this only be if Region.Label were in the model? It appears that these somewhat artificial thresholds are likely to introduce bias in the results; lowering the threshold removed most of the bias.

Perhaps the model fit should never be prevented; instead any arising errors could be captured and reported, and low numbers of detections reported too. Then, when the summary of the simulation is constructed, users could choose which simulation repetitions to include or exclude.
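
In plain R terms, around the same loop as above, that could look something like the following (the fits/issues bookkeeping names are illustrative only, not dsims internals):

fits <- vector("list", 200)
issues <- vector("list", 200)

for (i in 1:200) {
  survey <- run.survey(sim.MBR)
  # [i] <- list(...) keeps a NULL result in place rather than dropping the element
  fits[i] <- list(tryCatch(
    withCallingHandlers(
      analyse.data(sim.MBR@ds.analysis, survey),
      warning = function(w) {
        # record the warning but keep the fitted result
        issues[[i]] <<- c(issues[[i]], conditionMessage(w))
        invokeRestart("muffleWarning")
      }
    ),
    error = function(e) {
      # record the error; this repetition has no usable fit
      issues[[i]] <<- c(issues[[i]], conditionMessage(e))
      NULL
    }
  ))
}

# users could then decide which repetitions to keep, e.g. drop those that errored
sum(sapply(fits, is.null))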

erex commented 2 years ago

Yeah, given that dsims has the ability to create stratified designs, a user might be inclined to believe they can fit models with stratum-specific detection functions. I had mistakenly reached that conclusion; perhaps this could be mentioned in the documentation to prevent others from replicating my mistake.

I'm curious what the mechanism might be that links minimum.n to biased estimates. I can't figure out how raising the threshold for sample size can erode the performance of an analysis. Can you elaborate?

LHMarshall commented 2 years ago

@erex my guess would be that it's as simple as this: if we systematically exclude repetitions with lower numbers of detections, we are excluding results with lower encounter rates, and hence our results end up positively biased.

To back that up, here are a couple of histograms. The estimates of N have differing distributions when divided into two groups: fewer than 20 detections and 20 or more detections. The estimates of Pa do not show the same pattern. If the bias is not coming from the model it must be coming from the encounter rate, and that does intuitively make sense.

[Figures: HistogramEstimatesN, HistogramEstimatesPa]
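
The selection effect can also be reproduced with a toy example in plain R, independent of dsims, treating the per-repetition detection count as Poisson with mean roughly equal to the mean.n of ~23.6 from the first summary above:

set.seed(1)
n <- rpois(10000, lambda = 23.6)   # detections per repetition
mean(n)                            # close to 23.6, as expected
mean(n[n >= 20])                   # larger: the low-count repetitions are gone

Since the abundance estimate is roughly proportional to n under a flat density and fixed design, keeping only the higher-n repetitions pushes the mean estimate up, matching the pattern in the N histograms but not in the Pa ones.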

erex commented 2 years ago

Thanks for the histos Laura, and for the thought about encounter rate; it makes sense. Your suggestion that the sample size threshold induces bias is based on the simulation shown at the top of this issue thread. In this scenario, where the total population size is 150, the threshold of 20 detections is 13% of the population. This means that a replicate that detects 10% of the population (15 detections) is excluded. Many surveys would be lucky to detect 10% of the population of interest.

Looking at it another way, if the true population size were 1500 (or 1000) rather than 150, I would be surprised if the minimum.n threshold of 20 detections induced noticeable bias.

It might be worth suggesting (in the documentation) that, if the user wishes to use minimum.n, the threshold could be chosen as a proportion of the population.
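
That intuition is easy to check with the same toy Poisson sketch as above: scale the expected count up ten-fold and the 20-detection cut-off removes essentially nothing, so the retained mean is barely shifted.

set.seed(1)
n.small <- rpois(10000, lambda = 23.6)   # roughly the scenario above (N = 150)
n.large <- rpois(10000, lambda = 236)    # a ten-fold larger population
mean(n.small[n.small >= 20]) / mean(n.small)   # above 1: positive bias
mean(n.large[n.large >= 20]) / mean(n.large)   # essentially 1: no practical bias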

lenthomas commented 2 years ago

Ok, so my understanding is that you have a mechanism in dsims whereby multiple realizations from some model of animal density and detectability are simulated, and those realizations with a number of observed detections less than some specified minimum.n are excluded. I can quite see how this will generate a positive bias: it will tend to retain simulations with larger sample sizes and hence those that lead to larger density estimates. I also think it's not a wise option for people to choose in general, because in the real world they will only have one dataset to work on, it may have fewer than minimum.n detections, and they don't have the option of simply generating another dataset! So, yes, I agree you should add a health warning to the option saying that it will tend to cause a positive bias; I also think your output should be clear about the number of iterations where the sample size was less than minimum.n. Lastly, the option should be off by default IMO.

I don't feel strongly about how one might specify minimum.n because I think users should very rarely be using it. I expect your programming effort is better spent on other things than making the way one specifies minimum.n more sophisticated!

Perhaps I'm missing something here though...

erex commented 2 years ago

@lenthomas: you misunderstood my comment. I did not suggest that code be implemented to compute minimum.n; rather, I suggested that the documentation describing the minimum.n parameter might offer guidance for setting its value.

LHMarshall commented 2 years ago

@lenthomas this was put in after a very short discussion at a Distance development meeting many moons ago, in the early development of DSsim. I believe it went along the lines of 'should I just allow DSsim to fit a model regardless of sample size, or should we set a minimum number of detections?'... 'let's set a minimum'... '20 sounds good'. I'm happy to just remove it. The code is already pretty robust to fitting issues so it shouldn't be a big deal. I'll just create some tests for when we have very, very few or even no detections.

LHMarshall commented 2 years ago

The reasoning behind minimum.n was just simulation stability... and more useful output... but it wasn't supposed to come at the expense of introducing bias! When there is only one detection, the following (not very useful) warning is displayed by mrds:

> results <- analyse.data(sim@ds.analysis, survey)
Warning: simpleWarning in qt(ci.width, estimate.table$df): NaNs produced

I won't remove these warnings (as I am uncertain whether other situations could generate them) but I could add warnings for when n is small... but how small? 20 was floated before, but really we recommend 60+ detections...

@erex I don't think it matters what proportion of the population is detected but just how many detections there are. What might be more important is how complex the models are and how many parameters are being estimated. But perhaps the approach should be just to allow users to test away to the limits.

I feel it is a trade-off, relevant only to this fairly special case, between how much pre-checking we want to do versus leaving users all the results to ponder themselves. But we do want them to know that any struggles may well be associated with small numbers of detections. At the moment the only thing accessible in the summary is the average number of detections; users would have to delve into the results arrays themselves to find the distribution of the number of detections.
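
In the meantime, one way a user could look at that distribution themselves is to reuse the loop pattern from the top of this thread; the layout of the dht clusters summary (an n column, with the Total row last) is assumed here from the standard mrds output shown in the summaries above.

n.dets <- numeric(0)
for (i in 1:200) {
  survey <- run.survey(sim.MBR)
  result <- analyse.data(sim.MBR@ds.analysis, survey)
  # total detected clusters across strata (Total row of the dht summary table)
  n.dets[i] <- tail(result$model$dht$clusters$summary$n, 1)
}
hist(n.dets)
abline(v = 20, col = 2, lty = 2)   # the threshold discussed above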

For 2 detections and a half-normal model, the model is fitted quietly with no complaints most of the time (I just did a few reps and didn't see any messages - see plots below).

For 2 detections and a hazard rate... my computer is tied up for 24 seconds (for a single analysis) and then I get the same warning as with 1 detection and a half-normal.

[Screenshots, 2022-07-13: plots from the example fits described above]