DistanceDevelopment / mrds

R package for mark-recapture-distance-sampling analysis
GNU General Public License v3.0

Speed-up `sample_ddf()` (for use in K-S test) #35

Open erex opened 3 years ago

erex commented 3 years ago

Running a KS test on the amakihi data (yes, 1100 detections) for the favoured model (OBS+MAS) with 250 bootstrap replicates took more than 10 hours (I went to bed 10 hours after running the line of code). Few users are going to wait that long for the result of a goodness-of-fit test; could parallelisation help cut that execution time?

I was curious why there was such a discrepancy in GOF P-values between KS and CvM in DistWin:

Although the PDF plot shows no systematic departures between the fitted model and the data, the Kolmogorov-Smirnov goodness-of-fit statistic indicates a reasonably poor fit (Dn = 0.042, P = 0.02). This statistic is a function of the largest discrepancy between the observed and expected distances, and the poor result is likely caused by the rounding of observed distances in the data. The Cramér-von Mises goodness-of-fit statistic, which uses the overall departure between data and fitted model, shows no significant problems (W2 = 0.187, 0.2 < P ≤ 0.3). Marques et al. (2007)

I wanted to see what the R package thought:

> gof_ds(amak.hr.obs.mas)

Goodness of fit results for ddf object

Distance sampling Kolmogorov-Smirnov test
Test statistic = 0.0362515 p-value = 0.06
 (p-value calculated from 250/250 bootstraps)
Distance sampling Cramer-von Mises test (unweighted)
Test statistic = 0.150161 p-value = 0.389084

dill commented 2 years ago

Better to work out what the appropriate inverse probability transform is and use that to get exact samples. (The slowness comes from rejection sampling.)
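To illustrate the difference between the two approaches, here is a minimal sketch for a half-normal detection function truncated at `w`, chosen only because its CDF inverts in closed form. This is not the `mrds` implementation: the function names (`g_hn`, `r_reject`, `r_inverse`) and the values `sigma = 30` and `w = 100` are hypothetical, and covariates are ignored. The hazard-rate key used in the model above does not, as far as I know, have a simple closed-form inverse, so it would probably need numerical inversion (for example `integrate()` combined with `uniroot()`).

```r
# Half-normal detection function (illustrative only, no covariates).
g_hn <- function(x, sigma) exp(-x^2 / (2 * sigma^2))

# Rejection sampling: propose uniformly on [0, w] and accept each proposal
# with probability g(x). Many proposals are discarded when g is small
# over much of [0, w], which is where the cost comes from.
r_reject <- function(n, sigma, w) {
  out <- numeric(0)
  while (length(out) < n) {
    x <- runif(n, 0, w)
    out <- c(out, x[runif(n) <= g_hn(x, sigma)])
  }
  out[seq_len(n)]
}

# Inverse transform: the CDF of distances on [0, w] is
#   F(x) = (pnorm(x / sigma) - 0.5) / (pnorm(w / sigma) - 0.5),
# so solving F(x) = u for x gives one exact sample per uniform draw.
r_inverse <- function(n, sigma, w) {
  u <- runif(n)
  sigma * qnorm(0.5 + u * (pnorm(w / sigma) - 0.5))
}

set.seed(1)
x_inv <- r_inverse(1100, sigma = 30, w = 100)  # arbitrary example values
x_rej <- r_reject(1100, sigma = 30, w = 100)
```

Both functions draw from the same distribution, but `r_inverse()` needs exactly `n` uniform draws and no loop, whereas the number of proposals in `r_reject()` depends on the acceptance rate.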