add biased ppm methods

goldingn commented 8 years ago

add the capacity to fit biased PPMs by passing a bias raster to ppmify - probably as a raster passed via the area argument, with a bias = FALSE argument determining whether to sum across the cells of the raster, or across the cell areas

goldingn commented 8 years ago

also need to think about accounting for bias using a reference set of points - need to make sure that works mathematically

goldingn commented 8 years ago

I just changed the title of this and edited the comments above, as I realised that I was using the wrong term here. Thinned PPMs are a subtype of marked PPMs, so I'm switching to the name 'bias' for this now.

goldingn commented 8 years ago

Having thought about the reference set background approach, we're good to use these for quadrature in the model. If we assume that they are a random and independent sample from the bias distribution, we can assign each point $i$ the same quadrature weight:

$$w_i = 1/A$$

where $A$ is the area of the region of interest. This can then be viewed as a Monte Carlo approximation to the integral:

$$\Lambda_A = \int_A \lambda(s) \, ds$$

for:

- $\lambda(s)$, the intensity process for observed points,
- $\Lambda_A$, its integral over the area $A$,

where we know that the intensity process for observed points is a composite of two functions:

$$\lambda(s) = f(g(s))$$

and we know (or can sample from) the effort process:

- $g(s)$ (reporting e.g. survey time per unit area, or other units per area such as population density)

and want to estimate:

- $f$, the intensity of points per unit effort (e.g. per survey minute or per member of the population).

We can therefore treat the reference set background points as samples from $g(s)$ and assign them equal weights.

Equally, in the bias raster case, we use quadrature points to sum across the effort metric for each quadrature point, and use these as the weights. I think 1/A are the right weights, but will think some more.
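A minimal sketch of the bias raster case, assuming the effort surface is held in a plain numeric matrix rather than a RasterLayer and that every cell has the same area. The helper name `bias_weights` and its arguments are hypothetical and not part of ppmify; weighting each cell by effort times cell area is one reading of "sum across the effort metric", not a settled design.

```r
# Hypothetical helper: quadrature weights from a gridded effort surface g(s).
# effort:    numeric matrix of effort per unit area, one value per cell
#            (NA marks cells outside the region of interest)
# cell_area: area of a single cell, assumed constant across the grid
bias_weights <- function(effort, cell_area) {
  stopifnot(is.matrix(effort), is.numeric(effort), cell_area > 0)
  inside <- !is.na(effort)
  # each cell centre acts as one quadrature point, weighted by effort x area
  effort[inside] * cell_area
}

# toy 2 x 2 effort grid with one cell outside the region
effort <- matrix(c(0.5, 1, 2, NA), nrow = 2)
w <- bias_weights(effort, cell_area = 0.25)

# the weights sum to the integral of the effort surface over the region
total_effort <- sum(w)
```

With a raster the total effort is known directly from the grid, which is what makes this case simpler than the reference-set case.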

The real question is how to define a minimal interface for specifying a reference set of points.

goldingn commented 8 years ago

Thought about the weights for a reference set.

Since we only have samples from $g(s)$ and don't know the integral $E_A$ of $g(s)$ over the area of interest $A$:

$$E_A = \int_A g(s) \, ds$$

we are representing $g(s)$ as an unscaled density*. We therefore aren't able to estimate the expected total number of the 'dots' in the PPM after accounting for this bias (the same as not being able to identify prevalence with a presence-only SDM). We therefore have to make a judgement call about what to set $E_A$ to for modelling.

A natural choice is to assume that $g(s)$ integrates to 1, i.e. $E_A = 1$. That corresponds to weights $A/m$ (with $m$ reference-set samples). This is like using the Maxent 'raw' output, or relative occurrence rate, as a model of sampling intensity and feeding that in as a bias raster. If you use these weights and make this assumption, and fit a model to $n$ observed points to estimate $f(g(s))$, then the integral of the observed-point intensity over $A$, $\Lambda_A$ (from above), will be equal to $n$, as in an unbiased model.

Alternatively, we could set the model up so that $\Lambda_A$ is equal to one. The result would be that the predicted point pattern from the overall PPM would be the same as the Maxent raw output (at least in expectation). The corresponding quadrature weights in this case would be $A/(m \cdot n)$.

I would lean towards the first option of making $E_A = 1$ and $\Lambda_A = n$.

Note that using weights 1/A as I suggested above doesn't make any sense, but then I hadn't had coffee when I wrote that.

(* note that we do know the integral of g(s) in the case that we have a bias raster; we only don't know it when we have a sample of points)
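To make the two weighting options above concrete, here is a small sketch that only does the arithmetic. The values of `A`, `m` and `n` are made up for illustration, and the variable names are not from ppmify.

```r
# Candidate equal quadrature weights for a reference set (illustrative values)
A <- 100   # area of the region of interest
m <- 500   # number of reference-set points
n <- 40    # number of observed points

# Option 1: effort surface scaled so that its integral over A is 1
w_opt1 <- rep(A / m, m)

# Option 2: model scaled so that the expected number of observed points is 1
w_opt2 <- rep(A / (m * n), m)

# Option 1 weights sum to A; option 2 weights are smaller by a factor of n
sum_opt1 <- sum(w_opt1)
sum_opt2 <- sum(w_opt2)
```

The two options differ only by the constant factor `n` applied to every weight.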

goldingn commented 8 years ago

Suggested interface for ppmify, accounting for these thoughts:

ppmify(coords,
       density = 10,
       covariates = NULL,
       method = c('grid', 'count', 'points'),
       bias_grid = NULL,
       bias_points = NULL,
       area = NULL)

where:

- coords, density, covariates and method are as before,
- bias_grid is an optional RasterLayer giving g(s) as above,
- bias_points is an optional set of reference-set coordinates (in the same format as coords), considered to be independent random draws from the PPM implied by g(s), and used when method = 'points',
- area is (as before) an optional Raster* or SpatialPolygons* object defining an area of interest smaller than the extent of coords or covariates (if used).
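The constraints in this interface could be checked up front along the following lines. This is only a sketch: `check_bias_args` is a hypothetical helper, not an existing ppmify function, and the rule that bias_grid and bias_points cannot be combined is an assumption rather than something decided in this thread.

```r
# Hypothetical argument check for the proposed interface; not part of ppmify.
check_bias_args <- function(method = c('grid', 'count', 'points'),
                            bias_grid = NULL,
                            bias_points = NULL) {
  # resolve method to a single valid choice (defaults to 'grid')
  method <- match.arg(method)

  # assumption: only one description of the bias may be supplied at a time
  if (!is.null(bias_grid) && !is.null(bias_points)) {
    stop("supply either bias_grid or bias_points, not both")
  }

  # bias_points act as the quadrature points, so they need method = 'points'
  if (!is.null(bias_points) && method != 'points') {
    stop("bias_points can only be used with method = 'points'")
  }

  invisible(method)
}

# a reference set given in the same two-column format as coords
ref <- cbind(x = c(1, 2), y = c(3, 4))
ok <- check_bias_args(method = 'points', bias_points = ref)
```

Failing early here would keep the weight calculation itself free of special cases.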

What do you think @fshearer?

One open question is what grid to use for the count method - do we use the resolution of the covariates and ignore the density argument, or do we use the density argument and have to centre-sample/resample the covariates?

goldingn / ppmify

add biased ppm methods #7