SimonDedman / gbm.auto

Machine-learning Boosted Regression Tree software suite for species distribution modelling in R
https://doi.org/10.1371/journal.pone.0188955

Gbm.interpolate #8

Open SimonDedman opened 6 years ago

SimonDedman commented 6 years ago

Area is not included in the input CPUE values or the output predictions! One current limitation is that the surveyed CPUE values input to the gbm.auto and gbm.valuemap functions are taken as the midpoints of the survey trawl. This collapses the swept area (trawl net width times bottom-trawled distance) into a single point value, rather than assigning the CPUE over that area. When gbm.auto predicts to points, those points should logically represent the same swept area as the average survey trawl; currently, however, they are a function of the resolution of the highest-resolution dataset interpolated to, the 275 x 455 m depth grids.

The total CPUE calculated for the study area (i.e. the sum of all predicted CPUE points) could therefore be higher or lower than is logically defensible, depending on whether the areas of the predict-to grid cells are smaller or larger (respectively) than the surveyed swept area. This has further implications for gbm.valuemap, since the candidate MPA sizes are a function of that total study-area CPUE.

Addressing this would require the user to supply the (average) swept area of the response-variable survey trawls, and the predict-to areas corresponding to the interpolated points, so that predicted CPUE values are automatically scaled by the survey:predict-to area ratio. The maps in our study benefit from the uniform high-resolution grids that the predictions are made to, but this is not guaranteed to be the case for all studies. Incorporating an area-ratio adjustment would allow a more precise relationship to be inferred from variable-length trawls and their CPUE values, and would allow subsequent predictions to be made to variable-area sites, such as Voronoi polygons.

See Hans' email 6/10/15: swept area should be the basis for the grid size to which all input data are interpolated. Build an input-data grid builder / interpolator. The spatial error (maximal at the maximum distance from the interpolation point) could then be calculated for each input explanatory variable; those errors could be combined together and then combined with the CofV and RSB outputs; still need to work out how to do so mathematically.
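A minimal sketch of how such an area-ratio adjustment might look in R, assuming a data frame of predicted points, a user-supplied mean swept area, and the uniform 275 x 455 m predict-to grid. The function, argument, and column names are hypothetical, not existing gbm.auto arguments, and the ratio is oriented so that the study-area total shrinks when grid cells are smaller than the swept area, matching the over-estimation described above.

```r
# Hypothetical sketch (not part of gbm.auto) of the survey vs predict-to area
# adjustment described above. All names are illustrative.
scale_preds_to_area <- function(preds,                  # data.frame of predicted points
                                cpue_col = "PredAbund", # hypothetical prediction column
                                swept_area_m2,          # mean swept area per survey trawl (m^2)
                                cell_x_m = 275,         # predict-to grid cell width (m)
                                cell_y_m = 455) {       # predict-to grid cell height (m)
  cell_area_m2 <- cell_x_m * cell_y_m
  # Each predicted value represents CPUE over one average trawl footprint, but each
  # point stands in for one grid cell. Multiplying by the cell:swept-area ratio puts
  # the values on a per-cell basis, so their sum gives a defensible study-area total:
  # smaller cells shrink the total, larger cells grow it.
  preds$CPUE_cell <- preds[[cpue_col]] * (cell_area_m2 / swept_area_m2)
  preds
}

# Toy demonstration with synthetic points; trawls sweeping ~20,000 m^2 on average:
demo <- data.frame(Longitude = runif(5), Latitude = runif(5), PredAbund = runif(5, 0, 10))
demo <- scale_preds_to_area(demo, swept_area_m2 = 20000)
sum(demo$CPUE_cell)  # area-adjusted total CPUE over the demo points
```

Written this way, the same function would extend to variable-area sites (e.g. Voronoi polygons) by passing per-row site areas instead of a fixed cell size.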

Am I reinventing the wheel here? Hopefully this already exists.

SimonDedman commented 1 year ago

Only a problem if using the summed CPUE over the area, e.g. in gbm.valuemap. Otherwise each cell/pixel has a value representing the expvars' values at its midpoint, which is fine (so long as the midpoints aren't extrapolated / Voronoi'd / etc. to too large an area).

Can also convert the prediction CSV values to be spatially equivalent by scaling them by the ratio of the samples' coverage (swept) area to the grid cell area.
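A hedged one-off version of that conversion, applied directly to values read from a prediction CSV. The file name and column name are placeholders, not guaranteed gbm.auto output names, and the ratio is oriented as in the sketch under the opening comment so that the summed total stays defensible.

```r
# Hypothetical: rescale prediction CSV values so their sum is on a consistent area basis.
# "Abundance_Preds_only.csv" and "PredAbund" are placeholder names.
preds <- read.csv("Abundance_Preds_only.csv")
swept_area_m2 <- 20000          # mean sample (trawl) coverage area, supplied by the user
cell_area_m2  <- 275 * 455      # predict-to grid cell area
preds$PredAbund_scaled <- preds$PredAbund * (cell_area_m2 / swept_area_m2)
sum(preds$PredAbund_scaled)     # spatially consistent study-area total
```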