This issue concerns spatial and spatiotemporal cross-validation for evaluating model performance, and may be related to Spatiotemporal-Exposures-and-Toxicology/Scalable_GIS#1.
New observations ($\tilde{X}$) are assessed by their Euclidean distance
$$d(\tilde{X}_{j \cdot}, X_{i \cdot})$$
to the existing training data ($X$) in the multivariate predictor space. For the $j$-th new observation, the distance to the nearest training observation is
$$d_{j} = \min_{k} d(\tilde{X}_{j\cdot}, X_{k\cdot})$$
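A minimal NumPy sketch of $d_j$, assuming the predictors are already numeric and scaled to comparable units. The helper name `nearest_training_distance` is hypothetical (not taken from CAST), and because it materialises the full $m \times n$ distance matrix it only suits small inputs:

```python
import numpy as np

def nearest_training_distance(X_new: np.ndarray, X_train: np.ndarray) -> np.ndarray:
    """Return d_j = min_k ||X_new[j] - X_train[k]|| for every new observation j."""
    # Broadcast to an (m, n, p) array of differences, then reduce over predictors
    diff = X_new[:, None, :] - X_train[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))  # (m, n) Euclidean distance matrix
    return dist.min(axis=1)                  # distance to the nearest training row

X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
X_new = np.array([[0.0, 0.0], [3.0, 4.0]])
print(nearest_training_distance(X_new, X_train))
```

The first new point coincides with a training row, so its $d_j$ is zero; the second is nearest to `[0, 1]`, at distance $\sqrt{18}$.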
The dissimilarity index (DI) is then
$$\text{DI}_{j} = d_{j} / \bar{d}$$
where $\bar{d}$ is the average pairwise distance among the $n$ training observations,
$$\bar{d} = \frac{2}{n(n-1)} \sum_{i<k} d(X_{i\cdot}, X_{k\cdot}).$$
With a user-defined threshold on DI, the user can flag new observations to which the model should not be applied, i.e., those for which good prediction accuracy cannot be expected. The observations whose DI does not exceed the threshold form the model's area of applicability (AOA).
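A sketch of $\bar{d}$, $\text{DI}_j$ and the threshold step, again assuming pre-scaled numeric predictors. The function names and the cutoff of `1.0` are hypothetical placeholders for illustration, not CAST's defaults:

```python
import numpy as np

def pairwise_euclidean(A, B):
    """Full (len(A), len(B)) matrix of Euclidean distances."""
    return np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2))

def dissimilarity_index(X_new, X_train):
    """DI_j = d_j / d_bar for each row of X_new."""
    D_train = pairwise_euclidean(X_train, X_train)
    upper = np.triu_indices(len(X_train), k=1)  # each unordered pair once, no diagonal
    d_bar = D_train[upper].mean()
    d_new = pairwise_euclidean(X_new, X_train).min(axis=1)
    return d_new / d_bar

X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
X_new = np.array([[0.5, 0.5], [3.0, 4.0]])
di = dissimilarity_index(X_new, X_train)

threshold = 1.0              # hypothetical user-defined cutoff
inside_aoa = di <= threshold
print(di, inside_aoa)
```

Here the point `[0.5, 0.5]`, which sits among the training points, falls inside the AOA, while `[3, 4]` is flagged as outside.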
Questions
- Does AOA unintentionally favor features that are spatiotemporally closer to the training data over others, especially when strong spatial or spatiotemporal autocorrelation is present?
- What happens if other distance metrics are used?
- AOA is almost certainly computationally demanding: it relies on the full distance matrix of the training data, which requires $O(n^2)$ pairwise distance computations. How can it be made scalable?
- A further question: how can spatial or spatiotemporal autocorrelation be assessed efficiently on large datasets?
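To explore the distance-metric question, the nearest-distance step can be parameterised by metric. This NumPy sketch compares Euclidean, Manhattan, and Mahalanobis distances; the Mahalanobis variant uses the inverse covariance of the training predictors, so it accounts for predictor scale and correlation. `nearest_distance` is a hypothetical helper and nothing here is claimed to be CAST's behaviour. For a consistent DI, $\bar{d}$ would have to be computed with the same metric:

```python
import numpy as np

def nearest_distance(X_new, X_train, metric="euclidean"):
    """Distance from each row of X_new to its nearest row of X_train."""
    diff = X_new[:, None, :] - X_train[None, :, :]  # (m, n, p) differences
    if metric == "euclidean":
        dist = np.sqrt((diff ** 2).sum(axis=2))
    elif metric == "manhattan":
        dist = np.abs(diff).sum(axis=2)
    elif metric == "mahalanobis":
        # Inverse covariance of the training predictors rescales and decorrelates axes
        VI = np.linalg.inv(np.cov(X_train, rowvar=False))
        dist = np.sqrt(np.einsum("mnp,pq,mnq->mn", diff, VI, diff))
    else:
        raise ValueError(f"unknown metric: {metric}")
    return dist.min(axis=1)

rng = np.random.default_rng(0)
# Three predictors on very different scales
X_train = rng.normal(size=(50, 3)) * np.array([1.0, 10.0, 100.0])
X_new = np.vstack([X_train[:1], np.array([[5.0, 0.0, 0.0]])])

for metric in ("euclidean", "manhattan", "mahalanobis"):
    print(metric, nearest_distance(X_new, X_train, metric))
```

The first new point duplicates a training row, so its distance is zero under every metric; only the second point's distance is informative for comparing metrics.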
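One possible route to scalability, sketched under assumptions: replace the brute-force nearest-neighbour search with a k-d tree (`scipy.spatial.cKDTree`) and estimate $\bar{d}$ from randomly sampled training pairs instead of all $n(n-1)/2$ pairs. The sampled mean is an unbiased but noisy estimate of $\bar{d}$, and k-d trees lose much of their advantage as the number of predictors grows. The helper names and the default `n_pairs` are hypothetical:

```python
import numpy as np
from scipy.spatial import cKDTree

def approx_mean_pairwise(X, n_pairs=100_000, seed=0):
    """Monte Carlo estimate of the mean Euclidean distance over distinct pairs of rows."""
    rng = np.random.default_rng(seed)
    n = len(X)
    i = rng.integers(0, n, size=n_pairs)
    # Offsetting by 1..n-1 (mod n) guarantees k != i, uniformly over the other rows
    k = (i + rng.integers(1, n, size=n_pairs)) % n
    return np.linalg.norm(X[i] - X[k], axis=1).mean()

def scalable_di(X_new, X_train, n_pairs=100_000, seed=0):
    """DI_j = d_j / d_bar using a k-d tree for d_j and sampled pairs for d_bar."""
    tree = cKDTree(X_train)            # O(n log n) build
    d_new, _ = tree.query(X_new, k=1)  # nearest training distance for each new row
    return d_new / approx_mean_pairwise(X_train, n_pairs, seed)

rng = np.random.default_rng(42)
X_train = rng.normal(size=(20_000, 5))
X_new = rng.normal(size=(1_000, 5)) + 0.5
di = scalable_di(X_new, X_train)
print(di.shape, di.mean())
```

The nearest-neighbour part is exact; only $\bar{d}$ is approximated, and its error shrinks as `n_pairs` grows.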
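For the autocorrelation question, one inexpensive option is global Moran's I with sparse $k$-nearest-neighbour weights, which avoids building an $n \times n$ weight matrix. This sketch uses row-standardised weights found with `scipy.spatial.cKDTree`, assumes distinct coordinates, and omits significance testing; the choice `k=8` is arbitrary. For spatiotemporal data, a time column rescaled by some analyst-chosen factor could be appended to `coords`, though that scaling is an assumption rather than an established rule:

```python
import numpy as np
from scipy.spatial import cKDTree

def morans_i_knn(coords, values, k=8):
    """Global Moran's I with row-standardised k-nearest-neighbour weights."""
    z = values - values.mean()
    # k + 1 because each point's nearest neighbour is itself (assumes distinct coordinates)
    _, idx = cKDTree(coords).query(coords, k=k + 1)
    neighbours = idx[:, 1:]
    # With w_ij = 1/k the weights sum to n, so I = sum_i z_i * lag_i / sum_i z_i^2
    lag = z[neighbours].mean(axis=1)  # spatial lag of z
    return (z * lag).sum() / (z ** 2).sum()

rng = np.random.default_rng(1)
coords = rng.uniform(0, 10, size=(5_000, 2))
smooth = np.sin(coords[:, 0]) + np.cos(coords[:, 1]) + 0.1 * rng.normal(size=5_000)
shuffled = rng.permutation(smooth)
print(morans_i_knn(coords, smooth), morans_i_knn(coords, shuffled))
```

The smooth field gives a strongly positive value, while a shuffled copy of the same values gives a value near zero.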
R packages
:mag:
- CAST: spatiotemporal extension of caret, by Meyer, the author of the AOA paper.
- waywiser: spatial model evaluation in the tidymodels ecosystem.
- targets: workflow management tool for R; provides code examples for HPC utilization.
Spatial, temporal, and spatiotemporal splitting functions for cross-validation are implemented in NRTAPmodel. I think these functions would be useful for this package as well.
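The following is a hypothetical NumPy sketch, not NRTAPmodel's implementation, of one common splitting scheme, grid-block cross-validation: observations are binned into cells of a given length per axis, and whole cells are assigned to folds so that neighbouring observations are not split between training and test sets. Passing a time column alongside the coordinates gives spatiotemporal blocks, and a time column alone (as an `(n, 1)` array) gives temporal blocks. The function name and block lengths are assumptions:

```python
import numpy as np

def block_cv_folds(coords, block_size, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which whole grid blocks are held out together."""
    # Bin every observation into a cell; block_size may be a scalar or one value per column
    cells = np.floor(coords / block_size).astype(int)
    _, block_id = np.unique(cells, axis=0, return_inverse=True)
    block_id = block_id.ravel()  # guard against a 2-D inverse in some NumPy versions
    # Randomly assign blocks to folds, with near-equal numbers of blocks per fold
    rng = np.random.default_rng(seed)
    fold_of_block = rng.permutation(block_id.max() + 1) % n_folds
    fold = fold_of_block[block_id]
    for f in range(n_folds):
        yield np.flatnonzero(fold != f), np.flatnonzero(fold == f)

rng = np.random.default_rng(0)
xy = rng.uniform(0, 100, size=(1_000, 2))  # x, y coordinates
t = rng.integers(0, 365, size=(1_000, 1))  # day of year
xyt = np.hstack([xy, t])
sizes = np.array([25.0, 25.0, 90.0])       # hypothetical block lengths per axis
for train_idx, test_idx in block_cv_folds(xyt, sizes):
    print(len(train_idx), len(test_idx))
```

Every observation lands in exactly one test fold, and no space-time block contributes to both the training and the test set of the same fold.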