Envirometrix / landmap

Landmap package for R

Spatial CV #3

Open: dirtdude opened this issue 3 years ago

dirtdude commented 3 years ago

Hi. I'm working through the example now, using my own data: https://gitlab.com/openlandmap/spatial-predictions-using-eml#using-geographical-distances-to-improve-spatial-interpolation

I'm just curious how the spatial CV is implemented. I'm getting poorer performance in CV than with caretEnsemble, probably because of the spatial CV, as my points are clustered: ~0.32 R2 from landmap versus ~0.5 R2 from caretEnsemble, using a linear combination of base learners. Basically, I'm wondering whether you can tune the spatial CV, and how to access the geographical distances. The GitLab page says "This runs number of steps including derivation of geographical distances"; what is under the hood there?

Thanks!

thengl commented 3 years ago

The spatial CV is implemented as spatially blocked cross-validation:

  1. Estimate the spatial autocorrelation range of the target variable, which is then used as the block size (cell.size), if possible by fitting a variogram to the residuals (see train.spLearner.R).
  2. Use that block size when training the ensemble model (see train.spLearner.R), so that points in the same block always end up together in either the training or the test fold. This is set via the resampling=mlr::makeResampleDesc(method = "CV", blocking.cv=TRUE) argument.
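The two steps above can be sketched roughly as follows. This is a hedged, language-neutral illustration in Python, not landmap's R code: the binned empirical variogram, the exponential model with "practical range" r, using that range directly as the block size, the square-grid blocks, and all helper names (`empirical_variogram`, `fit_exponential_range`, `block_ids`, `blocked_folds`) are assumptions made for illustration.

```python
import numpy as np


def empirical_variogram(xy, z, n_bins=15, max_dist=None):
    """Binned semivariance of z (e.g. model residuals) versus separation distance."""
    d = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(-1))
    sq = 0.5 * (z[:, None] - z[None, :]) ** 2
    iu = np.triu_indices(len(z), k=1)           # count each pair once
    d, sq = d[iu], sq[iu]
    if max_dist is None:
        max_dist = d.max() / 2                  # common rule of thumb
    edges = np.linspace(0, max_dist, n_bins + 1)
    idx = np.digitize(d, edges) - 1             # pairs beyond max_dist get idx >= n_bins
    h, g = [], []
    for b in range(n_bins):
        m = idx == b
        if m.any():
            h.append(d[m].mean())
            g.append(sq[m].mean())
    return np.array(h), np.array(g)


def fit_exponential_range(h, g, candidates):
    """Grid-search the practical range r of
    gamma(h) = nugget + psill * (1 - exp(-3 h / r)).
    For each r, nugget and psill are solved by linear least squares."""
    best_r, best_sse = None, np.inf
    for r in candidates:
        A = np.column_stack([np.ones_like(h), 1 - np.exp(-3 * h / r)])
        coef, *_ = np.linalg.lstsq(A, g, rcond=None)
        if np.any(coef < 0):                    # skip negative nugget/sill fits
            continue
        sse = ((A @ coef - g) ** 2).sum()
        if sse < best_sse:
            best_r, best_sse = r, sse
    return best_r                               # None if no valid fit was found


def block_ids(xy, cell_size):
    """Label each point by the square grid cell (block) it falls in."""
    cells = np.floor(xy / cell_size).astype(int)
    _, ids = np.unique(cells, axis=0, return_inverse=True)
    return ids.ravel()


def blocked_folds(blocks, n_folds, rng):
    """Assign whole blocks to folds, so one block never straddles train and test."""
    uniq = np.unique(blocks)
    fold_of_block = rng.permutation(len(uniq)) % n_folds
    lookup = dict(zip(uniq, fold_of_block))
    return np.array([lookup[b] for b in blocks])


# Hypothetical data: a smooth spatial field plus noise, standing in for residuals.
rng = np.random.default_rng(0)
xy = rng.uniform(0, 100, size=(300, 2))
z = np.sin(xy[:, 0] / 15) + np.cos(xy[:, 1] / 15) + rng.normal(0, 0.2, 300)

h, g = empirical_variogram(xy, z)
r = fit_exponential_range(h, g, np.linspace(5, 70, 66))
cell_size = r if r is not None else h.max()     # fallback: largest lag distance
folds = blocked_folds(block_ids(xy, cell_size), 5, rng)
```

The block labels play roughly the role of the blocking factor that `blocking.cv=TRUE` makes mlr's cross-validation respect: every point in a block lands in the same fold, so the test points are at least about one autocorrelation range away from most of the training data.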

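I think the ~0.32 R-square is the one you should report: with clustered points, non-spatial CV mostly tests on points that have close neighbours in the training data, so it tends to overestimate accuracy. Read more about spatial CV.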
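To illustrate why the blocked score is the more honest one for clustered data, here is a small hedged simulation in Python. Everything in it is an assumption for illustration, not landmap's setup: synthetic clustered points whose target has a per-cluster offset, a 5-nearest-neighbour regressor standing in for the ensemble, and 20-unit square grid blocks.

```python
import numpy as np


def knn_predict(xy_train, z_train, xy_test, k=5):
    """Predict each test point as the mean of its k nearest training points."""
    d = np.sqrt(((xy_test[:, None, :] - xy_train[None, :, :]) ** 2).sum(-1))
    nearest = np.argsort(d, axis=1)[:, :k]
    return z_train[nearest].mean(axis=1)


def cv_r2(xy, z, folds, k=5):
    """Cross-validated R-square for a given fold assignment."""
    pred = np.empty_like(z)
    for f in np.unique(folds):
        test = folds == f
        pred[test] = knn_predict(xy[~test], z[~test], xy[test], k)
    return 1 - ((z - pred) ** 2).sum() / ((z - z.mean()) ** 2).sum()


# Hypothetical clustered sample: 20 clusters of 15 points each.
rng = np.random.default_rng(1)
centers = rng.uniform(0, 100, size=(20, 2))
cluster = np.repeat(np.arange(20), 15)
xy = centers[cluster] + rng.normal(0, 1.5, size=(300, 2))
effect = rng.normal(0, 1, 20)                   # per-cluster offset
z = 0.03 * xy[:, 0] + effect[cluster] + rng.normal(0, 0.3, 300)

# Random (non-spatial) 5-fold CV.
random_folds = rng.permutation(300) % 5

# Blocked 5-fold CV: whole 20 x 20 grid cells are held out together.
cells = np.floor(xy / 20).astype(int)
_, blocks = np.unique(cells, axis=0, return_inverse=True)
blocks = blocks.ravel()
blocked = (rng.permutation(blocks.max() + 1) % 5)[blocks]

r2_random = cv_r2(xy, z, random_folds)
r2_blocked = cv_r2(xy, z, blocked)
print(f"random CV R2: {r2_random:.2f}, blocked CV R2: {r2_blocked:.2f}")
```

With random folds, each test point still has near neighbours from its own cluster in the training set, so the score mostly measures interpolation within clusters. Blocked folds remove those neighbours and measure prediction at new locations, which is what a map is actually used for.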