Open tiemvanderdeure opened 1 month ago
Thanks @tiemvanderdeure for posing this interesting question. I'm trying to understand the required interface points better, but I have not done spatial resampling before. A `ResamplingStrategy` can have parameters. Is there a reason the "point location" cannot be one of these? Or are you saying it is needed by `fit` (in which case it is a hyperparameter??). Could you say a little more on this point?
It wouldn't be needed by `fit`, only by `evaluate!`.
In my field, observations might be locations where a species was/wasn't found. One then extracts information about these points, like climate, land use, distance to a road, etc, and fits a model based on these. The spatial resampling is used to make sure the model learned something about the species and not just the random spatial patterns.
So every row in `X` would have a point location, and a spatial resampling strategy would use these locations, e.g. to construct a grid and cross-validate over grid cells instead of individual observations.
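To make the grid idea concrete, here is a minimal sketch in plain Julia. It is not code from MLJ or SDM.jl: `grid_cell_pairs`, `cellsize` and `nfolds` are hypothetical names, and cells are assigned to folds round-robin purely for simplicity (a real strategy would probably shuffle or balance them).

```julia
# Hypothetical helper: bin observations into square grid cells and build one
# (train, test) pair per fold, keeping every cell entirely inside one fold.
function grid_cell_pairs(points::AbstractVector, cellsize::Real, nfolds::Integer)
    # Integer grid-cell index of each (x, y) point
    cells = [(floor(Int, x / cellsize), floor(Int, y / cellsize)) for (x, y) in points]
    # Assign the distinct cells to folds round-robin (an assumption for this sketch)
    unique_cells = sort(unique(cells))
    fold_of_cell = Dict(c => mod1(i, nfolds) for (i, c) in enumerate(unique_cells))
    folds = [fold_of_cell[c] for c in cells]
    # Rows in fold k are the test set; all other rows are the training set
    return [(findall(!=(k), folds), findall(==(k), folds)) for k in 1:nfolds]
end

points = [(0.1, 0.2), (0.4, 0.3), (1.5, 0.2), (1.7, 0.9), (0.2, 1.6), (1.1, 1.8)]
tt_pairs = grid_cell_pairs(points, 1.0, 2)
```

Points that fall in the same cell always end up on the same side of a split, which is the property that plain row-wise cross-validation does not give you.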
If the points are a parameter of the `ResamplingStrategy`, then it could only be used for one particular `X` and `y`, which defeats the purpose a little bit.
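A rough sketch of that coupling, written without any packages so it runs on its own. `SpatialBlocks` and `spatial_pairs` are hypothetical names; in MLJ the struct would presumably subtype `ResamplingStrategy` and the function would be a `train_test_pairs(strategy, rows)` method, but that wiring is an assumption and is left out here.

```julia
# Hypothetical strategy that stores the point locations as a field.
struct SpatialBlocks
    points::Vector{Tuple{Float64,Float64}}  # one location per row of one specific X
    cellsize::Float64
end

# Stand-in for a `train_test_pairs(strategy, rows)` method:
# one (train, test) pair per occupied grid cell.
function spatial_pairs(s::SpatialBlocks, rows)
    length(rows) == length(s.points) || throw(ArgumentError(
        "strategy holds $(length(s.points)) points but the data has $(length(rows)) rows"))
    cells = [(floor(Int, x / s.cellsize), floor(Int, y / s.cellsize)) for (x, y) in s.points]
    return [([r for (r, ci) in zip(rows, cells) if ci != c],
             [r for (r, ci) in zip(rows, cells) if ci == c])
            for c in sort(unique(cells))]
end

s = SpatialBlocks([(0.1, 0.2), (1.5, 0.2), (0.3, 0.4)], 1.0)
spatial_pairs(s, 1:3)       # fine for the data the points came from
# spatial_pairs(s, 1:10)    # throws: the strategy is tied to one dataset
```

The strategy object is only meaningful for the exact rows its `points` were taken from, so it cannot be reused as a generic setting across datasets the way a built-in strategy can.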
But the more I think about it, the more I realize that this might require quite a lot of changes to the interface to work.
In my field (ecology/species distribution modelling) it is very common to use spatial resampling. I've written some spatial `ResamplingStrategy`s, such as spatial cross-validation, and am considering where to share that code. I'm considering either:
The problem with the last option is that right now it's not really possible to pass additional information about the data (such as the point locations) to `machine`. I'm hacking around this in SDM.jl by calling `train_test_pairs` directly.
I would like to hear what others think about this.