Open AlexandreWadoux opened 5 years ago
A bit of reasoning after dealing with the issue.
Now we can specify cellsize = 0 to tell spsann to look for alternative locations for a point ONLY among the center points of the grid cells of candi in the neighborhood. This means that with cellsize = 0 we use a finite set of candidate locations placed on a fine regular grid of points.
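To make the idea concrete, here is a minimal base R sketch of choosing a new location only among candidate cell centers in the neighborhood. It is not the spsann implementation: the helper propose_location() and its arguments x_max and y_max are hypothetical names used for illustration, and the toy grid is made up.

```r
# Hypothetical helper, not part of spsann: propose a new location for one
# sample point, chosen only among unoccupied candidate cell centers that lie
# within x_max / y_max of its current position.
propose_location <- function(current, candi, occupied, x_max, y_max) {
  dx <- abs(candi[, "x"] - candi[current, "x"])
  dy <- abs(candi[, "y"] - candi[current, "y"])
  in_window <- dx <= x_max & dy <= y_max
  free <- !(seq_len(nrow(candi)) %in% occupied)
  pool <- which(in_window & free)
  if (length(pool) == 0L) return(current)  # nowhere to move: stay put
  pool[sample.int(length(pool), 1L)]       # safe even when pool has length 1
}

# Toy 5 x 5 grid of cell centers with unit spacing (assumed data)
candi <- as.matrix(expand.grid(x = 1:5, y = 1:5))
occupied <- c(1L, 13L, 25L)  # row indices of the current sample points
set.seed(1)
new_id <- propose_location(current = 13L, candi = candi,
                           occupied = occupied, x_max = 1, y_max = 1)
candi[new_id, ]
```

Because the pool is built from row indices of candi, a point can never end up between grid cells, and locations that are already occupied are skipped silently instead of being reported.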
When we want to thin an existing sample configuration, that is, choose a smaller set of points from it, we use a finite set of candidate locations placed on a (generally) coarse irregular grid of points. If the optimization takes place in the feature space, e.g. with optimCLHS(), then we need a data frame or matrix with the values of the covariates at each of the existing sample points to feed covars. The existing sample points are used to feed candi, and cellsize is set to zero.
We also need the boundaries of the marginal sampling strata of each of the covariates. Currently, these boundaries are computed from covars. However, if covars is a reduced set of data from the entire study area, then it is not appropriate for computing the marginal sampling strata. These need to be computed based on the entire set of covariate values across the entire study area. At the moment, if covars is fed with the entire set of covariate values across the entire study area, spsann issues the following error:
Error: 'candi' and 'covars' must have the same number of rows
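The following base R sketch illustrates why the source of the strata boundaries matters. It assumes the marginal strata are equal-probability quantile intervals of each covariate. The data, the covariate name elev, and the helper strata_breaks() are all hypothetical and not part of spsann.

```r
set.seed(42)
# Hypothetical data: one covariate over the whole study area, and its
# values at a small, biased set of existing sample points (high values only)
covars_full <- data.frame(elev = runif(1000, min = 0, max = 100))
high <- covars_full[covars_full$elev > 50, , drop = FALSE]
covars_sample <- high[1:20, , drop = FALSE]

n_strata <- 5
probs <- seq(0, 1, length.out = n_strata + 1)

# Boundaries of the marginal strata: one vector of breaks per covariate
strata_breaks <- function(covars, probs) {
  lapply(covars, quantile, probs = probs, na.rm = TRUE, names = FALSE)
}

breaks_full   <- strata_breaks(covars_full, probs)    # whole study area
breaks_sample <- strata_breaks(covars_sample, probs)  # existing points only

breaks_full$elev
breaks_sample$elev
```

In this toy case the boundaries computed from the existing points cover only the upper half of the covariate range, so a design optimized against them would not represent the study area as a whole.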
There are a few possible solutions:
strata. The user computes the boundaries of the marginal sampling strata beforehand. In this case, candi and covars would have exactly the same number of rows.
thinning. The user feeds covars with the covariate layers for the entire study area and the boundaries of the marginal sampling strata are computed internally. The new function argument thinning would be used to pass the check of the number of rows in candi and covars, which would now be different.
Because candidate locations are far apart from each other, we have to choose appropriate values for the maximum and minimum jittering in the x and y coordinates. Perhaps we should allow sample points to move around freely, without (arithmetically) reducing the jittering at the end of each Markov chain. This means that we have to change the annealing schedule (scheduleSPSANN()).
To be continued...
I am not sure it makes any difference whether candi is a fine regular or coarse irregular grid of points. In both cases, these are spatial points and the user should choose appropriate values for the annealing schedule. For example, y.min and x.min should be greater than or equal to the minimum distance between two candidate locations.
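As a rough illustration of that rule of thumb, the minimum distance between candidate locations can be computed with base R before building the annealing schedule. The coordinates below are made up, and how the result is then passed on as x.min and y.min is left to the user.

```r
# Hypothetical candidate locations on an irregular layout (projected
# coordinates, e.g. meters)
candi <- cbind(x = c(0, 30, 45, 100, 160),
               y = c(0, 40, 40,  90,  10))

# Pairwise Euclidean distances; ignore the zero distance of a point to itself
d <- as.matrix(dist(candi))
diag(d) <- Inf

nn <- apply(d, 1, min)  # nearest-neighbor distance of each candidate
d_min <- min(nn)        # smallest separation between any two candidates

d_min  # 15: lower bound suggested above for x.min and y.min
```

On an irregular grid the nearest-neighbor distances in nn can vary a lot between points, so it may be worth inspecting their range as well when choosing the maximum jittering.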
In any case, the changes you made with cellsize = 0 solve the problems for me.
One last thing: with the change that you made, a message is issued during the optimization whenever a location is already occupied. While this is interesting in principle, I receive hundreds of these messages and they make the progress bar unreadable. Would it be an idea to remove this message?
Is there a way to optimize a design for a finite number of candidate locations? I see this is implemented here: https://github.com/samuel-rosa/spsann/blob/master/R/spJitter.R but it is not available yet. So far, what I do is modify the spJitter function internally:
Note that I have to call .spJitterCpp, which uses _spsann_spJitterCpp, while here: https://github.com/samuel-rosa/spsann/blob/master/R/RcppExports.R the same function is called differently (spsann_spJitterCpp). Is this normal?