Laboratorio-de-Pedometria / spsann-package

Optimization of Spatial Samples via Simulated Annealing

optimization for a finite number of candidate locations #14

Open AlexandreWadoux opened 5 years ago

AlexandreWadoux commented 5 years ago

Is there a way to optimize a design for a finite number of candidate locations? I see this is implemented here: https://github.com/samuel-rosa/spsann/blob/master/R/spJitter.R but it is not available yet. So far, my workaround is to override the internal spJitter function:

# spJitterFinite
spJitter <- function (points, candi, x.max, x.min, y.max, y.min, which.point, cellsize) {

    # Local wrapper around the compiled routine
    .spJitterCpp <- function(x, y, xmax, xmin, ymax, ymin, idx) {
      .Call('_spsann_spJitterCpp', PACKAGE = 'spsann', x, y, xmax, xmin, ymax, ymin, idx)
    }

    #if (length(cellsize) == 1) { cellsize <- rep(cellsize, 2) }

    #Get candidate locations using Cpp
    pt1 <- .spJitterCpp(points[, 2:3], candi[, 2:3], x.max, x.min, y.max,
                        y.min, which.point)

    # Get candidate locations
    pt1 <- pt1[pt1 != 0]

    # Select one candidate location
    pt2 <- sample(pt1, 1)

    # Check if it already is in the sample (duplicated)
    dup <- duplicated(c(pt2, points[, 1]))

    # If it already exists, we try to find another point, giving up after 100
    # attempts and falling back to which.point. The more points we have, the
    # more likely it is that the candidate point is already in the sample.
    if (any(dup)) {
      ntry <- 0
      while (any(dup)) {
        pt2 <- sample(pt1, 1)
        dup <- duplicated(c(pt2, points[, 1]))
        ntry <- ntry + 1
        if (ntry == 100) {
          pt2 <- which.point
          break
        }
      }
    }

    res <- points
    res[which.point, ] <- candi[pt2, ]
    return (res)
}
environment(spJitter) <- asNamespace('spsann')

assignInNamespace("spJitter", spJitter, ns = "spsann")

Note that I have to call .spJitterCpp, which uses _spsann_spJitterCpp, whereas here: https://github.com/samuel-rosa/spsann/blob/master/R/RcppExports.R the same function is called under a different name (spsann_spJitterCpp). Is this normal?

samuel-rosa commented 5 years ago

A bit of reasoning after dealing with the issue.

Now we can specify cellsize = 0 to tell spsann to look for alternative locations for a point ONLY among the center points of the grid cells of candi in the neighborhood. This means that with cellsize = 0 we use a finite set of candidate locations placed on a fine regular grid of points.
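To make the idea concrete, here is a minimal base-R sketch of moving one sample point to a free candidate location inside a rectangular neighborhood. It does not call spsann and is not its implementation: the helper name jitter_finite, the candi layout (columns id, x, y, with id equal to the row number) and the toy grid are all assumptions made for illustration.

```r
# Hypothetical helper, not part of spsann: move one sample point to a free
# candidate location within a rectangular neighborhood.
# candi:  matrix with columns id, x, y (assumed layout; id equals the row number)
# points: rows of candi that form the current sample
jitter_finite <- function(points, candi, x.max, y.max, which.point) {
  dx <- abs(candi[, "x"] - points[which.point, "x"])
  dy <- abs(candi[, "y"] - points[which.point, "y"])
  # Candidates inside the window that are not already in the sample
  free <- which(dx <= x.max & dy <= y.max & !(candi[, "id"] %in% points[, "id"]))
  if (length(free) == 0) {
    return(points)  # nothing available: leave the sample unchanged
  }
  # Index into 'free' to avoid sample()'s 1:n behavior for length-one input
  new <- free[sample.int(length(free), 1)]
  points[which.point, ] <- candi[new, ]
  points
}

# Toy example: 5 x 5 grid of candidates, sample of three points
grid <- expand.grid(x = 1:5, y = 1:5)
candi <- cbind(id = seq_len(nrow(grid)), x = grid$x, y = grid$y)
set.seed(1)
pts <- candi[c(1, 13, 25), ]
new_pts <- jitter_finite(pts, candi, x.max = 1, y.max = 1, which.point = 2)
```

Excluding occupied candidates before sampling, rather than resampling until a free one turns up, means no retry loop is needed in this sketch.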

When we want to thin an existing sample configuration, that is, choose a smaller subset of its points, we use a finite set of candidate locations placed on a (generally) coarse irregular grid of points. If the optimization takes place in the feature space, e.g. optimCLHS(), then we need a data frame or matrix with the values of the covariates at each of the existing sample points to pass to covars. The existing sample points are passed to candi and cellsize is set to zero.

We also need the boundaries of the marginal sampling strata of each of the covariates. Currently, these boundaries are computed from covars. However, if covars is a reduced set of data from the entire study area, then it is not appropriate for computing the marginal sampling strata. These need to be computed based on the entire set of covariate values across the entire study area. At the moment, if covars is fed with the entire set of covariate values across the entire study area, spsann issues the following error:

Error: 'candi' and 'covars' must have the same number of rows

There are a few possible solutions:

  1. Create a new function argument strata. The user computes the boundaries of the marginal sampling strata beforehand. In this case, candi and covars would have exactly the same number of rows.
  2. Create a new function argument thinning. The user feeds covars with the covariate layers for the entire study area and the boundaries of the marginal sampling strata are computed internally. The new function argument thinning would be used to pass the check of the number of rows in candi and covars, which now would be different.
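
As a rough illustration of the first option, the boundaries of the marginal sampling strata could be computed beforehand from the covariate values of the entire study area. The sketch below uses equal-probability quantile breaks in base R. The helper marginal_strata, its n.pts argument, the use of quantiles and the simulated covariates are assumptions for illustration only, and strata is only a proposed argument.

```r
# Hypothetical helper: equal-probability break points for each numeric covariate
marginal_strata <- function(covars, n.pts) {
  probs <- seq(0, 1, length.out = n.pts + 1)
  lapply(as.data.frame(covars), quantile, probs = probs, na.rm = TRUE,
         names = FALSE)
}

# Simulated covariates for the whole study area and for a thinned subset
set.seed(2001)
covars_all <- data.frame(elev = runif(1000, 0, 500), slope = rexp(1000, 0.2))
covars_thin <- covars_all[sample(nrow(covars_all), 30), ]

strata_all <- marginal_strata(covars_all, n.pts = 10)
strata_thin <- marginal_strata(covars_thin, n.pts = 10)

# The two sets of boundaries generally differ, which is why the strata should
# come from the full set of covariate values
range(strata_all$elev)
range(strata_thin$elev)
```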

Because candidate locations are far apart from each other, we have to choose the appropriate values for the maximum and minimum jittering in the x and y coordinates. Perhaps we should allow sample points to freely move around, without (arithmetically) reducing the jittering at the end of each Markov Chain. This means that we have to change the annealing schedule (scheduleSPSANN()).

To be continued...

AlexandreWadoux commented 5 years ago

I am not sure it makes any difference whether candi is a fine regular or a coarse irregular grid of points. In both cases, these are spatial points and the user should choose appropriate values for the annealing schedule. For example, y.min and x.min should be greater than or equal to the minimum distance between two candidate locations.
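
A quick way to check this rule of thumb in base R, as a sketch only: the toy coordinates and the chosen x.min and y.min values are made up, and the check is done outside spsann.

```r
# Toy candidate locations (columns x and y); replace with the real coordinates
candi_xy <- cbind(x = c(0, 10, 25, 25, 60), y = c(0, 0, 5, 40, 40))

# Smallest distance between any two candidate locations
d.min <- min(dist(candi_xy))

# Hypothetical user-chosen minimum jitter values
x.min <- 12
y.min <- 12

# Stops with an error if either value is below the minimum distance
stopifnot(x.min >= d.min, y.min >= d.min)
```

Note that dist() stores all pairwise distances, so for a very large set of candidate locations a nearest-neighbor search would be more economical.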

In any case, the changes you made with cellsize = 0 solve the problem for me.

One last thing: with the change that you made, a message is issued during the optimization whenever a location is already occupied. While this is interesting in principle, I receive hundreds of these messages and they make the progress bar unreadable. Would it be an idea to remove this message?