Some new problems - GitHub Issues

yancong723 commented 4 years ago

Thank you very much for your previous reply. However, I have some new questions.
1. How should factor covariates be understood? For example, how is the correlation/association calculated in optimCORR when covars contains factor covariates?
2. How should optimUSER be used, for example to aggregate the objective functions optimCORR and optimMSSD into a single utility function? Can you give me an example?

samuel-rosa commented 4 years ago

For quantitative variables (numeric or integer), the Pearson correlation coefficient is computed. For qualitative variables (character or factor), Cramér's V measure of association is used.
I have not used optimUSER for multi-objective optimization yet, so I cannot give you an example. I recommend taking a look at the internals of optimSPAN to get an idea of how to implement it.
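To make the two measures mentioned above concrete, here is a minimal base R sketch. It is not spsann's internal code: `cramers_v()` is a hypothetical helper written for this illustration, and the data are simulated.

```r
set.seed(1)

# Quantitative variables: Pearson correlation coefficient
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)
pearson <- cor(x, y, method = "pearson")

# Qualitative variables: Cramér's V, derived from the chi-squared statistic
# (hypothetical helper, not a spsann function)
cramers_v <- function(a, b) {
  tab <- table(a, b)
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)
  k <- min(nrow(tab), ncol(tab)) - 1
  as.numeric(sqrt(chi2 / (n * k)))
}

soil <- factor(sample(c("A", "B", "C"), 100, replace = TRUE))
flood <- factor(sample(c("low", "high"), 100, replace = TRUE))
v <- cramers_v(soil, flood)
```

Cramér's V lies between 0 (no association) and 1 (perfect association), so it plays the same role for factors that the absolute Pearson correlation plays for numeric variables.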

yancong723 commented 4 years ago

In addition, the aim of optimDIST is to reproduce the marginal distribution of the covariates in the sample. I have already read some articles about the marginal distribution, marginal sampling strata, equal-area and equal-range. However, I do not understand why the objective function is minimized. The Cambridge Dictionary of Statistics and Mathematical Methods of Statistics are so long that I could not find the information that is useful to me.

samuel-rosa commented 4 years ago

Think of it as the histogram -- in fact, I plan on changing the function name to optimHIST(). The function returns a sample where the covariates have histograms that are closest to their full histogram over the entire study area (raster data).
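The histogram idea can be sketched in base R as follows. This is an illustration only, assuming equal-area strata (breaks at population quantiles) and a sum of absolute differences as the mismatch measure; the exact formula used by spsann may differ, and `hist_prop()` and `hist_mismatch()` are hypothetical helpers.

```r
set.seed(42)

# Hypothetical population: one covariate over 1000 grid cells
pop <- rexp(1000)
n_strata <- 5

# Equal-area strata: breaks at population quantiles
breaks <- quantile(pop, probs = seq(0, 1, length.out = n_strata + 1))

# Proportion of values falling in each stratum
hist_prop <- function(values) {
  counts <- table(cut(values, breaks = breaks, include.lowest = TRUE))
  as.numeric(counts) / length(values)
}
pop_prop <- hist_prop(pop)

# Mismatch between sample and population histograms (0 = identical)
hist_mismatch <- function(idx) sum(abs(hist_prop(pop[idx]) - pop_prop))

clustered <- order(pop)[1:50]                  # only the smallest values
spread <- order(pop)[seq(10, 1000, by = 20)]   # spread across all ranks
c(clustered = hist_mismatch(clustered), spread = hist_mismatch(spread))
```

Minimizing such a mismatch drives the sample histogram toward the population histogram, which is why the objective function is minimized rather than maximized.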

yancong723 commented 4 years ago

Regarding the use.coords argument of optimCORR: how should the bivariate association/correlation between the covariates be understood when covars <- meuse.grid[1:1000, 5] and use.coords = TRUE? A correlation is between two variables. How do the spatial x- and y-coordinates enter the correlation calculation?

samuel-rosa commented 4 years ago

I am not sure that I understand your questions.

As said above, if all variables are quantitative (numeric or integer), the Pearson correlation coefficient is computed.

yancong723 commented 4 years ago

When there is only one variable (covars is a single variable) and use.coords = TRUE, as in this case, how is the Pearson correlation calculated? In other words, how should the use.coords argument be understood?

samuel-rosa commented 4 years ago

Argument use.coords = TRUE means that the coordinates should be used as variables in the optimization. Then, in your case, you have three variables. The Pearson correlation matrix is computed in the standard way.

yancong723 commented 4 years ago

I want to know how the Pearson correlation is calculated when the coordinates are used as variables in the optimization. I care about its working mechanism rather than the details.

samuel-rosa commented 4 years ago

I am not sure if I understand your question. Can you give more details?

yancong723 commented 4 years ago

The coordinates are given in two-dimensional Euclidean space. For example, x is 181020 and y is 333420. Numerically, the coordinates are different from the other variables. In your case, the variables are dist (covars <- meuse.grid[1:1000, 5]) and the coordinates. Is the Pearson correlation matrix computed between x, y and dist? Could this be understood as minimizing the correlation between dist and location?

samuel-rosa commented 4 years ago

I do not know what you mean by "Numerically ,the coordinates is different from other variables". If you add a third variable, then you are in a three-dimensional space, and so on.

Values of all variables are available at all grid locations (population) -- including x- and y-coordinates. In other words, every grid location has its x- and y-coordinates. The values of these variables are used to compute the population correlation matrix. The objective function is designed so that the sample correlation matrix (a correlation matrix computed using the sample points) is equal to the population matrix. In this way the sample would have the same (linear) correlation structure found in the population (entire grid). So, the coordinates are treated as any other variable.
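The mechanism described above can be sketched in base R. This is a hedged illustration with a simulated grid, not the spsann implementation: the grid, the `dist` values and the `corr_objective()` helper (sum of absolute differences between the two matrices) are all assumptions made for the example.

```r
set.seed(7)

# Hypothetical population grid: coordinates plus one covariate
grid <- expand.grid(x = seq(0, 1500, by = 50), y = seq(0, 2000, by = 50))
grid$dist <- grid$x / 1500 + rnorm(nrow(grid), sd = 0.2)

# With the coordinates used as variables there are three columns
# (x, y, dist), so the population correlation matrix is 3 x 3
pop_cor <- cor(grid)

# Hypothetical objective: total absolute difference between the sample
# and population correlation matrices (0 = identical structure)
corr_objective <- function(idx) sum(abs(cor(grid[idx, ]) - pop_cor))

sample_a <- sample(nrow(grid), 30)
sample_b <- sample(nrow(grid), 30)
c(a = corr_objective(sample_a), b = corr_objective(sample_b))
```

The optimizer would prefer whichever candidate sample gives the smaller value. Note that the target is the population correlation matrix, not zero correlation between the covariate and location.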

I hope this helps.

yancong723 commented 4 years ago

Thank you very much for your help! I have a question about CORR, DIST and PPL. How should I handle this error?

samuel-rosa commented 4 years ago

Hum... I would need to see your code (and possibly data) to understand what is happening.

yancong723 commented 4 years ago

My code: Data:

samuel-rosa commented 4 years ago

I see that you are working with the meuse.grid dataset. Could you please paste the code here?

yancong723 commented 4 years ago

meuse.grid is only the name I gave my data in the code. Perhaps I should use another name. Is there a problem with my code? Looking forward to your reply!

samuel-rosa commented 4 years ago

I think that you forgot to pass the weights to MSSD. You are only passing weights to CORR, DIST and PPL.

yancong723 commented 4 years ago

I have already modified optimSPAN to remove MSSD.

samuel-rosa commented 4 years ago

Do you mean that you created your own function based on optimSPAN?

yancong723 commented 4 years ago

Yes! However, I have also run my code with the original optimSPAN from spsann, and the error still occurs.

samuel-rosa commented 4 years ago

OK. I will take a closer look at it.

yancong723 commented 4 years ago

Thank you very much for your help! Do you have any suggestions for my question now? Due to time constraints, I am eager to solve the problem.

samuel-rosa commented 4 years ago

I had limited time to look deeper into your question during the past week. I may have an idea of where the problem is. If all goes fine, I will be able to work on this during this week.

samuel-rosa commented 4 years ago

@yancong723 next time, you MUST provide the code as follows:

library(rgdal)
data("meuse.grid")
head(meuse.grid)
# Candidate locations (x, y) and covariates
candi <- meuse.grid[, 1:2]
covars <- meuse.grid[, 3:5]
nadir <- list(sim = 10, seeds = 1:10)
utopia <- list(user = list(DIST = 0, CORR = 0))
schedule <- spsann::scheduleSPSANN(
  chains = 300, initial.temperature = 5, x.max = 1540, y.max = 2060,
  x.min = 0, y.min = 0, cellsize = 40)
# Weights are given for CORR, DIST and PPL only
weights <- list(CORR = 1/6, DIST = 1/6, PPL = 2/3)
set.seed(2001)
res <- spsann::optimSPAN(
  points = 28, candi = candi, covars = covars, nadir = nadir,
  plotit = TRUE, use.coords = TRUE, utopia = utopia, schedule = schedule,
  weights = weights)

samuel-rosa commented 4 years ago

@yancong723, optimSPAN uses four objective functions: PPL, MSSD, DIST, and CORR. So, you need to pass weights and utopia values for all four objective functions. Fix this in your code and try again. If the error persists, provide reproducible code so that I can test it.
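As a sketch of what the advice above could look like, the two lists below name all four objective functions. The weight values are illustrative only, and the sanity checks assume that weights should be non-negative and sum to 1; verify both against the spsann documentation before use.

```r
# The four objective functions named in the comment above
objectives <- c("CORR", "DIST", "PPL", "MSSD")

# Illustrative weights (an assumption, not a recommendation)
weights <- list(CORR = 1/6, DIST = 1/6, PPL = 1/3, MSSD = 1/3)

# One utopia value per objective function
utopia <- list(user = list(CORR = 0, DIST = 0, PPL = 0, MSSD = 0))

# Sanity checks to run before passing the lists to optimSPAN
stopifnot(
  setequal(names(weights), objectives),
  setequal(names(utopia$user), objectives),
  all(unlist(weights) >= 0),
  isTRUE(all.equal(sum(unlist(weights)), 1))
)
```

Running the checks first makes a missing objective function fail immediately with a clear message instead of surfacing later inside the optimization.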

yancong723 commented 4 years ago

Thanks! There are some new problems.
1. In this picture obj = CORR + DIST + PPL + MSSD, but the weights do not match this. I think obj = 1/3 CORR + 1/3 DIST + 1/3 MSSD.
2. For the multi-objective combinatorial optimization problem, we use the upper-lower-bound approach to scale the objective functions, so each objective function is between 0 and 1. Why is obj so large, for example obj = 20?
3. Each time I run the same code, the result is often different. Is this caused by different initial configurations?

samuel-rosa commented 4 years ago

This issue is closed. Please open a new issue with a full description of the problem and reproducible code.

Laboratorio-de-Pedometria / spsann-package

Some new problems #19