BlasBenito / spatialRF

R package to fit spatial models with Random Forest
https://blasbenito.github.io/spatialRF/

Does "negative spatial correlation" need fitting a spatial model? #13

Open xindyhu opened 2 years ago

xindyhu commented 2 years ago

First of all, thank you very much for developing this package. It is truly an outstanding contribution to the field of spatial data science.

Before I start, I want to mention that my question is less about coding and more about methods so I am not attaching a reproducible example here, but I am more than happy to if it will help you. Just let me know.

I am using this package to train a classification random forest model on a 2365 × 22 dataset. After rf_tuning(), my model residuals are found to be negatively correlated for the first two distance thresholds:

[Image: Moran's I of the model residuals]
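For readers unfamiliar with what a negative Moran's I at a given distance threshold looks like, here is a minimal base R sketch. It assumes binary weights (1 for pairs of points within the threshold, 0 otherwise), which is not necessarily the weighting spatialRF uses. The helper `moran_i()` and the toy data are hypothetical.

```r
# Hypothetical helper: Moran's I of x with binary weights, where two
# distinct points are neighbours if their distance is <= threshold.
moran_i <- function(x, coords, threshold) {
  d <- as.matrix(dist(coords))
  w <- (d > 0 & d <= threshold) * 1
  z <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Toy data (made up): ten points on a line with alternating residuals,
# so every point is surrounded by neighbours of the opposite sign.
coords <- cbind(x = 1:10, y = 0)
residuals_toy <- rep(c(1, -1), times = 5)

i_thr1 <- moran_i(residuals_toy, coords, threshold = 1)
i_thr2 <- moran_i(residuals_toy, coords, threshold = 2)
```

With these toy values, `i_thr1` is strongly negative because only opposite-sign neighbours are compared, and `i_thr2` is closer to zero because same-sign pairs at distance 2 enter the weights.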

Then I went on to train a spatial model but was told by rf_spatial() that "The model residuals are not spatially correlated, there is no need to fit a spatial model". I was puzzled so I looked into the source code for rf_spatial() and found the following line:

model.moran.i <- model$residuals$autocorrelation$per.distance %>%
  dplyr::arrange(dplyr::desc(moran.i)) %>%
  dplyr::filter(interpretation == "Positive spatial correlation")

Then, if model.moran.i has zero rows, which was true in my case, no spatial model is fit.
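The effect of that filter can be reproduced on a toy table. The sketch below uses base R subsetting instead of dplyr so that it runs without extra packages. The column `distance.threshold`, the labels other than "Positive spatial correlation", and all values are assumptions made for illustration, not output from spatialRF.

```r
# Toy stand-in for model$residuals$autocorrelation$per.distance.
# Only moran.i and interpretation appear in the quoted snippet; the
# other column, the non-positive labels, and the numbers are made up.
per_distance <- data.frame(
  distance.threshold = c(0, 100, 200),
  moran.i = c(-0.12, -0.08, 0.01),
  interpretation = c(
    "Negative spatial correlation",
    "Negative spatial correlation",
    "No spatial correlation"
  ),
  stringsAsFactors = FALSE
)

# Same logic as the quoted pipeline: sort by Moran's I (descending),
# then keep only rows labelled as positive spatial correlation.
ordered <- per_distance[order(-per_distance$moran.i), ]
model_moran_i <- ordered[
  ordered$interpretation == "Positive spatial correlation",
]

# Hypothetical variant that would also keep negative autocorrelation.
keep_labels <- c("Positive spatial correlation", "Negative spatial correlation")
either_sign <- ordered[ordered$interpretation %in% keep_labels, ]

n_positive <- nrow(model_moran_i)
n_either <- nrow(either_sign)
```

Here `n_positive` is zero even though two thresholds show negative autocorrelation, which matches the behaviour described above. The variant is only a sketch of what a sign-agnostic filter might look like.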

Could you explain why a negative spatial correlation in the residual is not a reason to fit a spatial model? Thanks!

BlasBenito commented 2 years ago

Thank you for your kind words; I truly appreciate your interest in this package.

About your question, the truth is that there is very little research on the effect of negatively autocorrelated residuals on linear models, and even less on random forest models. [This paper](https://www.mdpi.com/2571-905X/2/3/27/htm) focuses on how the study of negative autocorrelation has been neglected by the community.

However, as Pedro Peres-Neto says here, "...future research should investigate whether negative autocorrelation also promotes bias in statistical inference. If that is the case, then the method [he is talking about the MEM method used by default in rf_spatial()] can also accommodate this type of autocorrelation by using the eigenvectors that represent negative autocorrelation in the analysis (i.e., eigenvectors with negative eigenvalues or MI)".
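To illustrate the idea in the quote, here is a minimal base R sketch of Moran's Eigenvector Maps on a toy layout. It assumes points on a line with binary adjacency weights, and it is not the code rf_spatial() runs. All object names are hypothetical.

```r
n <- 20

# Assumed weights: points on a line, adjacent points are neighbours.
w <- (abs(outer(1:n, 1:n, "-")) == 1) * 1

# Double-centre the weights matrix and take its eigendecomposition.
h <- diag(n) - matrix(1 / n, n, n)
eig <- eigen(h %*% w %*% h, symmetric = TRUE)

# Drop eigenvectors with (numerically) zero eigenvalues.
keep <- abs(eig$values) > 1e-8
values <- eig$values[keep]
vectors <- eig$vectors[, keep, drop = FALSE]

# Moran's I of each remaining eigenvector under the same weights.
mem_moran_i <- apply(vectors, 2, function(v) {
  z <- v - mean(v)
  (n / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
})

# Eigenvectors with negative eigenvalues: candidate spatial predictors
# for negatively autocorrelated residuals, as the quote suggests.
negative_mem <- vectors[, values < 0, drop = FALSE]
```

In this sketch the sign of each eigenvector's Moran's I matches the sign of its eigenvalue, so selecting on negative eigenvalues and selecting on negative Moran's I pick the same vectors.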

It is not up to me to develop the kind of research Pedro proposes there, but I will be more than happy to implement spatial models for the case of negatively autocorrelated residuals in the next version of spatialRF. I am working on it at the moment, so I guess it will be released before the end of the year.

I will leave this issue open and will get back to you as soon as there is a working version of rf_spatial() with this feature.

Thank you for your contribution to the package!

Best wishes,

Blas

xindyhu commented 2 years ago

Thank you, @BlasBenito. I appreciate your quick reply and the reference; I will check it out.

As a quick follow-up, I tried to force a spatial model on my example by taking out the filtering statement. Moran's I for the residuals is still significantly negative, but some spatial predictors are among the most important features. I am aware this is not how the model is supposed to be used; I just tested it out of curiosity.

[Image: SDSC2022_moran_non]

[Image: SDSC2022_var_imp_sp_vs_non]

I look forward to the next release of spatialRF.

kangluyao commented 1 year ago

Dear @BlasBenito, thanks for your effort. My question may be a little stupid, but could you please tell me what the horizontal dashed line in the Moran's I plot represents? Thanks!