USEPA / spmodel

spmodel: Spatial Statistical Modeling and Prediction in R
https://usepa.github.io/spmodel/
GNU General Public License v3.0

Stratified partitioning #15

Closed: damondamondamon closed this issue 6 months ago

damondamondamon commented 6 months ago

Dear spmodel-Team,

thanks a lot for this helpful package!

I have been using your package for inference, to estimate the effects of auxiliary variables on my target variable in a spatial setting. Some of these auxiliary variables are categorical. Something like:

"Yield ~ Soil_Type (Categorical) + Elevation (Continuous) + etc"

When working with large datasets (n > 10,000 with partition size > 500), I sometimes get the warning message:

At least one partition's inverse covariance matrix is singular. Redjusting using var_adjust = "none". (I guess it should be readjusting?)

While this could be due to a genuine singularity in the covariance matrix, I suspect it is caused by the unbalanced distribution of my categorical variables (e.g. two soil types with a 90%/10% split).
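One way to probe this hypothesis outside of spmodel is to partition the data at random and check, per partition, how often each level occurs and whether the design matrix keeps full column rank. The sketch below uses base R only. The data, the variable names (`soil`, `elev`) and the random partitioning are assumptions for illustration; it mimics random partitioning in general and does not reproduce spmodel's internal assignment.

```r
# Hypothetical data: a two-level factor with a 90%/10% split
set.seed(1)
n <- 10000
dat <- data.frame(
  soil = factor(sample(c("A", "B"), n, replace = TRUE, prob = c(0.9, 0.1))),
  elev = rnorm(n)
)

# Random partitions of (roughly) equal size; an assumption for
# illustration, not spmodel's internal partitioning
part_size <- 500
n_part <- ceiling(n / part_size)
part <- sample(rep(seq_len(n_part), length.out = n))

# Number of observations of each soil level in every partition
level_counts <- table(part, dat$soil)

# Column rank of the design matrix, for the full data and per partition.
# A partition that is missing a level has an all-zero dummy column and
# therefore a lower rank than the full data.
full_rank <- qr(model.matrix(~ soil + elev, dat))$rank
part_rank <- vapply(
  split(dat, part),
  function(d) qr(model.matrix(~ soil + elev, d))$rank,
  numeric(1)
)

# Names of partitions whose design matrix is rank deficient (if any)
deficient <- names(part_rank)[part_rank < full_rank]
```

If `deficient` is empty for realistic partition sizes, missing levels are probably not the cause and the singularity more likely comes from the covariance side.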

Related questions / suggestions:

  1. What actually happens when this warning is shown? Do you simply ignore that partition's result and average over the remaining partitions? Or do you switch the corresponding partition's regression to a non-spatial one?
  2. I would suggest adding an option for stratified sampling so that all categorical levels are represented in each partition. This case is not covered in your paper, so I am not sure whether it conflicts with the statistical properties that you present there.
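To make the second suggestion concrete, here is a minimal base R sketch of a stratified partition assignment. `stratified_index` is a hypothetical helper and not part of spmodel. It shuffles the rows within each level and deals them out round-robin, so every level is spread as evenly as possible across partitions. The commented `splm()` call assumes that the `local` list accepts a user-supplied `index` element, as described in the spmodel documentation for `local`; please check this against the installed version. Whether such an assignment preserves the statistical properties of the big data approximation is not addressed here.

```r
# Hypothetical helper (not part of spmodel): assign observations to
# n_part partitions so that each level of `strata` is spread evenly.
stratified_index <- function(strata, n_part) {
  index <- integer(length(strata))
  offset <- 0
  for (rows in split(seq_along(strata), strata)) {
    # Shuffle the rows of this stratum, then deal them out round-robin,
    # continuing where the previous stratum stopped
    rows <- rows[sample.int(length(rows))]
    index[rows] <- (offset + seq_along(rows) - 1) %% n_part + 1
    offset <- offset + length(rows)
  }
  as.integer(index)
}

# Usage with a hypothetical 90%/10% soil type factor
set.seed(1)
soil <- factor(sample(c("A", "B"), 10000, replace = TRUE, prob = c(0.9, 0.1)))
idx <- stratified_index(soil, n_part = 20)
counts <- table(idx, soil)  # level counts per partition

# Assumed usage with splm() (column names are placeholders):
# splm(yield ~ soil + elev, data = dat, spcov_type = "exponential",
#      xcoord = x, ycoord = y, local = list(index = idx))
```

Note that this assignment ignores the coordinates entirely, so it behaves like a random partition rather than a spatially clustered one.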

Minor side question: When using spmodel for inference (in my case n = 6,000) with spcov_type set to "none" (just as a non-spatial reference) and local explicitly set to FALSE, I would expect a routine that is equivalent to lm. Still, the computation time is exceptionally high (lm ~1 sec, splm > 500 sec). I will try to add a reproducible example.
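A reproducible comparison along these lines could look like the following sketch. The simulated data and column names are assumptions and not the data from this report. The `splm()` call is switched off by default through the hypothetical `run_splm` flag because it can be slow, and it only runs if spmodel is installed.

```r
# Simulated, hypothetical data with n = 6,000
set.seed(1)
n <- 6000
dat <- data.frame(
  x = runif(n),
  y = runif(n),
  soil = factor(sample(c("A", "B"), n, replace = TRUE, prob = c(0.9, 0.1))),
  elev = rnorm(n)
)
dat$yield <- 1 + 0.5 * (dat$soil == "B") + 0.2 * dat$elev + rnorm(n)

# Non-spatial reference fit with lm()
t_lm <- system.time(fit_lm <- lm(yield ~ soil + elev, data = dat))

# Set to TRUE to time splm() as well (can take a long time)
run_splm <- FALSE
if (run_splm && requireNamespace("spmodel", quietly = TRUE)) {
  t_splm <- system.time(
    fit_splm <- spmodel::splm(
      yield ~ soil + elev, data = dat, spcov_type = "none",
      xcoord = x, ycoord = y, local = FALSE
    )
  )
  # Elapsed seconds for both fits
  print(c(lm = t_lm[["elapsed"]], splm = t_splm[["elapsed"]]))
}
```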

michaeldumelle commented 6 months ago

Thanks @damondamondamon for the issue and for the kind words about the software! A few thoughts below:

Please let us know if you have any additional questions!

michaeldumelle commented 6 months ago

@damondamondamon we have improved the efficiency of splm() when there are no random effects (see here).

You can download the development version of spmodel by running

remotes::install_github("USEPA/spmodel", ref = "develop")

This fix will be part of the next CRAN update (the current version on CRAN is 0.5.1).

I will go ahead and close the issue but please reach out if anything else comes up!