damondamondamon closed this issue 8 months ago
Thanks @damondamondamon for the issue and for the kind words about the software! A few thoughts below:
You can pass your own indexes via the `index` argument to `local`, and this is what I recommend you do to avoid the message. By default, spmodel uses `kmeans()` to select the indexes, so you could do this on your own, and if you have a partition without both soil types, either select a new set of partitions or rearrange a few observations. Also see my answer to "Related question 2" below.

When `var_adjust = "theoretical"` is passed to `local`, Equation 13 is used to compute the variances of the explanatory variable slope estimates. Notice that there are $V_{i,i}^{-1}$ terms, which end up being singular when both soil types are not observed in the corresponding $X_i$ matrix. When `var_adjust = "none"` is passed to `local`, only $T_{xx}^{-1}$ from Equation 13 is used to compute the variances of the explanatory variable slope estimates. The important takeaway is that we use a "shortcut" to fit the model to large spatial data and then must use Equation 13 to get the theoretically correct variances of the explanatory variable slope estimates. If we don't apply this adjustment (or we can't because one of the $V_{i,i}$ terms is singular), the variances of the explanatory variable slope estimates tend to be a little too small, leading to narrower confidence intervals. Regardless of the `var_adjust` type, the spatial covariance parameter estimates and the explanatory variable estimates are the same (all that changes is the variance of the explanatory variable slope estimates).

We do not plan to add this functionality to spmodel in the near future, as there are already some pieces of software in the tidymodels ecosystem (rsample and spatialsample) designed for this. You can use these pieces of software to partition your data and use the partitions to create the vector that is the `index` argument to `local`.

When `spcov_type = "none"` and no random effects are specified via the `random` argument to `splm()`, the model should be equivalent to `lm()` and fit much faster. However, this fix may take a bit of time to implement, as we need to create a custom routine for it that works with everything else. We did not do this originally because we believed people would primarily use `lm()` to fit non-spatial models, but we now see the utility in making model comparisons between spatial and non-spatial models directly using spmodel's existing architecture, which necessitates fitting a model with `splm(..., spcov_type = "none")`.

Please let us know if you have any additional questions!
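As a concrete illustration of the custom-index approach described above, here is a minimal sketch. The data frame `dat` and its columns (`x`, `y`, `Yield`, `Soil_Type`, `Elevation`) are hypothetical, and the `local` list elements follow the `index` and `var_adjust` options discussed in this thread:

```r
library(spmodel)

# Hypothetical data frame `dat` with coordinates x, y and columns
# Yield, Soil_Type, Elevation (names made up for illustration).
# Build the partition index yourself instead of letting spmodel call kmeans():
idx <- kmeans(cbind(dat$x, dat$y), centers = 20)$cluster

# Check that every partition observes both soil types; if a zero appears
# in this table, reassign a few observations or pick new partitions.
table(idx, dat$Soil_Type)

spmod <- splm(
  Yield ~ Soil_Type + Elevation,
  data = dat, xcoord = x, ycoord = y,
  spcov_type = "exponential",
  local = list(index = idx, var_adjust = "theoretical")
)
```

The same `idx` vector could instead come from rsample/spatialsample partitions, as suggested above.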
@damondamondamon we have improved the efficiency of `splm()` when there are no random effects (see here).
You can download the development version of spmodel by running
`remotes::install_github("USEPA/spmodel", ref = "develop")`
This fix will be part of the next CRAN update (the current version on CRAN is 0.5.1).
I will go ahead and close the issue but please reach out if anything else comes up!
Dear spmodel-Team,
thanks a lot for this helpful package!
I was using your package in an inference context to estimate the effect of auxiliary variables on my target variable in a spatial setting. These auxiliary variables are to some extent categorical. Something like:
"Yield ~ Soil_Type (Categorical) + Elevation (Continuous) + etc"
When working with large datasets (n > 10,000 with partition size > 500), I sometimes get the warning message:
At least one partition's inverse covariance matrix is singular. Redjusting using var_adjust = "none".
(I guess it should be "readjusting"?) While this could be due to a real singularity in the covariance matrix, I rather assume that it occurs due to the unbalanced distribution of my categorical variables (e.g., two soil types with an imbalanced 90%/10% split).
Related questions / suggestions:
Minor side question: When using spmodel for inference (in my case n = 6,000), defining `spcov_type` as `"none"` (just as a non-spatial reference), and setting `local` explicitly to `FALSE`, I would still assume a routine that is equivalent to `lm()`. Still, the computational time is exceptionally high (lm ~1 sec, splm > 500 sec). I will try to add a reproducible example.
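A minimal sketch of the kind of comparison meant here (the data frame `dat` and its columns are hypothetical; timings will vary by machine):

```r
library(spmodel)

# Non-spatial reference fit: with spcov_type = "none" and no random
# effects, splm() should reproduce lm()'s coefficient estimates.
lmod  <- lm(Yield ~ Soil_Type + Elevation, data = dat)
spmod <- splm(Yield ~ Soil_Type + Elevation, data = dat,
              xcoord = x, ycoord = y,
              spcov_type = "none", local = FALSE)

coef(lmod)   # should match coef(spmod) up to numerical tolerance
coef(spmod)

# Rough timing comparison:
system.time(lm(Yield ~ Soil_Type + Elevation, data = dat))
system.time(splm(Yield ~ Soil_Type + Elevation, data = dat,
                 xcoord = x, ycoord = y,
                 spcov_type = "none", local = FALSE))
```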