Closed morelju closed 4 months ago
Hi @morelju The error message you're encountering indicates that the covariance matrix is computationally singular, which means it's nearly impossible to invert accurately. This issue often arises, for instance, in spectral datasets due to their high dimensionality and multicollinearity. Let me break down the problem and offer some solutions: (1) Cause of the error:
ellipseParam()
function uses stats::mahalanobis()
, which in turn uses solve(stats::cov(x))
to invert the covariance matrix. x
being your input data.solve()
function uses a tolerance value (tol
) to detect linear dependency in the input matrix. In your case, the default tol = .Machine$double.eps
, which is 1.37309e-48
, has been reached, triggering the error message.(2) Reasons for singularity:
(3) Recommended solutions:
Apply Principal Component Analysis (PCA) to your dataset before using ellipseParam()
. Consider other dimensionality reduction techniques (PLS, ICA, etc...) if appropriate for your project.
Hi @ChristianGoueguel
Thanks a lot for your explanations! I actually got the error message after applying the PCA - I tried to run the ellipseParam()
on the scores. I will try a couple of other dimensionality reduction and see if this solves my problem. I would close the comment once a solution is found, if that's ok with you.
Thanks again,
Julien
Hi again @ChristianGoueguel ,
In the end things work if I drop some PCs before using ellipseParam()
, which was a bit counter-intuitive for me as I would have thought that the k
parameter of ellipseParam()
would handle this. Not sure if I misunderstood it or if this is a bug in your code?
Hi @morelju,
As I mentioned in my previous comment, the error message originates from the covariance matrix inversion using the solve()
function. The resolution you found by dropping a few principal components before using ellipseParam()
confirms that the issue stemmed from principal components in your scores dataset with very low variance. Retaining only the principal components that effectively summarize your dataset is indeed a good practice.
To document and illustrate the error you encountered, I've conducted a few tests:
data("specData", package = "HotellingEllipse")
pca_mod <- specData |>
dplyr::select(tidyselect::where(is.numeric)) |>
FactoMineR::PCA(scale.unit = FALSE, graph = FALSE)
pca_scores <- pca_mod |>
purrr::pluck("ind", "coord") |>
tibble::as_tibble()
print(pca_scores)
# A tibble: 100 × 5
Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
<dbl> <dbl> <dbl> <dbl> <dbl>
1 25306. -10831. -1851. -83.4 -560.
2 -67.3 1137. -2946. 2495. -568.
3 -1822. -22.0 -2305. 1640. -409.
4 -1238. 3734. 4039. -2428. 379.
5 3299. 4727. -888. -1089. 262.
6 5006. -49.5 2534. 1917. -970.
7 -8325. -5607. 960. -3361. 103.
8 -4955. -1056. 2510. -397. -354.
9 -1610. 1271. -2556. 2268. -760.
10 19582. 2289. 886. -843. 1483.
# ℹ 90 more rows
# ℹ Use `print(n = ...)` to see more rows
As you can see, pca_scores
has 5 components and is used as an example in the package README
.
To demonstrate the error conditions, I've added a sixth column, Dim.6
, and conducted two tests:
Dim.6
with zero variance.
n <- nrow(pca_scores)
pca_scores <- pca_scores |> dplyr::mutate(Dim.6 = rep(1, n))
HotellingEllipse::ellipseParam(pca_scores) Error in solve.default(cov, ...) : Lapack routine dgesv: system is exactly singular: U[6,6] = 0
- **Test 2:** `Dim.6` **with near-zero variance.**
```r
tol <- .Machine$double.eps
x <- stats::rnorm(n, mean = 0, sd = sqrt(tol))
pca_scores <- pca_scores |> dplyr::mutate(Dim.6 = x * (sqrt(tol) / stats::sd(x)))
HotellingEllipse::ellipseParam(pca_scores)
Error in solve.default(cov, ...) :
system is computationally singular: reciprocal condition number = 5.2279e-24
Key points to note:
Potential future improvements:
Both tests were conducted using the default parameters, which means the Mahalanobis distance is computed before the selection of the number of component k
; leading to issues when low-variance components are present in the dataset. I could modify the ellipseParam
function so that users specify k
components before calculating the Mahalanobis distance and add a variance threshold or warning message to avoid the risk of including low-variance components that could lead to singularity issues.
If you have any questions or need further clarification, please don't hesitate to ask. Otherwise, I'll proceed to close this issue.
Thank you for bringing this to my attention,
Hi @ChristianGoueguel All good with me now, thanks again for your clarifications!
An updated version (v1.2.0) is available on CRAN.
Hi there, R returns the following when running
ellipseParam
on a spectral dataset:Error in solve.default(cov, ...) : system is computationally singular: reciprocal condition number = 1.37309e-48
As I understood it, this is related to the value of thetol
parameter of themahalanobis
function. My question is: do you have any suggestion on how to adjust thistol
parameter withinellipseParam
? Thanks!Julien