Closed jeffwong-nflx closed 6 years ago
I can replicate the error, but I won't have time to dig into this.
Edit: Note that the error doesn't show up until you try to do something with foo
. For example none of these operations work
foo
as.matrix(foo)
2*foo
I have a fix. Your XtX is actually dense, so just operate on a dense matrix. That works.
#include <RcppEigen.h>
// [[Rcpp::depends(RcppEigen)]]
using Eigen::Map;
using Eigen::MatrixXd;
using Eigen::VectorXd;
using Eigen::SparseMatrix;
using Eigen::MappedSparseMatrix;
// [[Rcpp::export]]
Eigen::MatrixXd sparseCrossprod(const Eigen::MappedSparseMatrix<double>& X,
const Eigen::VectorXd& l2penalty) {
Eigen::SparseMatrix<double> XtXa = X.adjoint() * X;
Eigen::MatrixXd XtX(XtXa);
for (int i = 0; i < XtX.rows(); i++) {
double value = XtX.coeff(i, i);
if (value == 0) {
XtX(i,i) = 1;
}
}
return XtX;
}
/***R
require(Matrix)
k <- 2
N <- 100000 * k
p <- 10
density <- .1
X <- cbind(1, rsparsematrix(N, p, density))
l2penalty <- c(0, rep(1, p))
X[,5] <- 0
foo <- sparseCrossprod(X, l2penalty)
foo
*/
R> sourceCpp("/tmp/eigenBug.cpp")
R> require(Matrix)
R> k <- 2
R> N <- 100000 * k
R> p <- 10
R> density <- .1
R> X <- cbind(1, rsparsematrix(N, p, density))
R> l2penalty <- c(0, rep(1, p))
R> X[,5] <- 0
R> foo <- sparseCrossprod(X, l2penalty)
R> foo
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,] 200000.0000 114.05012 39.7344 46.6718 0 186.89394 -119.3847 -73.22899 -38.997610 -100.9542 100.273360
[2,] 114.0501 20006.60037 -92.2933 133.1399 0 -58.16418 72.2073 1.51963 30.420171 -37.3681 -55.105421
[3,] 39.7344 -92.29333 19698.4569 -39.9250 0 45.97927 -20.6944 46.58300 30.347564 16.3285 53.736854
[4,] 46.6718 133.13989 -39.9250 19608.6417 0 33.03311 -22.4419 30.85163 68.138594 -49.9748 -21.730008
[5,] 0.0000 0.00000 0.0000 0.0000 1 0.00000 0.0000 0.00000 0.000000 0.0000 0.000000
[6,] 186.8939 -58.16418 45.9793 33.0331 0 19919.16395 94.3603 -1.59755 -2.504021 64.9767 53.199705
[7,] -119.3847 72.20729 -20.6944 -22.4419 0 94.36027 20413.1109 -26.02897 37.652458 11.3807 -11.802387
[8,] -73.2290 1.51963 46.5830 30.8516 0 -1.59755 -26.0290 20178.25477 19.408558 36.7547 -96.265766
[9,] -38.9976 30.42017 30.3476 68.1386 0 -2.50402 37.6525 19.40856 19999.479295 18.7824 0.268916
[10,] -100.9542 -37.36812 16.3285 -49.9748 0 64.97667 11.3807 36.75474 18.782382 20038.1875 64.648955
[11,] 100.2734 -55.10542 53.7369 -21.7300 0 53.19971 -11.8024 -96.26577 0.268916 64.6490 19894.035736
R>
For the other issue (of altering an existing sparse matrix) I have no doubt that the bug in the converter may be real but someone else will have to chase this. I rarely work on sparse matrices with Eigen, sorry.
I can try to take a look.
@jeffwong-nflx
By the way, did you call the makeCompressed()
member function on sparse matrices before returning to R? I remember this is a necessary step. Could you run the examples again by incorporating this simple fix?
Bingo! Nice call, @yixuan -- the following works:
// [[Rcpp::export]]
Eigen::SparseMatrix<double> sparseCrossprod(const Eigen::MappedSparseMatrix<double>& X,
const Eigen::VectorXd& l2penalty) {
Eigen::SparseMatrix<double> XtX = X.adjoint() * X;
for (int i = 0; i < XtX.rows(); i++) {
double value = XtX.coeff(i, i);
if (value == 0) {
XtX.coeffRef(i,i) = 1;
}
}
XtX.makeCompressed();
return XtX;
}
R> sourceCpp("/tmp/eigenBug.cpp")
R> require(Matrix)
R> k <- 2
R> N <- 100000 * k
R> p <- 10
R> density <- .1
R> X <- cbind(1, rsparsematrix(N, p, density))
R> l2penalty <- c(0, rep(1, p))
R> X[,5] <- 0
R> foo <- sparseCrossprod(X, l2penalty)
R> foo
11 x 11 sparse Matrix of class "dgCMatrix"
[1,] 200000.0000 17.70349 52.36093 70.21790 . -138.68631 76.668894 -129.41183 303.286508 -105.66109 -64.2147
[2,] 17.7035 20203.49840 -79.27334 -2.39245 . 15.80165 -1.632720 8.09740 -3.995931 70.14863 59.1109
[3,] 52.3609 -79.27334 20232.14625 8.39644 . -4.49489 15.051420 -43.53782 32.662688 11.52742 52.5977
[4,] 70.2179 -2.39245 8.39644 20297.76109 . -50.51680 1.547393 -99.22284 21.321758 39.83611 -25.2738
[5,] . . . . 1 . . . . . .
[6,] -138.6863 15.80165 -4.49489 -50.51680 . 19745.70799 72.743912 -5.07824 -1.612627 -6.12968 31.4983
[7,] 76.6689 -1.63272 15.05142 1.54739 . 72.74391 19922.147563 -24.17279 0.727478 -4.59683 -18.1567
[8,] -129.4118 8.09740 -43.53782 -99.22284 . -5.07824 -24.172793 20477.37783 -15.652919 -84.47163 50.5008
[9,] 303.2865 -3.99593 32.66269 21.32176 . -1.61263 0.727478 -15.65292 19745.843853 -20.63223 -102.7115
[10,] -105.6611 70.14863 11.52742 39.83611 . -6.12968 -4.596833 -84.47163 -20.632233 19852.47265 -11.0920
[11,] -64.2147 59.11089 52.59771 -25.27383 . 31.49835 -18.156698 50.50083 -102.711487 -11.09198 19786.9951
R>
Thanks, that works! This is an extremely subtle issue inside Eigen and only occurs when you insert new values into the sparse matrix. Would you consider adding this to the RcppEigen vignette to prevent future users from making this error?
The results of Eigen's operations always produces compressed sparse matrices. On the other hand, the insertion of a new element into a SparseMatrix converts this later to the uncompressed mode.
@yixuan Is there a reason we can't automagically do that when wrap()
is called ?
I think part of the reason is that wrap()
receives a constant reference to the sparse matrix, but makeCompressed()
modifies the object itself.
Eigen documents here that
A call to the function makeCompressed() turns the matrix into the standard compressed format compatible with many library.
So I would say that the responsibility is mostly left to the user.
Sounds good to me.
And I think with that we can close this as well.
I am using RcppEigen to compute the matrix X^T X + lambda I (as in ridge regression) where X is a sparse input matrix. I am able to compute X^T X in Eigen, however there is an edge case for adding lambda I. This step modifies the diagonal of the matrix X^T X. If the diagonal had a 0 in it, then you cannot add lambda. Under the hood, X^T * X returns a sparse matrix, so if there is a 0 on the diagonal there is no memory allocated that I can use to add.
In Eigen, the appropriate way to modify a value in a sparse matrix which may not exist yet is to use .insert. This produces valid Eigen code, but for some reason when the resulting matrix is brought back to R as a dgCMatrix I get the error:
Error in validObject(x) : invalid class “dgCMatrix” object: last element of slot p must match length of slots i and x
Here is a reproducible example. I believe the relevant wrapper code in RcppEigen is here, though I don't know what the right solution is.