Bug in Eigen::SparseMatrix -> dgCMatrix conversion

jeffwong-nflx commented 6 years ago

I am using RcppEigen to compute the matrix X^T X + lambda I (as in ridge regression) where X is a sparse input matrix. I am able to compute X^T X in Eigen, however there is an edge case for adding lambda I. This step modifies the diagonal of the matrix X^T X. If the diagonal had a 0 in it, then you cannot add lambda. Under the hood, X^T * X returns a sparse matrix, so if there is a 0 on the diagonal there is no memory allocated that I can use to add.

In Eigen, the appropriate way to modify a value in a sparse matrix which may not exist yet is to use .insert. This produces valid Eigen code, but for some reason when the resulting matrix is brought back to R as a dgCMatrix I get the error:

Error in validObject(x) : invalid class “dgCMatrix” object: last element of slot p must match length of slots i and x

Here is a reproducible example. I believe the relevant wrapper code in RcppEigen is here, though I don't know what the right solution is.

#include <Rcpp.h>
#include <RcppEigen.h>
// [[Rcpp::depends(RcppEigen)]]

using Eigen::Map;
using Eigen::MatrixXd;
using Eigen::VectorXd;
using Eigen::SparseMatrix;
using Eigen::MappedSparseMatrix;
using namespace Rcpp;
using namespace Eigen;

// [[Rcpp::export]]
Eigen::SparseMatrix<double> sparseCrossprod(const Eigen::MappedSparseMatrix<double>& X,
                                            const Eigen::VectorXd& l2penalty) {
  SparseMatrix<double> XtX = X.adjoint() * X;

  // R won't understand the returned matrix
  for (int i = 0; i < XtX.rows(); i++) {
    double value = XtX.coeff(i, i);
    Rcout << value << std::endl;
    if (value == 0) {
      XtX.insert(i, i) = 1;
    }
  }

  // This will crash
  // XtX += VectorXd::Ones(XtX.rows()).asDiagonal();

  // This will crash if there is a 0 on the diagonal
  //VectorXd diag(XtX.diagonal());
  //Rcout << diag << std::endl;
  //XtX.diagonal() = diag + l2penalty;

  return XtX;
}

/***R
require(Matrix)

k = 2
N = 100000 * k
p = 10
density = .1

X = cbind(1, rsparsematrix(N, p, density))
l2penalty = c(0, rep(1, p))

X[,5] = 0

foo = sparseCrossprod(X, l2penalty)
*/

eddelbuettel commented 6 years ago

I can replicate the error, but I won't have time to dig into this.

jeffwong-nflx commented 6 years ago

Edit: Note that the error doesn't show up until you try to do something with foo. For example none of these operations work

foo
as.matrix(foo)
2*foo

eddelbuettel commented 6 years ago

I have a fix. Your XtX is actually dense, so just operate on a dense matrix. That works.

Code

#include <RcppEigen.h>
// [[Rcpp::depends(RcppEigen)]]

using Eigen::Map;
using Eigen::MatrixXd;
using Eigen::VectorXd;
using Eigen::SparseMatrix;
using Eigen::MappedSparseMatrix;

// [[Rcpp::export]]
Eigen::MatrixXd sparseCrossprod(const Eigen::MappedSparseMatrix<double>& X,
                                const Eigen::VectorXd& l2penalty) {
  Eigen::SparseMatrix<double> XtXa = X.adjoint() * X;
  Eigen::MatrixXd XtX(XtXa);
  for (int i = 0; i < XtX.rows(); i++) {
    double value = XtX.coeff(i, i);
    if (value == 0) {
      XtX(i,i) = 1;
    }
  }
  return XtX;
}

/***R
require(Matrix)
k <- 2
N <- 100000 * k
p <- 10
density <- .1

X <- cbind(1, rsparsematrix(N, p, density))
l2penalty <- c(0, rep(1, p))
X[,5] <- 0

foo <- sparseCrossprod(X, l2penalty)
foo
*/

Output

R> sourceCpp("/tmp/eigenBug.cpp")

R> require(Matrix)

R> k <- 2

R> N <- 100000 * k

R> p <- 10

R> density <- .1

R> X <- cbind(1, rsparsematrix(N, p, density))

R> l2penalty <- c(0, rep(1, p))

R> X[,5] <- 0

R> foo <- sparseCrossprod(X, l2penalty)

R> foo
             [,1]        [,2]       [,3]       [,4] [,5]        [,6]       [,7]        [,8]         [,9]      [,10]        [,11]
 [1,] 200000.0000   114.05012    39.7344    46.6718    0   186.89394  -119.3847   -73.22899   -38.997610  -100.9542   100.273360
 [2,]    114.0501 20006.60037   -92.2933   133.1399    0   -58.16418    72.2073     1.51963    30.420171   -37.3681   -55.105421
 [3,]     39.7344   -92.29333 19698.4569   -39.9250    0    45.97927   -20.6944    46.58300    30.347564    16.3285    53.736854
 [4,]     46.6718   133.13989   -39.9250 19608.6417    0    33.03311   -22.4419    30.85163    68.138594   -49.9748   -21.730008
 [5,]      0.0000     0.00000     0.0000     0.0000    1     0.00000     0.0000     0.00000     0.000000     0.0000     0.000000
 [6,]    186.8939   -58.16418    45.9793    33.0331    0 19919.16395    94.3603    -1.59755    -2.504021    64.9767    53.199705
 [7,]   -119.3847    72.20729   -20.6944   -22.4419    0    94.36027 20413.1109   -26.02897    37.652458    11.3807   -11.802387
 [8,]    -73.2290     1.51963    46.5830    30.8516    0    -1.59755   -26.0290 20178.25477    19.408558    36.7547   -96.265766
 [9,]    -38.9976    30.42017    30.3476    68.1386    0    -2.50402    37.6525    19.40856 19999.479295    18.7824     0.268916
[10,]   -100.9542   -37.36812    16.3285   -49.9748    0    64.97667    11.3807    36.75474    18.782382 20038.1875    64.648955
[11,]    100.2734   -55.10542    53.7369   -21.7300    0    53.19971   -11.8024   -96.26577     0.268916    64.6490 19894.035736
R>

eddelbuettel commented 6 years ago

For the other issue (of altering an existing sparse matrix) I have no doubt that the bug in the converter may be real but someone else will have to chase this. I rarely work on sparse matrices with Eigen, sorry.

yixuan commented 6 years ago

I can try to take a look.

yixuan commented 6 years ago

@jeffwong-nflx By the way, did you call the makeCompressed() member function on sparse matrices before returning to R? I remember this is a necessary step. Could you run the examples again by incorporating this simple fix?

eddelbuettel commented 6 years ago

Bingo! Nice call, @yixuan -- the following works:

// [[Rcpp::export]]
Eigen::SparseMatrix<double> sparseCrossprod(const Eigen::MappedSparseMatrix<double>& X,
                                            const Eigen::VectorXd& l2penalty) {
  Eigen::SparseMatrix<double> XtX = X.adjoint() * X;
  for (int i = 0; i < XtX.rows(); i++) {
    double value = XtX.coeff(i, i);
    if (value == 0) {
      XtX.coeffRef(i,i) = 1;
    }
  }
  XtX.makeCompressed();
  return XtX;
}

Output:

R> sourceCpp("/tmp/eigenBug.cpp")

R> require(Matrix)

R> k <- 2

R> N <- 100000 * k

R> p <- 10

R> density <- .1

R> X <- cbind(1, rsparsematrix(N, p, density))

R> l2penalty <- c(0, rep(1, p))

R> X[,5] <- 0

R> foo <- sparseCrossprod(X, l2penalty)

R> foo
11 x 11 sparse Matrix of class "dgCMatrix"

 [1,] 200000.0000    17.70349    52.36093    70.21790 .  -138.68631    76.668894  -129.41183   303.286508  -105.66109   -64.2147
 [2,]     17.7035 20203.49840   -79.27334    -2.39245 .    15.80165    -1.632720     8.09740    -3.995931    70.14863    59.1109
 [3,]     52.3609   -79.27334 20232.14625     8.39644 .    -4.49489    15.051420   -43.53782    32.662688    11.52742    52.5977
 [4,]     70.2179    -2.39245     8.39644 20297.76109 .   -50.51680     1.547393   -99.22284    21.321758    39.83611   -25.2738
 [5,]      .          .           .           .       1     .           .            .           .            .           .     
 [6,]   -138.6863    15.80165    -4.49489   -50.51680 . 19745.70799    72.743912    -5.07824    -1.612627    -6.12968    31.4983
 [7,]     76.6689    -1.63272    15.05142     1.54739 .    72.74391 19922.147563   -24.17279     0.727478    -4.59683   -18.1567
 [8,]   -129.4118     8.09740   -43.53782   -99.22284 .    -5.07824   -24.172793 20477.37783   -15.652919   -84.47163    50.5008
 [9,]    303.2865    -3.99593    32.66269    21.32176 .    -1.61263     0.727478   -15.65292 19745.843853   -20.63223  -102.7115
[10,]   -105.6611    70.14863    11.52742    39.83611 .    -6.12968    -4.596833   -84.47163   -20.632233 19852.47265   -11.0920
[11,]    -64.2147    59.11089    52.59771   -25.27383 .    31.49835   -18.156698    50.50083  -102.711487   -11.09198 19786.9951
R>

jeffwong-nflx commented 6 years ago

Thanks, that works! This is an extremely subtle issue inside Eigen and only occurs when you insert new values into the sparse matrix. Would you consider adding this to the RcppEigen vignette to prevent future users from making this error?

The results of Eigen's operations always produces compressed sparse matrices. On the other hand, the insertion of a new element into a SparseMatrix converts this later to the uncompressed mode.

eddelbuettel commented 6 years ago

@yixuan Is there a reason we can't automagically do that when wrap() is called ?

yixuan commented 6 years ago

I think part of the reason is that wrap() receives a constant reference to the sparse matrix, but makeCompressed() modifies the object itself.

Eigen documents here that

A call to the function makeCompressed() turns the matrix into the standard compressed format compatible with many library.

So I would say that the responsibility is mostly left to the user.

eddelbuettel commented 6 years ago

Sounds good to me.

And I think with that we can close this as well.

RcppCore / RcppEigen