MarioniLab / scran

Clone of the Bioconductor repository for the scran package.
https://bioconductor.org/packages/devel/bioc/html/scran.html

Error in denoisePCA : NA indices are not supported for sparse matrix #74

Closed: ljouneau closed this issue 3 years ago

ljouneau commented 3 years ago

I encountered an issue while trying to denoise a PCA on a SingleCellExperiment object. Here is the error:

denoised.pca=denoisePCA(sce.filtered.hvg,technical=modelGeneVariance[highly_variables_genes,])
Error in intI(i, n = x@Dim[1], dn[[1]], give.dn = FALSE) :
  'NA' indices are not (yet?) supported for sparse Matrices

I tried to work around this error by calling getDenoisedPCs directly and converting my sparse matrix into a regular (dense) matrix:

logc=as.matrix(assay(sce.filtered.hvg,"logcounts"))
getDenoisedPCs(logc,technical=mgv[hvg,])
Error in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth = TRUE, :
  BLAS/LAPACK routine 'DLASCL' gave error code -4

This new error seems closely related to issue #15 (https://github.com/MarioniLab/scran/issues/15), so I followed LTLA's advice:

(i) ensure that all values in the matrix y are finite:

sum(is.infinite(logc))
[1] 0
sum(is.na(logc))
[1] 0

(ii) check that y has more than 100 columns/rows:

dim(logc)
[1] 2783 4542

(iii) confirm that there are no all-zero rows or columns:

sum(apply(logc,1,sum)==0)
[1] 0
sum(apply(logc,2,sum)==0)
[1] 0
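The three checks above can be bundled into one call. The sketch below is a hypothetical helper (check_pca_input is not part of scran) that assumes a dense genes x cells matrix. It uses rowSums(y != 0) rather than sum() so that all-zero rows and columns are detected even if the matrix contains negative values.

```r
# Hypothetical helper, not part of scran: runs the sanity checks suggested
# in issue #15 on a dense genes x cells matrix `y`.
check_pca_input <- function(y, min_dim = 100) {
  c(
    all_finite   = all(is.finite(y)),            # no NA, NaN, Inf or -Inf
    large_enough = all(dim(y) > min_dim),        # more than min_dim rows and columns
    no_zero_rows = !any(rowSums(y != 0) == 0),   # every gene has a nonzero value
    no_zero_cols = !any(colSums(y != 0) == 0)    # every cell has a nonzero value
  )
}

# Usage on simulated log-counts (toy data, not the matrix from this issue):
set.seed(1)
toy <- log1p(matrix(rpois(150 * 120, lambda = 2), nrow = 150))
check_pca_input(toy)
```

Every element should be TRUE before handing the matrix to a PCA; a FALSE names the check that failed.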

If I run the PCA on the log-counts matrix with a different function (scater::runPCA or the PCA function from the FactoMineR package), it works fine.

If I use BSPARAM to stop getDenoisedPCs from using the IRLBA algorithm, I get a different error:

getDenoisedPCs(logc,technical=mgv[hvg,],BSPARAM=ExactParam())
Error in svd(x, nu = nu, nv = nv) : infinite or missing values in 'x'

Following https://stackoverflow.com/questions/21423375/r-svd-function-infinite-or-missing-values-in-x, I checked that there are no NA values after scaling the columns:

scale_logc=scale(logc)
sum(is.na(scale_logc))
[1] 0

However, when I sum each column of the scaled matrix, one column has a sum of exactly 0:

sum(apply(scale_logc,2,sum)==0)
[1] 1

Yet this column looks perfectly normal:

strange_column=which(apply(scale_logc,2,sum)==0)
summary(scale_logc[,strange_column])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-0.1917 -0.1917 -0.1917  0.0000 -0.1917 11.6719
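Because scale() centres each column by default, every column of scale_logc sums to zero up to floating-point rounding, so a column whose sum is exactly 0 is most likely a rounding coincidence rather than a sign of a problem. The sketch below, on toy data, compares an exact test with a tolerance-based one, and also shows the case that does produce non-finite values after scaling: a constant column, whose standard deviation is 0.

```r
set.seed(42)
m <- matrix(rnorm(50 * 5), nrow = 50)   # toy data, not the matrix from this issue
s <- scale(m)                           # centre and scale each column
col_sums <- colSums(s)

sum(col_sums == 0)           # exact comparison: the count depends on rounding
all(abs(col_sums) < 1e-8)    # tolerance comparison: every column is ~0

# A constant column has standard deviation 0, so scaling computes 0 / 0 = NaN.
m[, 2] <- 1
anyNA(scale(m))
```

The last line returns TRUE, which is the kind of input that makes svd() report "infinite or missing values in 'x'".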

However, running getDenoisedPCs again without this cell still fails:

getDenoisedPCs(logc[,-strange_column],technical=mgv[hvg,],BSPARAM=ExactParam())
Error in svd(x, nu = nu, nv = nv) : infinite or missing values in 'x'

So this cell does not seem to be the cause of my problem.

I have run out of ideas for a workaround and have no clue what causes this error. Thanks in advance if anyone has suggestions.

Best regards

LTLA commented 3 years ago

Session information?

ljouneau commented 3 years ago

Ah, yes! Sorry.

SessionInfo.txt

LTLA commented 3 years ago

This call:

denoised.pca=denoisePCA(sce.filtered.hvg,technical=modelGeneVariance[highly_variables_genes,])

Does not look correct. I'm guessing it should be something more like:

denoised.pca <- denoisePCA(sce.filtered.hvg,technical=modelGeneVariance, subset.row=highly_variables_genes)

x= and technical= should have the same number of rows, as described in ?denoisePCA.
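A quick way to catch this mismatch before calling denoisePCA is to compare the row names of the two inputs. The sketch below uses only base R with hypothetical toy objects: x stands in for the genes x cells log-counts and technical for a per-gene variance table such as the output of modelGeneVar. It also shows one plausible way NA indices can arise (an assumption; scran's internals are not shown in this thread): match() returns NA for any gene that is absent from the lookup table.

```r
# Hypothetical toy objects: x has one gene ("extraGene") that technical lacks.
x <- matrix(rnorm(4 * 10), nrow = 4,
            dimnames = list(c("geneA", "geneB", "geneC", "extraGene"), NULL))
technical <- data.frame(tech = c(0.5, 0.4, 0.3),
                        row.names = c("geneA", "geneB", "geneC"))

# Which genes in x have no row in technical?
missing_genes <- setdiff(rownames(x), rownames(technical))

# match() gives NA for absent names; using it as an index yields NA positions,
# which sparse Matrix objects refuse to subset with.
idx <- match(rownames(x), rownames(technical))

# Hypothetical guard: TRUE only if both objects list the same genes in the same order.
rows_match <- function(x, technical) {
  identical(rownames(x), rownames(technical))
}
rows_match(x, technical)

# Fix: keep the variance table for all genes and subset it to the rows of x.
technical_all <- data.frame(tech = c(0.5, 0.4, 0.3, 0.2),
                            row.names = c("geneA", "geneB", "geneC", "extraGene"))
technical_fixed <- technical_all[rownames(x), , drop = FALSE]
rows_match(x, technical_fixed)
```

In scran itself, the corresponding fix is the call suggested above: pass the full variance table as technical= and select genes through subset.row=.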

ljouneau commented 3 years ago

I thought that was already the case, because sce.filtered.hvg only contains the highly variable genes I identified. In fact, I had added to the sce object a gene I wanted to keep (even though it was not identified as highly variable) and forgot to also add it to the highly_variable_genes vector. Now it works fine! Thank you.

LTLA commented 3 years ago

FYI, added some extra defensive checks against misspecified inputs in 4e6f5d62c14d71bc67fc75bf61fe9ef1a650da41.