bnprks / BPCells

Scaling Single Cell Analysis to Millions of Cells
https://bnprks.github.io/BPCells
Other
167 stars 17 forks source link

`svds` behaves differently when `storage_order` is "row" #55

Open ycli1995 opened 1 year ago

ycli1995 commented 1 year ago

Hi, @bnprks. I followed the BPCells tutorials but computed SVD using svds() instead of irlba(). For example, in the RNA part:

svd <- svds(mat_norm, k=50)
pca <- multiply_cols(svd$v, svd$d)
cat(sprintf("PCA dimensions: %s\n", toString(dim(pca))))

I got the same results as irlba:

PCA dimensions: 2768, 50

However, when it came to ATAC's svd_atac, I got this:

svd_atac <- svds(mat_lsi_norm, 10)
pca_atac <- multiply_cols(svd_atac$v, svd_atac$d)
cat(sprintf("PCA dimensions: %s\n", toString(dim(pca_atac))))
PCA dimensions: 211807, 10

In other word, it seemed that the svd_atac$v and svd_atac$u were actually swapped, compared to RNA's svd$v and svd$u. When I check the input matrix, I found storage_order(mat_lsi_norm) was "row" while storage_order(mat_norm) was "col" as most scRNA-seq matrix in R would be.

Do you think BPCells::svds() should give u and v consistently regardless of storage_order, or should we just leave the users to check storage_order after calling it?

bnprks commented 1 year ago

Thanks for pointing this out -- definitely a bug in BPCells and not intended behavior. This should be fixed with the above commit.

The underlying cause here is that the C++ code treats everything as a column-major matrix, so row-major matrices need some special treatment to ensure conversion works properly