PhilBoileau / scPCA

A toolbox for sparse contrastive principal component analysis
https://bioconductor.org/packages/release/bioc/html/scPCA.html

Reverse cPCA and scPCA #55

Open PietroD opened 4 years ago

PietroD commented 4 years ago

First of all, this is not an issue. Thanks for developing the package and implementing cPCA in Bioconductor.

The question is, what if I reverse cPCA (or scPCA) by doing:

newmat <- t(t(cpca$x %*% t(cpca$rotation))) # if center and scale are set to FALSE

Is this correct? Is it correct to say that this way I am obtaining the "corrected" matrix with the "subtracted" background? Many thanks

PhilBoileau commented 4 years ago

Hi @PietroD! Thanks for using scPCA.

cpca$x is defined as X %*% cpca$rotation, where X is the n by p data matrix, optionally centered and scaled. cpca$x %*% t(cpca$rotation) therefore produces a low-rank, non-corrected approximation of X: it maps X back through the contrastive loadings without subtracting any background signal. This approximation is generally worse than the approximation of the same rank built from the leading right singular vectors of X. Perhaps there is a way of obtaining this corrected matrix, but I'll have to think some more about how to compute it.
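To spell out the reasoning, here is a short derivation. The notation is assumed for illustration and does not come from the package: V stands for cpca$rotation with k columns, W_k for the top k right singular vectors of X, and gamma for the contrastive parameter. It also assumes the columns of V are orthonormal, which holds for cPCA but may only hold approximately for the sparse loadings of scPCA.

```latex
% X: n x p target matrix (optionally centered and scaled)
% V: p x k contrastive loadings (cpca$rotation), assumed orthonormal
% C_X, C_Y: sample covariance matrices of the target and background data
\begin{align*}
  V &= \text{top } k \text{ eigenvectors of } C_X - \gamma C_Y, \\
  \texttt{cpca\$x} &= X V, \\
  \texttt{cpca\$x \%*\% t(cpca\$rotation)} &= X V V^\top = X P_V,
    \qquad P_V = V V^\top .
\end{align*}
% P_V is an orthogonal projector of rank k, so X P_V is a rank-(at most)-k
% projection of X itself. No background term is subtracted from X.
% By the Eckart--Young theorem, the best rank-k approximation of X in
% Frobenius norm is X W_k W_k^\top, hence
\[
  \lVert X - X V V^\top \rVert_F \;\ge\; \lVert X - X W_k W_k^\top \rVert_F ,
\]
% with equality when V and W_k span the same subspace.
```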

PietroD commented 4 years ago

Ok, I see, thanks. So what would be the best option for working with the corrected matrix? More generally, what should one do after cPCA and scPCA? I would like to be able to explore the corrected matrix. What about regressing out the top cPCs from the target matrix?

chisin commented 4 years ago

Hello, I am dealing with the same issue (trying to correct my dataset for a very strong individual/physiological signature). I am very much interested in any further development!

PhilBoileau commented 4 years ago

As of right now, there isn't a way to work with a corrected X matrix, @PietroD. Generally, I would avoid regressing out these contrastive principal components from the target matrix, as these principal components are supposed to contain (a portion of) the signal of interest.

Instead, you can work with the rotated matrix given by cpca$x. This matrix is made up of the contrastive principal components. They can be used in place of the regular principal components generated by prcomp() or princomp() for exploratory purposes, like visualizations and clustering. For example, you might pass cpca$x to UMAP or t-SNE. You can subsequently use these embeddings for clustering. We've also had some success using cPCA/scPCA to remove cell cycle effects in scRNA-seq data. You can find examples here (17.5.4), here, and here (results sections).
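A minimal sketch of this workflow in base R. Everything in it is an illustrative assumption: the data are simulated, the contrastive parameter gamma is fixed by hand, and a hand-rolled eigendecomposition stands in for the fitted scPCA object. The matrix cpcs plays the role of cpca$x. It is not the package's implementation.

```r
set.seed(1)

# Simulated data (hypothetical): 100 target rows in two groups, 100 background rows
n <- 100
group <- rep(1:2, each = n / 2)

# Nuisance variation shared by both data sets: features 1-5 have high variance
nuisance <- function(n) {
  cbind(matrix(rnorm(n * 5, sd = 5), n, 5), matrix(rnorm(n * 5), n, 5))
}
background <- nuisance(n)
target <- nuisance(n)

# Signal of interest: a group shift in features 6-10, present only in the target
target[group == 2, 6:10] <- target[group == 2, 6:10] + 3

# Hand-rolled cPCA: top eigenvectors of the contrastive covariance matrix
gamma <- 2  # contrastive parameter, chosen by hand for this sketch
X <- scale(target, center = TRUE, scale = FALSE)
Y <- scale(background, center = TRUE, scale = FALSE)
C <- cov(X) - gamma * cov(Y)
V <- eigen(C, symmetric = TRUE)$vectors[, 1:2]  # analogue of cpca$rotation
cpcs <- X %*% V                                 # analogue of cpca$x

# Use the contrastive PCs in place of ordinary PCs, here for clustering
clusters <- kmeans(cpcs, centers = 2, nstart = 10)$cluster
table(group, clusters)
```

The same cpcs matrix could be handed to a UMAP or t-SNE implementation instead of kmeans(). With a real analysis, cpca$x from the fitted object would replace cpcs.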

I'm happy to discuss your particular analysis in more detail so that I can provide better guidance. If you'd rather not talk about it here, feel free to send me an email.

PhilBoileau commented 4 years ago

Follow-up: @PietroD and I have discussed this more offline. After talking it over with my supervisor and collaborator, we think that this might be possible. We'll run some tests, and will implement this functionality if it works.