Closed maddyduran closed 3 years ago
Hello,
There is currently no proper way to do that. There are a few hacky ones but they lack sound statistical grounding.
To expand a bit, the main problem is that you need to compute the latent position of the newdata into the latent space (the so called variational parameters) while keeping the model parameters before applying the rotation matrix. For the PCA case, the current functions optimize both the variational parameters and the model parameters but not one or the other. However, It should be fairly doable as we do something similar for other analyses (PLN, PLNLDA) so I'll give a try and keep you updated.
Hello,
I pushed a new development version of the package with the requested feature on the dev
branch. You can install it on your computer with
remotes::install_github("pln-team/PLNmodels@dev")
And here is a short demo to show how it works (it's not super user-friendly yet, I'll work on that before inclusion in the next CRAN release but you may not want to wait until then)
library(PLNmodels)
library(ggplot2)
data("oaks")
## Learning set for PCA
learn <- sample(x = 1:nrow(oaks), size = ceiling(0.8*nrow(oaks)), replace = FALSE)
## newdata
newdata <- oaks[-learn, ]
## Compute PCA with 15 latent dimensions
myPCAmodels <- PLNPCA(Abundance ~ 1 + offset(log(Offset)), data = oaks[learn, ], ranks = 15)
myPCA <- myPCAmodels$getBestModel()
## Predict centered latent position of newdata
model <- model.frame(Abundance ~ 1 + offset(log(Offset)), data = newdata)
covariates <- model.matrix(terms(model), model)
offsets <- model.offset(model)
responses <- model[[1L]]
newM <- myPCA$VEstep(covariates = covariates, offsets = offsets, responses = responses)$M
## Compute scores positions in PCA space
new_latent_pos <- t(myPCA$model_par$B %*% t(newM)) %>% scale(center = TRUE, scale = FALSE)
new_scores <- new_latent_pos %*% myPCA$rotation
## Add dimnames and metadata for easier plotting.
dimnames(new_scores) <- list(rownames(newdata), paste0("PC", 1:ncol(new_scores)))
new_scores <- cbind(newdata[ , c("Sample", "tree")], as.data.frame(new_scores))
## fviz_pca for original data +
p <- factoextra::fviz_pca_ind(myPCA, habillage = oaks[learn, "tree"], geom = "point", alpha.ind = 0.5, )
p +
geom_point(data = new_scores, aes(x = PC1, y = PC2, color = tree)) +
geom_text(data = new_scores, aes(x = PC1, y = PC2, color = tree, label = Sample), vjust = 0)
with the following results (original data with transparent colors, newdata with labels and solid colors)
Hi,
Just to let you know, the code is now slightly more user-friendly and available in the master
branch of the github repo, available via:
remotes::install_github("pln-team/PLNmodels")
And the new code to project newdata into PCA space is:
library(PLNmodels)
library(ggplot2)
data("oaks")
## Learning set for PCA (80% of the samples)
set.seed(20210421)
learn <- sample(x = 1:nrow(oaks), size = ceiling(0.8*nrow(oaks)), replace = FALSE)
## newdata (remaining 20%)
newdata <- oaks[-learn, ]
## Compute PCA with 15 latent dimensions
myPCAmodels <- PLNPCA(Abundance ~ 1 + offset(log(Offset)), data = oaks[learn, ], ranks = 15)
myPCA <- myPCAmodels$getBestModel()
## Project newdata in PCA space and add metadata for easier plotting
scores <- myPCA$project(newdata)
scores <- cbind(newdata[ , c("Sample", "tree")], as.data.frame(scores))
## Plot original and newdata
p <- factoextra::fviz_pca_ind(myPCA, habillage = oaks[learn, "tree"], geom = "point", alpha.ind = 0.5)
p + geom_point(data = scores, aes(x = PC1, y = PC2, color = tree)) +
geom_text(data = scores, aes(x = PC1, y = PC2, color = tree, label = Sample), vjust = 0)
Is there currently a way to predict new data into the PCA space returned by PLNPCA? I can find a rotation matrix but I can't seem to identify a center or scaling vector.