Closed ElouanBethuel closed 1 year ago
Hi Elouan,
The difference in the clustering outcome occurs since we apply an additional low variance filter before performing the clustering. This is essentially controlled by the 'sd.filter' argument which is 0.5 by default, essentially removing all bases with less than 50% deviation from the standard deviation. You may check the code here (, https://github.com/al2na/methylKit/blob/master/R/clusterSamples.R#L286-L315 ).
I hope this answers your question.
Best, Alex
ElouanBethuel @.***> schrieb am Fr., 31. März 2023, 17:55:
Hello, First, thank you for your package, I have been using it for a few months and it's so useful. However, I encountered a problem with the function clusterSamples(). Indeed with my data, when I make a dendrogram is not consistent with the PCA and and I don't understand the reason.
I wrote a code in R to compare the function with the factoextra function and I don't get the same result. If you want to look this I leave you my code and the methylBase Object that I use. (fu.meth.zip)
Have a nice week-end, Elouan Bethuel
#####################################################################
load packages :
library(methylKit) library(FactoMineR) library(factoextra)
load fu.meth object : fu.meth <- readRDS( file = "fu.meth")
test PCA and clustering with methylkit functions :
par(mfrow = c(1,2))pca_m <- PCASamples(fu.meth, screeplot=FALSE, obj.return = TRUE)hcu_m <- clusterSamples(fu.meth, dist="correlation", method="ward", plot=TRUE)
test PCA and clustering with FactoMineR functions :
fu.meth_perc <- percMethylation(fu.meth)fu.meth_perc_t <- data.frame(t(fu.meth_perc)) pca <- PCA(fu.meth_perc_t, ncp = 5, graph = FALSE, scale.unit = TRUE) hcu <- HCPC(pca, method="ward", iter.max= 3, nb.clust = -1 , graph = FALSE)
par(mfrow = c(1,2)) fviz_pca_ind(pca) fviz_dend(hcu) ##########################################################################
My methylBase object : fu.meth.zip https://github.com/al2na/methylKit/files/11123665/fu.meth.zip
— Reply to this email directly, view it on GitHub https://github.com/al2na/methylKit/issues/279, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADK7JD4BXVSYCJM2HTTHHSDW6347BANCNFSM6AAAAAAWO2V5YA . You are receiving this because you are subscribed to this thread.Message ID: @.***>
Hi @ElouanBethuel,
I quickly followed up on this using your example data and in my case, it seems that for PCASamples
lowering the sd.threshold
to zero produces very similar results with the samples being mirrored along the y-axis, aka PC1 being reversed.
Considering the clustering, the default distance metric of clusterSamples
is "correlation", which is not supported by the HCPC
function. But if you switch to "euclidean" distance (default for HCPC
), the results match.
Best, Alex
Hello, First, thank you for your package, I have been using it for a few months and it's so useful. However, I encountered a problem with the function clusterSamples(). Indeed with my data, when I make a dendrogram is not consistent with the PCA and and I don't understand the reason.
I wrote a code in R to compare the function with the factoextra function and I don't get the same result. If you want to look this I leave you my code and the methylBase Object that I use. (fu.meth.zip)
Have a nice week-end, Elouan Bethuel
My methylBase object : fu.meth.zip