al2na / methylKit

R package for DNA methylation analysis
https://bioconductor.org/packages/release/bioc/html/methylKit.html
210 stars 96 forks source link

question with the clusterSamples() function #279

Closed ElouanBethuel closed 1 year ago

ElouanBethuel commented 1 year ago

Hello, First, thank you for your package, I have been using it for a few months and it's so useful. However, I encountered a problem with the function clusterSamples(). Indeed with my data, when I make a dendrogram is not consistent with the PCA and and I don't understand the reason.

I wrote a code in R to compare the function with the factoextra function and I don't get the same result. If you want to look this I leave you my code and the methylBase Object that I use. (fu.meth.zip)

Have a nice week-end, Elouan Bethuel

#####################################################################

# load packages : 
library(methylKit)
library(FactoMineR)
library(factoextra)

# load fu.meth object : 
fu.meth <- readRDS( file = "fu.meth")

# test PCA and clustering with methylkit functions  : 

par(mfrow = c(1,2))
pca_m <- PCASamples(fu.meth, screeplot=FALSE, obj.return = TRUE)
hcu_m <- clusterSamples(fu.meth, dist="correlation", method="ward", plot=TRUE)

# test PCA and clustering with FactoMineR functions : 

fu.meth_perc <- percMethylation(fu.meth)
fu.meth_perc_t <- data.frame(t(fu.meth_perc))

pca <- PCA(fu.meth_perc_t, ncp = 5, graph = FALSE, scale.unit = TRUE) 
hcu <- HCPC(pca, method="ward", iter.max= 3,  nb.clust = -1 , graph = FALSE)

par(mfrow = c(1,2))
fviz_pca_ind(pca) 
fviz_dend(hcu) 

##########################################################################

My methylBase object : fu.meth.zip

alexg9010 commented 1 year ago

Hi Elouan,

The difference in the clustering outcome occurs since we apply an additional low variance filter before performing the clustering. This is essentially controlled by the 'sd.filter' argument which is 0.5 by default, essentially removing all bases with less than 50% deviation from the standard deviation. You may check the code here (, https://github.com/al2na/methylKit/blob/master/R/clusterSamples.R#L286-L315 ).

I hope this answers your question.

Best, Alex

ElouanBethuel @.***> schrieb am Fr., 31. März 2023, 17:55:

Hello, First, thank you for your package, I have been using it for a few months and it's so useful. However, I encountered a problem with the function clusterSamples(). Indeed with my data, when I make a dendrogram is not consistent with the PCA and and I don't understand the reason.

I wrote a code in R to compare the function with the factoextra function and I don't get the same result. If you want to look this I leave you my code and the methylBase Object that I use. (fu.meth.zip)

Have a nice week-end, Elouan Bethuel

#####################################################################

load packages :

library(methylKit) library(FactoMineR) library(factoextra)

load fu.meth object : fu.meth <- readRDS( file = "fu.meth")

test PCA and clustering with methylkit functions :

par(mfrow = c(1,2))pca_m <- PCASamples(fu.meth, screeplot=FALSE, obj.return = TRUE)hcu_m <- clusterSamples(fu.meth, dist="correlation", method="ward", plot=TRUE)

test PCA and clustering with FactoMineR functions :

fu.meth_perc <- percMethylation(fu.meth)fu.meth_perc_t <- data.frame(t(fu.meth_perc)) pca <- PCA(fu.meth_perc_t, ncp = 5, graph = FALSE, scale.unit = TRUE) hcu <- HCPC(pca, method="ward", iter.max= 3, nb.clust = -1 , graph = FALSE)

par(mfrow = c(1,2)) fviz_pca_ind(pca) fviz_dend(hcu) ##########################################################################

My methylBase object : fu.meth.zip https://github.com/al2na/methylKit/files/11123665/fu.meth.zip

— Reply to this email directly, view it on GitHub https://github.com/al2na/methylKit/issues/279, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADK7JD4BXVSYCJM2HTTHHSDW6347BANCNFSM6AAAAAAWO2V5YA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

alexg9010 commented 1 year ago

Hi @ElouanBethuel,

I quickly followed up on this using your example data and in my case, it seems that for PCASamples lowering the sd.threshold to zero produces very similar results with the samples being mirrored along the y-axis, aka PC1 being reversed.

Considering the clustering, the default distance metric of clusterSamples is "correlation", which is not supported by the HCPC function. But if you switch to "euclidean" distance (default for HCPC), the results match.

Best, Alex