MarioniLab / scran

Clone of the Bioconductor repository for the scran package.
https://bioconductor.org/packages/devel/bioc/html/scran.html

Can I use BuildSNNGraph on scWGBS data? #82

Open xiaonian92 opened 3 years ago

xiaonian92 commented 3 years ago

Hello dear author,

I have used this package for my scRNA-seq data analysis, and now I wonder whether I can use the function "BuildSNNGraph" on my DNA methylome data (a data frame of genome bins × cell IDs). My goal is to cluster cells based on their methylation levels (numeric, between 0 and 1) using the same clustering algorithm as for scRNA-seq: "BuildSNNGraph" followed by "igraph::cluster_walktrap". Is this approach right?

I noticed that the documentation describes the function as: "Build a shared or k-nearest-neighbors graph for cells based on their expression profiles". For the argument x it says: "For the ANY method, a matrix-like object containing expression values for each gene (row) in each cell (column). These dimensions can be transposed if transposed=TRUE."
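For readers unfamiliar with what the SNN step computes, here is a minimal Python/NumPy sketch. It is not scran's implementation, and the function names `knn_indices` and `snn_rank_graph` are hypothetical. It follows the rank-based weighting that scran documents for type="rank":

- Each cell's k nearest neighbours are ranked 1..k, and each cell has rank 0 in its own set.
- Two cells are joined if they share any neighbour.
- The edge weight is k - r/2, where r is the smallest sum of ranks over the shared neighbours.

As in the docs quoted above, the input is features × cells unless `transposed=True`. Distances are brute-force Euclidean, so this is O(n²). Dropping non-positive weights is an assumption of this sketch. The walktrap clustering step is not reproduced.

```python
import numpy as np

def knn_indices(coords, k):
    """k nearest neighbours (excluding self) of each row of `coords`,
    ordered from closest to farthest, via brute-force Euclidean distances."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # never pick a cell as its own neighbour
    return np.argsort(dist, axis=1, kind="stable")[:, :k]

def snn_rank_graph(x, k=10, transposed=False):
    """Shared-nearest-neighbour graph with rank-based edge weights.

    x: features x cells (rows = genome bins, columns = cells);
    pass transposed=True if cells are rows instead.
    Returns {(i, j): weight} for i < j.
    """
    coords = np.asarray(x, dtype=float)
    if not transposed:
        coords = coords.T  # one row per cell
    n = coords.shape[0]
    nn = knn_indices(coords, k)

    # ranks[i][c] = rank of cell c in cell i's neighbour set; cell i has rank 0.
    ranks = []
    for i in range(n):
        r = {i: 0}
        for pos, c in enumerate(nn[i], start=1):
            r[int(c)] = pos
        ranks.append(r)

    edges = {}
    for i in range(n):
        for j in range(i + 1, n):
            shared = ranks[i].keys() & ranks[j].keys()
            if not shared:
                continue  # no shared neighbours -> no edge
            best = min(ranks[i][c] + ranks[j][c] for c in shared)
            weight = k - best / 2
            if weight > 0:  # assumption: drop non-positive weights
                edges[(i, j)] = weight
    return edges

# Two well-separated groups of three cells, 2 features x 6 cells.
x = np.array([[0, 0, 1, 10, 10, 11],
              [0, 1, 0, 10, 11, 10]], dtype=float)
edges = snn_rank_graph(x, k=2)
print(edges)  # only within-group edges appear
```

With k=2, every cell's neighbours lie in its own group, so no edge crosses between the two groups.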

Looking forward to your reply, thanks!

LTLA commented 3 years ago

Sounds reasonable, though my understanding is that most analyses of methylation data are done on the M-values (i.e., the log-ratio of methylated to unmethylated counts) rather than the beta-values (which lie in [0, 1], as you have described). M-values have friendlier mean-variance properties; see Figure 3 of the paper here. From the perspective of clustering, using M-values means that a change in methylation from 0.01 to 0.02 has a similar effect to a change from 0.1 to 0.2. Using beta-values would instead give the same weight to 0.01 → 0.02 and 0.5 → 0.51... the latter is probably not what you want.

Regardless of which metric you decide to use, it's a good idea to (i) select highly variable features and (ii) use the d= argument to perform a PCA for you. This will speed things up and remove some noise at the same time. You could try modelGeneVar() for the variance estimation, though I don't really know how it'll turn out. You might have to set parametric=FALSE (see ?fitTrendVar) to get a decent fit, as the default settings are optimized for count data.
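The two suggestions above (feature selection, then PCA) can be sketched in Python/NumPy as follows. This is an illustration under stated assumptions, not scran code. The data are assumed to be M-values in a bins × cells array, and `residual_variance` and `pca_scores` are hypothetical helpers. The rolling-median trend is a crude nonparametric stand-in for a fitted mean-variance trend; it is not the algorithm behind modelGeneVar() or fitTrendVar(). In scran itself, the PCA would come from the d argument.

```python
import numpy as np

def residual_variance(x, window=51):
    """Per-feature variance minus a rolling-median trend of variance vs. mean.

    x: features x cells matrix (e.g. M-values per genome bin).
    Crude nonparametric trend for illustration only; NOT fitTrendVar().
    """
    mean = x.mean(axis=1)
    var = x.var(axis=1, ddof=1)
    order = np.argsort(mean)          # features sorted by mean
    sorted_var = var[order]
    half = window // 2
    trend_sorted = np.array([
        np.median(sorted_var[max(0, i - half): i + half + 1])
        for i in range(len(sorted_var))
    ])
    trend = np.empty_like(trend_sorted)
    trend[order] = trend_sorted       # map back to the original feature order
    return var - trend

def pca_scores(x, n_pcs):
    """PC scores (cells x n_pcs) from a features x cells matrix via SVD."""
    centred = x.T - x.T.mean(axis=0)  # cells x features, each feature centred
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]

# Simulated data: 200 bins x 50 cells of noise, where bins 0-4 differ
# between the first and second half of the cells.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 50))
x[:5, :25] += 5.0

top = np.argsort(residual_variance(x))[::-1][:5]   # most variable bins
scores = pca_scores(x[top], n_pcs=2)               # one row per cell
print(sorted(top.tolist()), scores.shape)
```

Subtracting a trend, rather than ranking on raw variance, is what lets a method like modelGeneVar() discount features that are variable only because of their mean. The rolling median here just mimics that idea in a few lines.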

xiaonian92 commented 3 years ago

Okay, I see. Thanks for your kind reply, I'll give it a try!