andreaskapou / scMET

Bayesian modelling of DNA methylation heterogeneity at single-cell resolution

How can I apply scMET on real data? And does it do imputation? #3

Open xiaonian92 opened 3 years ago

xiaonian92 commented 3 years ago

Hello dear author, it's exciting to find this new tool! I've been following Melissa but gave up due to the lack of detailed annotation of genomic features. Could you briefly explain the difference between scMET and Melissa, and how to decide which tool is more suitable for my data? Also, when I checked the "Online vignette", I found "scMET on real data: TODO". Can I perform the same analysis by following the synthetic data tutorial? Thanks!

andreaskapou commented 3 years ago

Hi,

It's a shame you couldn't use Melissa for your analysis. The main difference is that scMET is neither a clustering nor an imputation tool.

Actually, scMET can be used to identify genomic features that are potentially useful for distinguishing the different sub-populations in your dataset (those features that are highly variable in the data). You can then perform downstream analysis, such as dimensionality reduction or clustering, using only those selected (and more informative) features. This is very similar to scRNA-seq analysis, where we select features based on variability and then perform dimensionality reduction and clustering. E.g. see https://satijalab.org/seurat/articles/pbmc3k_tutorial.html.

Also, you can use scMET to identify differentially methylated or differentially variable features across two pre-determined sub-populations. In this sense, it is quite distinct from Melissa in terms of functionality.

Yes, you can perform the same analysis on real data; you only need to bring the data into the required format, as in the example synthetic data. For functions that help with processing and running on real data, you could have a look at https://github.com/andreaskapou/scMET-analysis.

This function https://github.com/andreaskapou/scMET-analysis/blob/master/ecker2017/utils/annotate.R is used to go from bismark files to genomic features.
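
As a rough illustration of what that step amounts to (this is not the code in annotate.R, and it is written in Python only to keep the sketch self-contained), per-CpG calls for each cell are aggregated into one row per feature and cell. The column names `Feature`, `Cell`, `total_reads` and `met_reads` are an assumption about the layout of the synthetic example data, and the function name and input tuples are hypothetical:

```python
from collections import defaultdict


def aggregate_calls(calls, features):
    """Aggregate per-CpG methylation calls into per-feature, per-cell counts.

    calls:    iterable of (cell, chrom, pos, methylated) with methylated in {0, 1}
    features: iterable of (feature_id, chrom, start, end), coordinates inclusive
    Returns a list of row dicts sorted by feature and then by cell.
    The column names are assumed, not taken from the scMET source.
    """
    features = list(features)
    counts = defaultdict(lambda: [0, 0])  # (feature, cell) -> [covered, methylated]
    for cell, chrom, pos, methylated in calls:
        # Naive overlap scan; a real pipeline would use an interval index.
        for feature_id, f_chrom, start, end in features:
            if chrom == f_chrom and start <= pos <= end:
                counts[(feature_id, cell)][0] += 1
                counts[(feature_id, cell)][1] += 1 if methylated else 0
    return [
        {"Feature": fid, "Cell": cell, "total_reads": total, "met_reads": met}
        for (fid, cell), (total, met) in sorted(counts.items())
    ]
```

Features with no covered CpG in a given cell produce no row here, which mirrors the sparsity of single-cell methylation data.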

Then, you can follow the gastrulation example code here https://github.com/andreaskapou/scMET-analysis/blob/master/gastrulation/00_run/fit_scmet.R.

Again, though, in most cases you need to define an annotation object for the genomic features that you might be interested in. You could also try to bin the genome and use sliding windows; however, this approach will be time-consuming, and you would then need to make biological sense of the genomic sliding windows.
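
For the sliding-window alternative, a minimal sketch of how such windows could be generated is shown below. The function name, the 1-based inclusive coordinates and the window identifier format are illustrative assumptions, not part of scMET:

```python
def sliding_windows(chrom, chrom_length, width, step):
    """Return (window_id, chrom, start, end) tuples tiling one chromosome.

    Coordinates are 1-based and inclusive; the last window is truncated
    at the chromosome end. All names here are hypothetical.
    """
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    windows = []
    start = 1
    while start <= chrom_length:
        end = min(start + width - 1, chrom_length)
        windows.append((f"{chrom}:{start}-{end}", chrom, start, end))
        if end == chrom_length:
            break  # reached the chromosome end
        start += step
    return windows
```

With `step` smaller than `width` the windows overlap, so the number of features grows quickly with genome size. That is one reason this route is slower than using a curated annotation.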

Apologies for not yet having an easy-to-use pipeline. We are currently working on providing examples with real data and making this step of running scMET much more straightforward.

Best, Andreas

xiaonian92 commented 2 years ago

Hi Andreas,

Thank you so much for the detailed reply, and sorry for replying late. I've been working on the basic data analysis and have finally reached this step. After discussing with my lab mates, we still have several questions:

1) We found that, in addition to sparse coverage, the heterogeneity of coverage rate/sites is also a serious problem: the mean methylation level of a feature is positively correlated with the coverage rate (detected CGs / all CGs) of the feature. Is this also true in your data? Is it necessary to correct the data for this bias?

2) There are 2 groups (A & B) in my samples. Should I run "scmet" and "scmet_hvf" on each group separately to get the HVFs (scmet_dt_A, scmet_dt_B) and then do the comparison? If so, which HVFs should I use for basic downstream analysis (dimensionality reduction, clustering, ...) on all samples: scmet_dt_AB (run on all samples at once), or scmet_dt_A and scmet_dt_B?

Thanks! Best, Leanne Chen