andreaskapou / scMET

Bayesian modelling of DNA methylation heterogeneity at single-cell resolution

Apply scMET on sliding windows #5

Open nshen7 opened 1 year ago

nshen7 commented 1 year ago

Dear Andreas,

I was trying to apply scMET to a large-scale scBS-seq dataset using non-overlapping sliding windows of 20kb. I noticed that you suggested the following in the scMET paper:

In the spirit of divide-and-conquer schemes, we bypass this problem via a parallelization strategy in which we apply scMET separately to each chromosome. Feature-specific estimates obtained for each chromosome can be combined post hoc when performing HVF selection and differential analyses.

Do you have any instructions on how to combine the estimates post hoc, or any functions developed for that purpose? Thanks in advance! Looking forward to hearing back from you.

Best, Ning

andreaskapou commented 1 year ago

Dear Ning,

Regarding your question, here is the code used in the paper (for the Ecker2017 dataset) for:

  1. Running scMET on each chromosome: https://github.com/andreaskapou/scMET-analysis/blob/master/ecker2017/all_cells/00_run/fit_scmet_window.R
  2. Combining the estimates of each chromosome and performing HVF analysis: https://github.com/andreaskapou/scMET-analysis/blob/master/ecker2017/all_cells/01_hvf/hvf_window.Rmd

Hope this helps! Please let me know if you have any other questions.

Best, Andreas

nshen7 commented 1 year ago

Hello Andreas,

Thanks for the reply! I really appreciate the help.

I just want to make sure that I understood your code correctly: I don't have to combine the results from the scmet function and then apply scmet_hvf, right? All you did was put together the HVFs from each chromosome and treat them as the set of HVFs for the entire genome?

Best, Ning

andreaskapou commented 1 year ago

Dear Ning,

Yes, the way we did the analysis (see https://github.com/andreaskapou/scMET-analysis/blob/cd8700dc15e6eff590eafe0864ba17b94cb4ad23/ecker2017/all_cells/01_hvf/hvf_window.Rmd#L100) was to combine the output HVFs for each chromosome and then sort by (residual) overdispersion to extract the top N HVFs.
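
For readers who want the idea without opening the linked Rmd, here is a minimal, language-agnostic sketch in Python of "pool the per-chromosome HVF tables, sort by residual overdispersion, keep the top N". It is not the scMET R API and not the code from the paper: the `FeatureEstimate` layout, the `combine_and_rank` helper, and the toy numbers are all assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple, Tuple


class FeatureEstimate(NamedTuple):
    """One row of a per-chromosome HVF table (hypothetical layout)."""
    feature: str    # window identifier, e.g. "chr1:0-20000"
    epsilon: float  # residual overdispersion estimate for that window


def combine_and_rank(
    per_chrom: Dict[str, List[FeatureEstimate]], top_n: int
) -> List[Tuple[str, FeatureEstimate]]:
    """Pool rows from all chromosomes and keep the top_n by epsilon."""
    pooled = [(chrom, row) for chrom, rows in per_chrom.items() for row in rows]
    # Largest residual overdispersion first.
    pooled.sort(key=lambda pair: pair[1].epsilon, reverse=True)
    return pooled[:top_n]


# Made-up numbers, for illustration only.
toy = {
    "chr1": [FeatureEstimate("chr1:0-20000", 0.9),
             FeatureEstimate("chr1:20000-40000", 0.1)],
    "chr2": [FeatureEstimate("chr2:0-20000", 0.5),
             FeatureEstimate("chr2:20000-40000", 1.4)],
}
top_hvfs = combine_and_rank(toy, top_n=2)
print([row.feature for _, row in top_hvfs])
```

Because the ranking is done on the pooled table, a chromosome whose estimates sit systematically higher will contribute more of the top N.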

One thing to note with this approach, though, is to check that the mean-overdispersion relationship is similar across chromosomes; otherwise your results might be biased towards certain chromosomes.
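
One rough way to run the check mentioned above is to bin features by their mean methylation estimate and compare the median overdispersion per bin across chromosomes. The sketch below is in Python and is only an illustration: the `(chrom, mu, gamma)` tuple layout, the `binned_median_gamma` helper, the choice of four bins, and the toy values are assumptions, not part of scMET.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Tuple


def binned_median_gamma(
    estimates: Iterable[Tuple[str, float, float]], n_bins: int = 4
) -> Dict[str, Dict[int, float]]:
    """Median overdispersion per mean-methylation bin, per chromosome.

    estimates: (chrom, mu, gamma) with mu assumed to lie in [0, 1].
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for chrom, mu, gamma in estimates:
        # Equal-width bins on [0, 1]; mu == 1.0 goes into the last bin.
        bin_index = min(int(mu * n_bins), n_bins - 1)
        grouped[chrom][bin_index].append(gamma)
    return {
        chrom: {b: median(values) for b, values in sorted(bins.items())}
        for chrom, bins in grouped.items()
    }


# Made-up numbers, for illustration only.
toy_estimates = [
    ("chr1", 0.10, 0.30), ("chr1", 0.15, 0.34), ("chr1", 0.80, 0.12),
    ("chr2", 0.12, 0.31), ("chr2", 0.85, 0.40),
]
trend = binned_median_gamma(toy_estimates, n_bins=4)
for chrom, bins in trend.items():
    print(chrom, bins)
```

If the per-bin medians differ markedly between chromosomes, as they do in the highest bin of the toy data, a pooled top-N ranking may favour some chromosomes. Plotting the per-chromosome trends would be the more usual way to inspect this.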

Best, Andreas