Shians / NanoMethViz

https://shians.github.io/NanoMethViz/
Apache License 2.0
23 stars 1 forks source link

I found100% prevalence for 6ma data #13

Open BaylorLyu opened 2 years ago

BaylorLyu commented 2 years ago

https://1drv.ms/u/s!AqDiQPLr--syUlGGondDOjo3IY?e=VccPuS This link is my test data and probability produce by the code.

Shians commented 2 years ago

Having a quick look through probability.txt, there are only a few with prevalence = 1. These regions have average methylation probability > 0.5, so the likely situation is that there are only a few reads/sites in this region and they happen to be marked methylated. You can try to use query_methy() to look up the underlying data for specific regions.

Also you can try to plot the distribution of the probabilities, if it's the same as what I've seen before, you'll get a big bump near 0.5 with many messy spikes rather than a bi-modal distribution with peak near 0 and 1. This can be done by using

x <- read.table(gzfile("path_to_methy.tsv.bgz"), col.names = methy_col_names(), nrows = 50000)

plot(density(e1071::sigmoid(x$statistic)))

You can compare this to

nmr <- load_example_nanomethresult()
x <- read.table(gzfile(methy(nmr)), col.names = methy_col_names())

plot(density(e1071::sigmoid(x$statistic)))

to see how clean 5mC calling is (called with Nanopolish). I think the 6mA models of Megalodon is just not that accurate. As far as I know, the 5mC models are specifically trained with 5mC datasets, while 6mA calling essentially calls all modifications and is understandably less sensitive compared to a specialised model.