Open BaylorLyu opened 2 years ago
Having a quick look through probability.txt, there are only a few with prevalence = 1. These regions have average methylation probability > 0.5, so the likely situation is that there are only a few reads/sites in this region and they happen to be marked methylated. You can try to use query_methy()
to look up the underlying data for specific regions.
Also you can try to plot the distribution of the probabilities, if it's the same as what I've seen before, you'll get a big bump near 0.5 with many messy spikes rather than a bi-modal distribution with peak near 0 and 1. This can be done by using
x <- read.table(gzfile("path_to_methy.tsv.bgz"), col.names = methy_col_names(), nrows = 50000)
plot(density(e1071::sigmoid(x$statistic)))
You can compare this to
nmr <- load_example_nanomethresult()
x <- read.table(gzfile(methy(nmr)), col.names = methy_col_names())
plot(density(e1071::sigmoid(x$statistic)))
to see how clean 5mC calling is (called with Nanopolish). I think the 6mA models of Megalodon is just not that accurate. As far as I know, the 5mC models are specifically trained with 5mC datasets, while 6mA calling essentially calls all modifications and is understandably less sensitive compared to a specialised model.
https://1drv.ms/u/s!AqDiQPLr--syUlGGondDOjo3IY?e=VccPuS This link is my test data and probability produce by the code.