al-mcintyre / mCaller

A python program to call methylation (m6A in DNA) from nanopore signal data
MIT License

How to interpret the output of mCaller.py and make_bed.py? #39

Open dashengzhao opened 6 months ago

dashengzhao commented 6 months ago

Hello Dr. McIntyre, I have noticed that the output of mCaller.py looks like this:

...
R93_chr01 0000a989-a60a-4bbf-9dea-d7338c442966 11722827 GMTCMMCCTMT 3.4875,-2.31,-0.5075,-0.67,-1.16,-0.07,19.519852053140095 + A 0.02
R93_chr01 0000a989-a60a-4bbf-9dea-d7338c442966 11722831 MMCCTMTMTCM -1.16,-0.07,-1.68,2.46,8.355,1.555,19.519852053140095 + m6A 0.79
R93_chr01 0000a989-a60a-4bbf-9dea-d7338c442966 11722833 CCTMTMTCMCC -1.68,2.46,8.355,1.555,-6.78,1.2333333333333334,19.519852053140095 + A 0.18
R93_chr01 0000a989-a60a-4bbf-9dea-d7338c442966 11722836 MTMTCMCCCCM 1.555,-6.78,1.2333333333333334,0.57,0.6699999999999999,1.05,19.519852053140095 + A 0.08
R93_chr01 0000a989-a60a-4bbf-9dea-d7338c442966 11722841 MCCCCMMTTMM 1.05,0.5525,0.79,-0.16500000000000004,0.8,-4.52,19.519852053140095 + A 0.4
R93_chr01 0000a989-a60a-4bbf-9dea-d7338c442966 11722842 CCCCMMTTMMC 0.5525,0.79,-0.16500000000000004,0.8,-4.52,0.81,19.519852053140095 + A 0.02
...

So, the first question is: what is the meaning of the last column (such as 0.02, 0.79, 0.18)? Does it represent a "probability of methylation" score?

The corresponding output of make_bed.py is like this:

...
R93_chr01 11722823 11722824 TGCMGMTCMMC 0.2631578947368421 + 19
R93_chr01 11722826 11722827 MGMTCMMCCTM 0.11764705882352941 + 17
R93_chr01 11722831 11722832 MMCCTMTMTCM 0.17391304347826086 + 23
R93_chr01 11722833 11722834 CCTMTMTCMCC 0.16 + 25
R93_chr01 11722865 11722866 GCTMCMMGTGG 0.10714285714285714 - 28
...

So, the second question is: what is the meaning of the 5th column (such as 0.2631578947368421, 0.11764705882352941)? I have noticed that in the README file this 5th column is described as "% methylated". If that is correct, what is the difference between the "probability of methylation" score and "% methylated"?

Thank you.

al-mcintyre commented 6 months ago

The first output is the probability of methylation per read (e.g. read "0000a989-a60a-4bbf-9dea-d7338c442966" has a probability of being methylated of 0.79 based on the model used). The second output is the fraction of reads detected as methylated per site (e.g. site "R93_chr01 11722831" = 0.1739 of 23 reads, so 4 reads were detected as methylated above the default threshold of 0.5).
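To make the relationship between the two outputs concrete, here is a minimal sketch of how per-read probabilities could be collapsed into a per-site fraction. It is not the code in make_bed.py: the function name `site_fractions`, the tuple layout of the input, and the use of `>=` at the threshold are assumptions for illustration. Apart from the 0.79 call quoted above, the read probabilities are made up so that 4 of 23 reads pass the 0.5 threshold.

```python
from collections import defaultdict


def site_fractions(rows, threshold=0.5):
    """Collapse per-read calls into per-site (fraction methylated, depth).

    rows: iterable of (chrom, position, strand, probability) tuples.
    This layout is hypothetical; it is not mCaller's internal format.
    """
    counts = defaultdict(lambda: [0, 0])  # site -> [methylated reads, total reads]
    for chrom, pos, strand, prob in rows:
        site = (chrom, pos, strand)
        counts[site][1] += 1
        # Assumption: a read counts as methylated when prob >= threshold.
        if prob >= threshold:
            counts[site][0] += 1
    return {site: (m / n, n) for site, (m, n) in counts.items()}


# Toy data: 23 reads covering one site, 4 of them above the threshold.
probs = [0.79, 0.90, 0.60, 0.55] + [0.10] * 19
rows = [("R93_chr01", 11722831, "+", p) for p in probs]

result = site_fractions(rows)
fraction, depth = result[("R93_chr01", 11722831, "+")]
print(f"{fraction:.4f} of {depth} reads")  # fraction is 4/23
```

A single read with probability 0.79 therefore contributes one count to the numerator, while the per-site value summarizes all reads covering that position.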

dashengzhao commented 6 months ago

Thank you !

PRIYANKA-22091995 commented 6 months ago

Hello @al-mcintyre, I have a simple question about the generated bed file (screenshot attached). The 4th column contains a sequence of bases; what exactly does it mean? I can see that the 6th base is M. Does that indicate that the A is methylated? Also, please provide the headers of the bed file.

Thanks

al-mcintyre commented 6 months ago

These are the columns in the bed file: chrom, chromStart, chromEnd, context, % methylated, strand, depth of coverage. The M in the sequence motif simply highlights bases for which methylation is being predicted (in your case, the A in GATC motifs); it does not by itself mean the site is methylated. The following column gives the fraction of reads predicted as methylated for that site.
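For anyone reading the bed file programmatically, the column list above can be applied with a small parser. This is a sketch only: `BedRecord` and `parse_bed_line` are hypothetical names, not part of mCaller, and the example line is the one quoted earlier in this thread. Splitting on any whitespace is an assumption that covers both tab- and space-separated files.

```python
from typing import NamedTuple


class BedRecord(NamedTuple):
    # Field names follow the column list given above (hypothetical class).
    chrom: str
    chrom_start: int
    chrom_end: int
    context: str            # 11-mer with predicted positions written as M
    frac_methylated: float  # "% methylated" column, stored as a fraction
    strand: str
    depth: int              # depth of coverage


def parse_bed_line(line: str) -> BedRecord:
    """Split one line of the bed output into named fields."""
    f = line.split()
    return BedRecord(f[0], int(f[1]), int(f[2]), f[3], float(f[4]), f[5], int(f[6]))


rec = parse_bed_line("R93_chr01 11722831 11722832 MMCCTMTMTCM 0.17391304347826086 + 23")

# Recover the number of reads called methylated from fraction and depth.
methylated_reads = round(rec.frac_methylated * rec.depth)
print(rec.chrom, rec.chrom_start, methylated_reads, "of", rec.depth)
```

Multiplying the fraction by the depth recovers the read count, which is how the "4 reads" figure in the earlier answer follows from 0.1739 and 23.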