al-mcintyre / mCaller

A python program to call methylation (m6A in DNA) from nanopore signal data
MIT License

How to interpret diff.6 output? #31

Closed Shians closed 3 years ago

Shians commented 3 years ago

I see from the README that the columns are "chromosome, read name, genomic position, position k-mer context, features, strand, label, and probability of methylation".

Here is some output from some PCR E. coli samples, which should have no methylation.

U00096.3    4a2c2f99-da0d-4d53-96f3-1573d778d0ae    782 ACCGGMTCGAT 0.8,-0.195,-1.1199999999999999,1.28,-1.99,-0.9225,18.939263322884013    -   A   0.28
U00096.3    4a2c2f99-da0d-4d53-96f3-1573d778d0ae    881 ATGTGMTCAGC -0.36,1.07,-2.35,-3.3,-1.47,-2.5,18.939263322884013 -   A   0.25
U00096.3    4a2c2f99-da0d-4d53-96f3-1573d778d0ae    1168    AAGGGMTCTGG 0.25428571428571434,-0.63,0.26333333333333336,-1.115,-6.17,3.17,18.939263322884013  -   A   0.27
U00096.3    d1e7a6bc-92ed-422f-a7b8-b9ebeb39bac1    619 CGTCGMTCAGG 2.1666666666666665,2.33,-1.5050000000000001,-3.07,2.306666666666666,-3.6350000000000002,18.848024316109424  +   m6A 0.91
U00096.3    d1e7a6bc-92ed-422f-a7b8-b9ebeb39bac1    726 TGTCGMTCGCC 4.38,-1.5699999999999998,0.34,6.92,3.455,1.95,18.848024316109424    +   m6A 0.94
U00096.3    d1e7a6bc-92ed-422f-a7b8-b9ebeb39bac1    880 GGCTGMTCACA 2.13,-2.78,-0.97,2.0974999999999997,3.86,0.35,18.848024316109424    +   m6A 0.88
U00096.3    d1e7a6bc-92ed-422f-a7b8-b9ebeb39bac1    1167    TCCAGMTCCCT -1.935,-3.34,-1.86,-2.11,-0.54,0.04500000000000015,18.848024316109424   +   A   0.03
U00096.3    d1e7a6bc-92ed-422f-a7b8-b9ebeb39bac1    1569    GTGGGMTCTCG 2.145,1.86,-1.205,1.3249999999999997,-0.16666666666666666,-2.7,18.848024316109424   +   m6A 0.77
U00096.3    d1e7a6bc-92ed-422f-a7b8-b9ebeb39bac1    1634    CAGGGMTCTTC -1.805,-1.165,-0.06,3.34,-3.6100000000000003,2.1550000000000002,18.848024316109424  +   A   0.38
U00096.3    d1e7a6bc-92ed-422f-a7b8-b9ebeb39bac1    1723    TACCGMTCAGG 2.31,1.35,-0.78,6.26,2.85,-0.8480000000000001,18.848024316109424    +   m6A 0.98
U00096.3    d1e7a6bc-92ed-422f-a7b8-b9ebeb39bac1    2020    GGCGGMTCAAT 0.5333333333333333,-1.4,-0.99,-0.6549999999999999,-3.17,0.01,18.848024316109424 +   A   0.12
U00096.3    d1e7a6bc-92ed-422f-a7b8-b9ebeb39bac1    2353    AGATGMTCTTT 2.0,2.4,0.71,0.32,1.9949999999999999,-2.805,18.848024316109424  +   m6A 0.8
U00096.3    d1e7a6bc-92ed-422f-a7b8-b9ebeb39bac1    2632    TAATGMTCCGC 1.52,0.21,-2.655,2.34,1.27,-1.0416666666666667,18.848024316109424   +   m6A 0.86
U00096.3    d1e7a6bc-92ed-422f-a7b8-b9ebeb39bac1    2761    TGCTGMTCTGC -0.53,-2.285,-1.33,0.69,-2.12,-4.57,18.848024316109424  +   A   0.35
U00096.3    143b58d8-6651-4192-9592-1026c2a75cc3    619 CGTCGMTCAGG 0.6549999999999999,-1.52,-1.3175000000000001,-0.84,1.95,0.08,19.64565043894653  +   A   0.3
U00096.3    143b58d8-6651-4192-9592-1026c2a75cc3    726 TGTCGMTCGCC 4.96,-1.9975,-0.4666666666666666,2.4,-2.02,0.365,19.64565043894653  +   A   0.35
U00096.3    143b58d8-6651-4192-9592-1026c2a75cc3    781 TATCGMTCCGG 0.79,-3.1133333333333333,-2.47,1.2,0.66,-1.4800000000000002,19.64565043894653   +   m6A 0.63
U00096.3    143b58d8-6651-4192-9592-1026c2a75cc3    880 GGCTGMTCACA 2.4066666666666667,0.51,2.76,1.13,0.685,-2.15,19.64565043894653 +   m6A 0.57
U00096.3    143b58d8-6651-4192-9592-1026c2a75cc3    1167    TCCAGMTCCCT 2.16,6.13,1.23,-0.96,5.97,1.405,19.64565043894653   +   m6A 0.89
U00096.3    143b58d8-6651-4192-9592-1026c2a75cc3    1569    GTGGGMTCTCG -3.07,-1.074,-2.525,-0.895,3.19,-2.54,19.64565043894653 +   m6A 0.73
U00096.3    143b58d8-6651-4192-9592-1026c2a75cc3    1634    CAGGGMTCTTC -1.86,-2.5,-2.4000000000000004,-1.73,-11.36,-0.69,19.64565043894653 +   A   0.38
U00096.3    143b58d8-6651-4192-9592-1026c2a75cc3    1723    TACCGMTCAGG 0.47,-2.85,-1.46,3.79,3.255,0.44,19.64565043894653  +   m6A 0.93
U00096.3    143b58d8-6651-4192-9592-1026c2a75cc3    2020    GGCGGMTCAAT 0.5466666666666666,-0.04500000000000004,-2.0533333333333332,1.37,-0.82,0.57,19.64565043894653   +   A   0.2
U00096.3    143b58d8-6651-4192-9592-1026c2a75cc3    2353    AGATGMTCTTT 1.495,1.99,2.758,1.2366666666666666,0.4950000000000002,-0.79,19.64565043894653  +   A   0.32
U00096.3    143b58d8-6651-4192-9592-1026c2a75cc3    2515    CGACGMTCTCT 0.81,3.085,-0.07,4.31,-10.54,-2.85,19.64565043894653    +   m6A 0.96
U00096.3    143b58d8-6651-4192-9592-1026c2a75cc3    2632    TAATGMTCCGC 1.29,1.8,-2.55,0.39749999999999996,1.17,0.45999999999999996,19.64565043894653   +   A   0.43
U00096.3    143b58d8-6651-4192-9592-1026c2a75cc3    3073    TGCCGMTCGGT 2.37,1.096666666666667,-0.5199999999999999,-1.865,6.8149999999999995,0.5,19.64565043894653  +   m6A 0.94
U00096.3    143b58d8-6651-4192-9592-1026c2a75cc3    3262    TGATGMTCGAA 4.39,-2.05,6.45,1.72,0.88,-3.21,19.64565043894653   +   m6A 0.79
U00096.3    143b58d8-6651-4192-9592-1026c2a75cc3    3755    GAAAGMTCACA 0.8333333333333334,0.25,0.8700000000000001,2.57,3.44,-1.01,19.64565043894653    +   m6A 0.88
U00096.3    143b58d8-6651-4192-9592-1026c2a75cc3    3904    CGAAGMTCCTC -0.965,0.46,0.10666666666666662,2.275,-0.36,-2.42,19.64565043894653 +   A   0.45
U00096.3    17fdb5ab-3500-4825-9112-67dc62ea11e4    619 CGTCGMTCAGG 0.67,-0.3500000000000001,0.9,-0.51,2.58,-0.55,20.60535117056856 +   A   0.5
U00096.3    17fdb5ab-3500-4825-9112-67dc62ea11e4    726 TGTCGMTCGCC 2.6966666666666668,0.28,-0.695,1.12,3.16,1.75,20.60535117056856 +   A   0.47
U00096.3    17fdb5ab-3500-4825-9112-67dc62ea11e4    781 TATCGMTCCGG 0.4366666666666667,-0.83,0.64,-0.5700000000000001,0.05333333333333338,-2.19,20.60535117056856   +   A   0.27
U00096.3    17fdb5ab-3500-4825-9112-67dc62ea11e4    880 GGCTGMTCACA -0.17,0.895,-2.55,0.86,0.40500000000000014,-3.47,20.60535117056856  +   m6A 0.84
U00096.3    17fdb5ab-3500-4825-9112-67dc62ea11e4    1569    GTGGGMTCTCG -2.1125,-0.49,-1.05,10.19,2.01,-0.14,20.60535117056856  +   m6A 0.93
U00096.3    17fdb5ab-3500-4825-9112-67dc62ea11e4    1634    CAGGGMTCTTC -0.83,-1.415,-0.57,0.24,-2.025,-2.955,20.60535117056856 +   A   0.18
U00096.3    17fdb5ab-3500-4825-9112-67dc62ea11e4    1723    TACCGMTCAGG -0.14,-3.75,-2.19,-2.89,-4.63,-0.8799999999999999,20.60535117056856 +   A   0.19
U00096.3    17fdb5ab-3500-4825-9112-67dc62ea11e4    2020    GGCGGMTCAAT 1.09,-1.08,-0.72,-3.1466666666666665,-2.88,-4.333333333333333,20.60535117056856 +   A   0.33
U00096.3    17fdb5ab-3500-4825-9112-67dc62ea11e4    2353    AGATGMTCTTT 0.39999999999999997,-0.13,2.24,4.525,1.13,-0.3,20.60535117056856    +   m6A 0.59
U00096.3    17fdb5ab-3500-4825-9112-67dc62ea11e4    2632    TAATGMTCCGC 1.0075,0.11,-1.11,0.17833333333333334,0.8833333333333333,1.24,20.60535117056856 +   A   0.09
U00096.3    17fdb5ab-3500-4825-9112-67dc62ea11e4    2761    TGCTGMTCTGC -0.08,0.31,0.52,-1.46,-0.33799999999999997,1.95,20.60535117056856   +   A   0.02
U00096.3    17fdb5ab-3500-4825-9112-67dc62ea11e4    3073    TGCCGMTCGGT 2.3266666666666667,0.75,-0.19666666666666663,2.96,3.81,1.78,20.60535117056856   +   m6A 0.76
U00096.3    17fdb5ab-3500-4825-9112-67dc62ea11e4    3755    GAAAGMTCACA 0.04999999999999999,-1.4333333333333333,1.1199999999999999,1.11,0.985,-3.58,20.60535117056856   +   m6A 0.65
U00096.3    17fdb5ab-3500-4825-9112-67dc62ea11e4    3904    CGAAGMTCCTC 1.06,1.89,-0.48500000000000004,-0.02,1.54,1.86,20.60535117056856    +   A   0.13
U00096.3    17fdb5ab-3500-4825-9112-67dc62ea11e4    4820    GCGTGMTCAGT 3.035,-4.26,1.47,0.030000000000000027,-0.24,1.32,20.60535117056856  +   A   0.12
U00096.3    17fdb5ab-3500-4825-9112-67dc62ea11e4    4916    GTTGGMTCTGC -0.11000000000000001,0.41,-1.44,-0.62,0.39999999999999997,-1.89,20.60535117056856   +   A   0.27
U00096.3    17fdb5ab-3500-4825-9112-67dc62ea11e4    5345    AGGCGMTCGTG -0.71,1.35,1.53,0.29,0.8450000000000002,5.33,20.60535117056856  +   A   0.05
U00096.3    17fdb5ab-3500-4825-9112-67dc62ea11e4    5489    TCATGMTCATC 2.57,2.265,0.6399999999999999,-0.14500000000000002,2.09,-0.12,20.60535117056856 +   A   0.44
U00096.3    17fdb5ab-3500-4825-9112-67dc62ea11e4    5851    TGATGMTCTTA -1.8133333333333335,-1.7349999999999999,4.48,-5.14,-8.296666666666667,-4.5,20.60535117056856    +   m6A 0.62
U00096.3    17fdb5ab-3500-4825-9112-67dc62ea11e4    6061    GACGGMTCCCC -4.71,-0.94,-1.1366666666666667,-4.17,-0.21500000000000002,-1.85,20.60535117056856  +   A   0.09

Is mCaller reporting a high probability of methylation at all 6mA contexts?

al-mcintyre commented 3 years ago

This is per read, so I would expect the accuracy to be lower, although this really doesn't look good. Try aggregating over multiple reads, or try one of the newer models trained on data basecalled with Guppy. Alternatively, if you have data from unamplified E. coli DNA, it may be worth training a new model that better represents your species-flowcell-basecaller combination.
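For the "aggregating over multiple reads" suggestion, here is a minimal sketch of one way to do it. It is not mCaller's own aggregation code. It assumes the eight whitespace-separated columns listed in the question, and the names `aggregate_calls`, `SiteSummary`, and `min_reads` are made up for illustration. It groups per-read calls by chromosome, position, and strand, then reports the read count, the fraction of reads labelled m6A, and the mean probability for each site.

```python
from collections import defaultdict
from typing import NamedTuple


class SiteSummary(NamedTuple):
    n_reads: int      # number of reads covering the site
    frac_m6a: float   # fraction of those reads labelled "m6A"
    mean_prob: float  # mean per-read probability of methylation


def aggregate_calls(lines, min_reads=1):
    """Group per-read calls by (chromosome, position, strand).

    Assumes 8 whitespace-separated columns: chromosome, read name,
    position, k-mer context, features, strand, label, probability.
    """
    probs = defaultdict(list)
    is_m6a = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 8:
            continue  # skip blank or malformed rows
        chrom, _read, pos, _kmer, _features, strand, label, prob = fields
        key = (chrom, int(pos), strand)
        probs[key].append(float(prob))
        is_m6a[key].append(label == "m6A")

    summary = {}
    for key, site_probs in probs.items():
        n = len(site_probs)
        if n < min_reads:
            continue  # too few reads to say anything about this site
        summary[key] = SiteSummary(n, sum(is_m6a[key]) / n, sum(site_probs) / n)
    return summary


if __name__ == "__main__":
    # The three reads covering position 619 (+ strand) from the output above.
    rows = [
        "U00096.3 d1e7a6bc-92ed-422f-a7b8-b9ebeb39bac1 619 CGTCGMTCAGG "
        "2.1666666666666665,2.33,-1.5050000000000001,-3.07,2.306666666666666,"
        "-3.6350000000000002,18.848024316109424 + m6A 0.91",
        "U00096.3 143b58d8-6651-4192-9592-1026c2a75cc3 619 CGTCGMTCAGG "
        "0.6549999999999999,-1.52,-1.3175000000000001,-0.84,1.95,0.08,"
        "19.64565043894653 + A 0.3",
        "U00096.3 17fdb5ab-3500-4825-9112-67dc62ea11e4 619 CGTCGMTCAGG "
        "0.67,-0.3500000000000001,0.9,-0.51,2.58,-0.55,20.60535117056856 + A 0.5",
    ]
    for site, stats in aggregate_calls(rows).items():
        print(site, stats)
```

With only three reads per site the summary is still noisy. Raising `min_reads` drops poorly covered sites, and the fraction of reads labelled m6A at a site is likely a more useful number to compare against an unmethylated control than any single per-read probability.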