PacificBiosciences / kineticsTools

Tools for detecting DNA modifications from single molecule, real-time sequencing data

Determine significant coverage and score value in the ipdSummary gff #65

Open uceleste opened 5 years ago

uceleste commented 5 years ago

Hi All,

I would like to know what coverage and score values in the ipdSummary GFF are needed to consider a modified base call confident.

Example:

seqname source feature start end score strand frame coverage context IPDRatio CognateBase
genome kinModCall modified_base 64078 64078 42 - . 764 TTCGCAAGAAGACCTGAAGACCCTAGTGAAGTTTCTTCTTC 1.53 C
genome kinModCall modified_base 63115 63115 21 - . 759 TATAGTGAAATGAGAGGGAGTTACGAGGAGCAATGTAATGC 1.41 T
genome kinModCall modified_base 63168 63168 28 - . 759 AGCCATGCTTCGTTTGTGGAGGGGTGAAACATTTAGCTAAG 1.46 G
genome kinModCall modified_base 63203 63203 62 - . 757 AGGAATCCACATGGTCACAAGGGCAGAGTCACAAGAGCCAT 1.74 G
genome kinModCall modified_base 61924 61924 73 - . 756 TTCGGGAACATGATCTTGGAGGTAAATGTTTTCCACATTGC 1.87 G

Thanks

rhallPB commented 5 years ago

Score depends on coverage, so it isn't always possible to define a single confident cutoff. I would plot the data that you have (coverage vs. score); the modified bases should form a distinct cluster. From the plot you should be able to define a function that distinguishes modified from unmodified bases. This is much easier if you have some kind of control, for example a known modified motif.
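As a rough illustration of the coverage vs. score plot suggested above, here is a minimal Python sketch. It assumes rows laid out like the example in the question (score in column 6, coverage in column 9); it also accepts a `key=value;...` attribute string in column 9 in case the file stores coverage that way. The function names `parse_rows` and `plot_coverage_vs_score` are made up for this example and are not part of kineticsTools, and the plotting step assumes matplotlib is installed.

```python
# Example rows copied from the question (whitespace-separated).
EXAMPLE = """\
seqname source feature start end score strand frame coverage context IPDRatio CognateBase
genome kinModCall modified_base 64078 64078 42 - . 764 TTCGCAAGAAGACCTGAAGACCCTAGTGAAGTTTCTTCTTC 1.53 C
genome kinModCall modified_base 63115 63115 21 - . 759 TATAGTGAAATGAGAGGGAGTTACGAGGAGCAATGTAATGC 1.41 T
genome kinModCall modified_base 63168 63168 28 - . 759 AGCCATGCTTCGTTTGTGGAGGGGTGAAACATTTAGCTAAG 1.46 G
genome kinModCall modified_base 63203 63203 62 - . 757 AGGAATCCACATGGTCACAAGGGCAGAGTCACAAGAGCCAT 1.74 G
genome kinModCall modified_base 61924 61924 73 - . 756 TTCGGGAACATGATCTTGGAGGTAAATGTTTTCCACATTGC 1.87 G
"""


def parse_rows(lines):
    """Yield (coverage, score) pairs from whitespace-separated rows."""
    for line in lines:
        line = line.strip()
        fields = line.split()
        # Skip blank lines, comment lines, and the column-name header row.
        if len(fields) < 9 or line.startswith("#") or fields[0] == "seqname":
            continue
        score = float(fields[5])
        col9 = fields[8]
        if "=" in col9:
            # Assumed alternative layout: "coverage=764;context=...;IPDRatio=1.53"
            attrs = dict(kv.split("=", 1) for kv in col9.split(";") if "=" in kv)
            coverage = int(attrs["coverage"])
        else:
            coverage = int(col9)
        yield coverage, score


def plot_coverage_vs_score(pairs, out_path="coverage_vs_score.png"):
    """Scatter-plot score against coverage and save the figure (needs matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display required
    import matplotlib.pyplot as plt

    coverage = [c for c, _ in pairs]
    score = [s for _, s in pairs]
    plt.scatter(coverage, score, s=8)
    plt.xlabel("coverage")
    plt.ylabel("score")
    plt.savefig(out_path)
    plt.close()


pairs = list(parse_rows(EXAMPLE.splitlines()))
# To draw the plot: plot_coverage_vs_score(pairs)
```

On a real file you would pass the open file handle to `parse_rows` instead of the example text. Five rows are far too few to show any clustering; the separation described above would only become visible with all calls from the GFF plotted together.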