Scott-Devine / MELT-LRA

MELT-LRA: Mobile Element Insertion Site Classifier

Improve accuracy of coverage calculation. #9

Closed: jonathancrabtree closed this issue 1 year ago

jonathancrabtree commented 1 year ago

If a target site duplication (TSD) and/or a polyA/polyT tail is identified, those bases should be subtracted from both the insertion and the ME alignment before computing the coverage ratio remaining_ME_alignment_bp / remaining_insertion_bp. In the following example, the coverage should be 100%, because every base in the insertion that is not part of the polyT is covered by the ME alignment:

chr22:49879746  |ALU  |-  | 76.2%| 93.2%| 96.3%|100.0%| GTTGGTTTTT [TTTTTTTTTTTTTGAGACGGAGTCTCGCTCTGTTGCCCAGGCCGG....+144bp....GACGGGGTTTCACCTTGTTAGCCAGGATGGTCTCGATCTCCTGAC] CTCCTGGTCAGTGTCTGGGCGTGTTTCTAT                
                                                                    <---polyT--->                                                                                                                                           
                                                                                 <ALU----------------------------              -----------------------------------------ALU]                                                
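A minimal Python sketch of the proposed calculation, assuming the insertion, the ME alignment(s), and any TSD/polyA/polyT calls are all given as 0-based, half-open intervals in insertion coordinates. The function names and the toy coordinates below are hypothetical; this is an illustration of the ratio described above, not MELT-LRA's actual code.

```python
from typing import Iterable, Set, Tuple

Interval = Tuple[int, int]  # hypothetical: 0-based, half-open [start, end) in insertion coordinates


def positions(intervals: Iterable[Interval]) -> Set[int]:
    """Expand half-open intervals into the set of base positions they cover."""
    covered: Set[int] = set()
    for start, end in intervals:
        covered.update(range(start, end))
    return covered


def coverage(insertion_len: int,
             me_alignments: Iterable[Interval],
             masked: Iterable[Interval]) -> float:
    """remaining_ME_alignment_bp / remaining_insertion_bp.

    `masked` holds the TSD and polyA/polyT intervals, if any were identified.
    They are removed from both the insertion and the ME alignment before the
    ratio is taken, so the result can never exceed 1.0.
    """
    insertion = set(range(insertion_len))
    masked_pos = positions(masked)
    remaining_insertion = insertion - masked_pos
    # Only count alignment bases that fall inside the insertion and are not masked.
    remaining_me = (positions(me_alignments) & insertion) - masked_pos
    if not remaining_insertion:
        return 0.0  # assumption: report 0 when masking leaves nothing to cover
    return len(remaining_me) / len(remaining_insertion)
```

For a hypothetical 100 bp insertion whose first 13 bp are a polyT and whose remaining 87 bp are covered by the ALU alignment, `coverage(100, [(13, 100)], [(0, 13)])` returns 1.0, which is the behavior requested for the first example. Working with position sets rather than summed lengths keeps any overlap between the alignment and the masked bases from being counted.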

Likewise, in this example the coverage is 100% (not more) because the TSD is subtracted from both the insertion length and the ALU alignment length before the ratio is taken:

chr22:45166718  |ALU  |+  | 32.4%| 84.5%| 95.6%|100.0%| TCTGCACAGT [AAAGAATTATGTCGCGTGAACCCGGGAAGCGGAGCTTGCAGTGAG....+35bp.....GGGCGACAGAGCGAGACTCCGTCTCAAAAAAAAAAAAAAAAAAAA] AAAGAATTATGTCTATTCCTGTATTTGTTT
                                                                    ^^^^^TSD^^^^^                                                                       <------polyA------->  ^^^^^TSD^^^^^
                                                                      [ALU---------------------------------------              ---------------------ALU>
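The sketch below contrasts subtracting the TSD and polyA/polyT bases from the insertion length only with subtracting them from both the insertion and the alignment. It assumes 0-based, half-open intervals in insertion coordinates, a single ME alignment, and a TSD and poly tail that do not overlap each other; the function names and toy coordinates are hypothetical.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # hypothetical: 0-based, half-open [start, end) in insertion coordinates


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def ratios(insertion_len: int,
           alignment: Interval,
           tsd: Optional[Interval],
           poly_tail: Optional[Interval]) -> Tuple[float, float]:
    """Return (insertion_only, both) coverage ratios.

    insertion_only: masked bases removed from the insertion length only.
    both: masked bases removed from the insertion and from the alignment.
    Assumes the TSD and poly tail intervals do not overlap each other.
    """
    masked = [iv for iv in (tsd, poly_tail) if iv is not None]
    remaining_insertion = insertion_len - sum(end - start for start, end in masked)
    if remaining_insertion <= 0:
        raise ValueError("nothing left of the insertion after masking")
    aln_len = alignment[1] - alignment[0]
    # Drop alignment bases that lie inside the TSD or the poly tail.
    remaining_alignment = aln_len - sum(overlap_bp(alignment, iv) for iv in masked)
    return aln_len / remaining_insertion, remaining_alignment / remaining_insertion
```

For a hypothetical 100 bp insertion with the TSD at [0, 13), the polyA at [80, 100), and an ALU alignment spanning [0, 80) that includes the TSD, `ratios(100, (0, 80), (0, 13), (80, 100))` gives 80/67 (about 1.19) when only the insertion is adjusted, and exactly 1.0 when the TSD is also removed from the alignment.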