Scott-Devine / MELT-LRA

MELT-LRA: Mobile Element Insertion Site Classifier

Improve TSD/polyT detection #11

Open jonathancrabtree opened 1 year ago

jonathancrabtree commented 1 year ago

In the following case, a single-base mismatch prevents the TSD from being called, which in turn means that the polyT won't be found:

chr1:92701962   |ALU  |-  |100.4%| 97.5%| 86.0%| AAGATATTAC [TGTATGTTATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGAGAC....+237bp....AAAGTGCTGGGATTACAGGCGTGAGCCACCGCGCCCGGCCTACTG] TATGTTATTTTAACATGTTTAATCTTTTAA
                                                             ^                                                                                                         ^
                                                                                                     <ALU-              ------------------------------------ALU]

Two possible options to pursue here:

  1. Allow mismatches in TSDs
  2. In cases where there's a reverse-strand ME match but no polyT, check immediately 5' of the ME alignment for a polyT.
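
A rough sketch of what option 2 might look like, assuming the insert sequence and the offset where the ME alignment begins are already available. The function name `find_polyt_upstream`, the `min_len` and `slack` parameters, and their defaults are hypothetical and not taken from the MELT-LRA code. The test sequence approximates the start of the chr1 insert above; the run length of 31 Ts is an illustrative assumption.

```python
import re


def find_polyt_upstream(insert_seq, me_start, min_len=10, slack=5):
    """Look for a polyT run ending just 5' of the ME alignment.

    insert_seq : inserted sequence (string)
    me_start   : 0-based offset in insert_seq where the ME alignment begins
    min_len    : shortest T-run accepted as a polyT (hypothetical default)
    slack      : max number of bases allowed between the end of the T-run
                 and the start of the ME alignment (hypothetical default)

    Returns (start, end) as a half-open interval, or None if nothing qualifies.
    """
    upstream = insert_seq[:me_start].upper()
    best = None
    for m in re.finditer(r"T+", upstream):
        run_len = m.end() - m.start()
        # Keep only runs that are long enough and close enough to the ME.
        if run_len >= min_len and me_start - m.end() <= slack:
            if best is None or run_len > best[1] - best[0]:
                best = (m.start(), m.end())
    return best


# Approximation of the 5' end of the chr1 insert (T count is illustrative).
seq = "TGTATGTTA" + "T" * 31 + "GAGAC"
print(find_polyt_upstream(seq, me_start=40))
```

This only handles an uninterrupted run of Ts. A polyT with interspersed non-T bases would need a more tolerant scan.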

jonathancrabtree commented 1 year ago

Another example from the chr22 callset:

chr22:33132512  |ALU  |+  |100.0%| 97.2%| 97.5%| 95.2%| ATGTTGAATT [AGAAGTCATTATTAGGGCCGGGCGCGGTGGCTCACGCCTGTAATC....+230bp....ACAGAGCGAAACTCCGTCTCAAAAAAAAAAAAAAAAAAAAAAAAA] AAAAGTCATTATTAGTATTTTTGTACTTAA
                                                                    ^                                                                              <---------polyA--------->  ^
                                                                                   [ALU--------------------------              ----------------ALU>
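
In this example the first 15 bases of the insert (AGAAGTCATTATTAG) differ from the 15 bases that follow it (AAAAGTCATTATTAG) only at the second position, so option 1 would recover the TSD here. A minimal sketch of a mismatch-tolerant comparison follows, assuming the TSD is called by comparing the start of the insert against the sequence immediately 3' of it. The function name `find_tsd` and the `min_len` and `max_mismatches` defaults are hypothetical, not the current MELT-LRA implementation.

```python
def find_tsd(insert_seq, right_flank, min_len=8, max_mismatches=1):
    """Longest shared prefix of insert_seq and right_flank, tolerating
    up to max_mismatches substitutions (no indels).

    Returns (tsd_length, mismatches_used), or None if the best candidate
    is shorter than min_len.
    """
    best_len = 0   # length of the best TSD seen so far
    best_mm = 0    # mismatches contained in that TSD
    mm = 0         # running mismatch count
    for i, (a, b) in enumerate(zip(insert_seq.upper(), right_flank.upper())):
        if a != b:
            mm += 1
            if mm > max_mismatches:
                break
        else:
            # Only let a TSD end on a matching base.
            best_len, best_mm = i + 1, mm
    if best_len < min_len:
        return None
    return best_len, best_mm


# Sequences copied from the chr22 example above.
insert_prefix = "AGAAGTCATTATTAGGGCCGGGCGCGGTGGCTCACGCCTGTAATC"
right_flank = "AAAAGTCATTATTAGTATTTTTGTACTTAA"

print(find_tsd(insert_prefix, right_flank))                    # (15, 1)
print(find_tsd(insert_prefix, right_flank, max_mismatches=0))  # None
```

Substitutions only: a TSD disrupted by a single-base insertion or deletion would need a gapped alignment instead.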