Scott-Devine / MELT-LRA

MELT-LRA: Mobile Element Insertion Site Classifier

Detect/filter SVA calls within existing SVAs #16

Closed by jonathancrabtree 1 year ago

jonathancrabtree commented 1 year ago

i.e., identify cases where there's already an SVA and we're detecting an extension of the existing SVA repeat: "fix the problem with SVA calls where internal VNTR expansions are being called as new mobile element insertions (perhaps a simple filter would fix this). We really just want new SVA insertions that are flanked by TSDs and have a poly(A) tail."

jonathancrabtree commented 1 year ago

Two options for obtaining SVA calls (on the reference sequence):

  1. Download a reference ME annotation for the reference genome that was used
  2. Extract the region around each SVA call and perform ME identification on that sequence
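
A minimal sketch of option 1, assuming the annotation is a UCSC-style RepeatMasker table dump (tab-separated, gzip-compressed, with the standard 17 rmsk columns). The function name `read_sva_annotations` is hypothetical and is not part of MELT-LRA:

```python
import csv
import gzip

# Column order of the UCSC rmsk table (17 fields per row).
RMSK_COLUMNS = [
    "bin", "swScore", "milliDiv", "milliDel", "milliIns",
    "genoName", "genoStart", "genoEnd", "genoLeft", "strand",
    "repName", "repClass", "repFamily", "repStart", "repEnd",
    "repLeft", "id",
]


def read_sva_annotations(path):
    """Return (chrom, start, end, repName) for every row whose repFamily is SVA.

    Assumes a gzip-compressed, tab-separated rmsk dump with no header line.
    Coordinates are kept as stored in the table (0-based start, exclusive end).
    """
    svas = []
    with gzip.open(path, "rt", newline="") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            rec = dict(zip(RMSK_COLUMNS, row))
            if rec.get("repFamily") == "SVA":
                svas.append(
                    (rec["genoName"], int(rec["genoStart"]),
                     int(rec["genoEnd"]), rec["repName"])
                )
    return svas
```

Calling `read_sva_annotations("rmsk.txt.gz")` on a table for the matching reference build would give the list of reference SVA intervals to compare calls against.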

jonathancrabtree commented 1 year ago

Note that we can't rely on the presence of a TSD call alone, because the TSD called by the pipeline may simply be a GC-rich VNTR in the insertion that matches one in the SVA outside the insertion, as in this case:

[Image: Screen Shot 2023-08-01 at 9 20 21 PM]
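
One way to express the filter being discussed, as a rough sketch: keep a call only if it has a TSD and a poly(A) tail and does not fall inside an SVA already annotated on the reference. The `SvaCall` record and its field names are hypothetical and do not reflect MELT-LRA's actual output schema:

```python
from dataclasses import dataclass


@dataclass
class SvaCall:
    # Hypothetical record; the field names are illustrative only.
    chrom: str
    pos: int
    has_tsd: bool           # pipeline reported a target site duplication
    has_polya: bool         # insertion carries a poly(A) tail
    in_reference_sva: bool  # call lies within an SVA annotated on the reference


def is_candidate_new_insertion(call):
    """A TSD alone is not enough: also require a poly(A) tail and
    reject calls that sit inside an existing reference SVA."""
    return call.has_tsd and call.has_polya and not call.in_reference_sva
```

Under this rule, a call with a TSD that lies inside an annotated SVA is rejected, which is the intended behaviour for VNTR expansions whose "TSD" is just a matching GC-rich repeat.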

jonathancrabtree commented 1 year ago

The above insertion is in an SVA_F element according to the UCSC RepeatMasker track:

[Image: Screen Shot 2023-08-29 at 12 02 14 PM]

Using ncls to find overlapping RepeatMasker annotations:

./test-NCLS.sh
INFO - read 5683690 repeat rows from rmsk.txt.gz
(33051343, 33052566, 75727) ['837', '2789', '113', '37', '90', 'chr1', '33051343', '33052566', '-215903856', '+', 'SVA_F', 'Retroposon', 'SVA', '382', '865', '-518', '6']
(33052313, 33053486, 75728) ['837', '7077', '61', '5', '47', 'chr1', '33052313', '33053486', '-215902936', '+', 'SVA_F', 'Retroposon', 'SVA', '421', '1375', '0', '6']
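
The contents of test-NCLS.sh are not shown here, so the following is only a sketch of the same overlap lookup. It uses a plain per-chromosome list scan from the standard library rather than ncls; for the full table of about 5.7 million rows, an interval index such as ncls would be the better choice. The two rows are the ones reported above, and the query position is a hypothetical value chosen purely for illustration:

```python
from collections import defaultdict

# The two RepeatMasker rows reported above (UCSC rmsk column order).
ROWS = [
    ['837', '2789', '113', '37', '90', 'chr1', '33051343', '33052566',
     '-215903856', '+', 'SVA_F', 'Retroposon', 'SVA', '382', '865', '-518', '6'],
    ['837', '7077', '61', '5', '47', 'chr1', '33052313', '33053486',
     '-215902936', '+', 'SVA_F', 'Retroposon', 'SVA', '421', '1375', '0', '6'],
]


def build_index(rows):
    """Group (start, end, repName) intervals by chromosome."""
    index = defaultdict(list)
    for row in rows:
        chrom, start, end, name = row[5], int(row[6]), int(row[7]), row[10]
        index[chrom].append((start, end, name))
    return index


def overlapping(index, chrom, start, end):
    """Return intervals on chrom overlapping the half-open range [start, end)."""
    return [iv for iv in index.get(chrom, []) if iv[0] < end and start < iv[1]]


index = build_index(ROWS)
# Hypothetical insertion position, not taken from the thread.
hits = overlapping(index, "chr1", 33052400, 33052401)
print(hits)
```

Any call whose position returns at least one SVA hit could then be flagged as lying within an existing SVA.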