adamewing / tldr

Identify and annotate TE-mediated insertions in long-read sequence data
MIT License

Problem with the output table file: ALU family with subfamily L1PA2 #15

Closed: Amz965 closed this issue 3 years ago

Amz965 commented 3 years ago

Hello @adamewing,

Thank you for this useful tool. I have a question about the contents of the output table file. I ran the tool on my nanopore data and got results in which some rows list the family as ALU but the subfamily as L1PA2. Here are two such rows:

a2b04b00-a314-461e-9c69-6a6207cc1a60    chr13   112723285       112723296       -    ALU      L1PA2   NA      NA      0       False   NA      60.0    0.0     12      4    1NOMe_80U_60m_60m_filtered|12    NA      NA      NA      GGGGCTGGTTGGATGTTACACATTTCAGGGAGGCAGATTAGGAAAGACGTGTATCAATGAGTGGAAGGTGGCGTGGCTCAGCCGGAAAGGCAGGATGACTTGAAGTAAAGTAGGAATCAGTTCAGTTATGTTTTACTTTTATTCTTTTATTTTTTTTTCAGAGATTCTTTAATTTGCGGTTGATTAAAGGAGTAAGGTTCTATCTAAAACTTGGAGTCAACAGAAAGGAATGTTTAAGATAAGGAGGCCCTTGTCAGAGTCAGCCACAGGGTCAAAGAGCCCTGTTTAGCAGGTCTGACGCCCGCAGGCGGGACTCTTAACCCTCATCTGTAATGGCACTCGGGCCTGTTTATAATTTGGCATCTTAGTGCCACAGAGTCTGGTTTTTTTTCTTCGATCCTGTGAGCGTGGGTTGAGGTAAAAGAGGTTGTGGAGACCAGGGCTTTATCACGCAGATGATGTCTTATTGGTGTCTTATTGCCACAGAGTTCCCCACCTCCCCCCCCCCCCCACCTCCCCCACCCCCCACCCCCCACCTCCCCCACCTCCCCCCACCCCCACCTCCCCCCCACCCCCACCCACCTCCCCCCAACCCCCACCCACCTCCCCCCACCCCCCCCACCTCCCCCTCCCCCCCACCTCCCCACCCCCCCCCCCCCCCCCCCCACCCCCACCCCCACCCCCACCCCCCACCCCCCACCCCCCACCCCCCCACCTCCCACCACCCCCCCCCTACCCCCTCAACCCCCACCCCCACCCCCTACCCCCCACCCCCACCCCTCCCCCCACCCCCACCCACCTCCCCCCCCCCCCACCTCCCCCACCCCCCACCTCCCCCACCCCCCACCCCCACCCCCTACCCCCACCCCTACCCCCCCCCCCCACCTCCCCCCACCCCCACCCCCACCCCCCCACCCCCACCCCCACCTCCCCCACCCCCACCCACCTCCCCCACCCCCCCCCCCCCCCCCCCACCCCACCCCCTCCCCACACCCCCACCTCCCCCCCCCCACCCCCCCCCCACCCCCACCCCCCACCCCTACCCCCACCCCCACCCCCCCACCCCCACCTCCCCCCACCCCCACCCCCACCCCCCCCCACCCCCACCTCCCCCACCCCCACCCACCTCCCCCCCCCCCCACCTCCCCCCCCCCCCACCTCCCCCCCCCCCCACCCCCACCCCCTACCCCCACCCCCCACCTCCCCCACCCCCACCTCCCCCACCCCCCCCCCCACCCCTACCCCCCCACCCCACCCCACCCCTACCCCCACCCCCACCTCCCCCCACCCCCCACCCACCTCCCCCCCCACCCCCCACCCCCCCCACCCCCCACCTCCCCCCCCCCCCCACCCCCACCCCCACCCCCACCCCCACCCCCCACCCCCACCTCCCCCACCCCCCCCCCCACCCCCTACCCCCCACCCCCCACTCCCCCCACCCCCACCCACCTCCCCCCACCCCCACCTCCCCCACCCCCCACCTCCCCCCACCCCCACCCCTCCCCCACCCCCACCCCCTACCCCCACCCCCCACCCCCCCCACCCCCCACCCCTACCCCCCACCCCCCACCTCCTCCCCCCACCCCCACCTCCCCCACCCCCACCTCCCCCCGCCCTCTTGTCATTCAGGCTGTTGTGCAGTGGCGCGATCTCGGCTCACTGCACCCTCCGCCTGCCGGGTTCAGGCTATTCTCCTGCTTCAGCCTCTTGAGTAGCTGGGATTACAGGCACACACCACCACACCTGACTAACTTTTGTACTTTTAGTAGAGA
CGGGGTTTCACAATGTCGGCCAGGCTGGTCTCAAACTCCTAACCTCAGGTGATCCGCCTGCCTTGTCCTCCCCAAGTGCTGGGATTACAGGCGTGAGCCACCGTGCCTGGCCAGAGAGTGTGTATTCTGTTTGTTTTATGATCTCTGTATTGATGTTAATGCTGGTCAGTTGTGTCTAAACCCCAAAAGGGAGGGGGTAGAAGGAGGTATGTGTAACCTCTGTCATGGCTGGGAGCTCAGTTTTTAAGATTTTTCTGGGGTTCCCTTGACCAAGAAGGGATCCATTCAGTCGGAAGGGGCCTTGGGATTTTATTTTTAGTTTGCAAGTTGAATGTCT NA      False   LeftFlankSize,RightFlankSize,UnmapCoverNA,NoTEAlignment,NonRemappable,ShortIns
c845f443-634f-4347-b5de-446789f0bcae    chr16   86986870        86986872        -    ALU      L1PA2   NA      NA      0       False   NA      60.0    0.0     4       1    1NOMe_80U_60m_60m_filtered|4     NA      NA      NA      GCGGTGTGGAGACAGCTCCCTAGACCTCTTCTCTGTTATGCTCCTAATTTTCAGAAGATGTGTTCATTAAAAAACAAGAATACCCTGCTATGGTGTCTCCTCACACTGGATTATTTACATTCATTGAGGATAAGGGAACTGACTCCTAGCCTGGGAGCTTCCAGGAGGGAGAATCTGGGTGTTGACCGCTATATCCTGGGTCCTAGAACTATGCCTGGTACAGTAGCGGCTCGATAAATAATCGGATGACTGTGTGCGCTGACCAATGCCCACTTCTTTTAAAGGGTCAATATCAAATTTCCGGAATAAATCCCACTTTTTGTTTTTTAAAACGTGTAAGTGTGCTATTGGCTGTGTGTGTGCTCAGAGGAAAGGTCTGACTTCCTATGAGCATAGGCCACTCTGGGAATATTTCTTAAATATTGTTTTCTTTCCTATCGACACTAACACAATGTTGAACCACGAGAGTAGAGTACTCCCCAGCAGGCCACTCCCCAGATCAGGTGCCCTCTGAAAAGAAGGTGATGTTTGCTGTCAGTCAGGAGCCAGGTGGAGCTGCTCGCACGGAGGCTGAATCTGTGGGGGCTCTGTCTTCTGGAGATCCGTGGGCTTGAATTTTTGAACTTGGTCTTCTTTGAGAAAGATCTTGCTGTTCCCGGGTCAGGTACTGGAGACTGCACCCCTCTCCCCATGTGGTAGGTTGCTGCGACGAGCTCTGCCCCTTACCAGCCTCGAGCTGTCCCCAGAGAAGCTGTGTCCCGAAGCCTGGGCAATACCTCCGGCTGGGTGCTGGATGCAGCCTGGATTCCGGTGTCTGGGTGGGGCCTTCCTCACTGGTACCCTACTCTCGCGGCCTCTCGCTTTATCCTCTTCCTTACACAATAGGCTCTGGGCAGTCCTATTTACTGAGCGCCCACTGTGTGCCAGGCACCTGTTGACAGGACTCAGCTCCCTAGAATTTGCCTAACAGGTCTATTATGAGGCTACCATTCTTACACCCAATTTACAGATGGGAAAACAGAGAAACCCCATGAGGAAGGGGCATCCTTTGCCACAGCACCCTGTGCCTTGAGCTTCCAGTACCCTCATCTGAAAAACAAGAATGAAACTGCCCCGCCTCTCTCCTGGCCAGCTGTGATCAGGGTCTGGTGTCACTGCTTGGACCCTTAGGGCCTTGACAAGGTCAAGCTAGGTAGTTCATAATGACCATCAAGTCATCGAGTGAGATGCTGTATTTCACAATGTTTTCTTTTTCTTTAAGATGGAGTCTTGGCTCTGTCAGCCCACAGGAGCTGAGTGCGGTGGTGCAATCTCAGTTTCACTGCAACCTCCACATCCCAGGTTCAAGCGATTCTCCTGCCTCAGCCTCCTGAGTAGGTCCTTCCCAGCCTCAGCATACCAACATGCCTGGGTAAATATTTTTGTATTTTTAGTAGAGACAGGGTTTCATCATGTTGGCCAAGGCGGATCTCAAACTCTTGACCTCAAGTGATCTGCTCATCTCGGCCTCCCAAAGTTCTAGGATTACAGGTGTGAGCCACTGCGCCTGGCCTCACCGTGCTTTCTGAGTGCCAGATGTTCTAGGGCCACACTTTCACAGTAATTATTCCACACTCGCTGCTCCAAGACCTTCATTAACCCATTCAGTATTCACCGCAACCCCATGACAGGGGCATGCTCTCATCCTCACTGGACAGCAAAGAACACTGAGGGAGAGAGGGATAAGTAACTCACTCAAGGTCACGTAGTAAAGGACATGGGAATTTGCATAG
CAAGTCTATGAAGCAGGTATTATTTTCTCGCCACTTTACAGATGGGAGAAAACTGAAGAGTCCAGGAGGGAACACTCTCTCCCGCATTACGGGCTGCCTGGCTTGTAATAGCAGGCGCTCATCTCCACTGCTAGAGAAACGTGTAAATGTGTGTGTGTTGTGGGGGGGGGGGAGGGGGAGGAGGAGGGAGGGGGAGGAGGAGGGGAGGGGAGGAGGAGGGAGGGGGAGGGGAAGGAGCAGGAGGGAGGAGCAGGGGGAGGGAGGGAGGGGCAGGAGGAGGAGCAGGGGAGGGGGGAGGGGGCGGAGAGAGAGGACTGGGGAGGGGAGGGGAGGGGAGGGAGGGGCGGGAGGGAGGGGAGGGGGAGGGGAGGGGGAGGGGAGGGGGCGGAGGGGGAGGGGGAGGGGGAGGGGAGGGGGGAGGGGGAGGGGGGGAGGGGGAGGGGGAGGGGGAGGGGGAGGGGGAGGGGGGGAGGGGGAGGGGAGGGAGGGGAGGAGGGAGGGGGAGGGGAGGGGAGGGGGAGGGGAGGGAGGGGAGGGAGGGGGAGGGAGGGGAGGGAGGGGAGGGAGGAGGGGAGGGAGGAGGGAGGGAGGGGAGGGGGGAGGGGGAGGGGGAGGGGAGGGAGGGGAGGAGGGGGGGAAGGGGGAGGGGGAGGAGGGAGGGGGAGGGGAGGGAGGGAGGGGGAGGGGAGGGGAGGGGAGGGGGAGGGGGAGGGGGAGGGGGGAGGGAGGGGGAGGGGGAGGAGGGGGAGGGGGAGGGGAGGGGGAGGGGGAGGGGGAGGGAGGGGGAGCAGGACCCTGCTGTTCAAGTCGCATCTTTAGAGGCTGCGTGAAGTCCCAGTTCCTGAGGCACACAGCCCCTCCTCTGAACGCTACGGATGTCATTACCTGTGAAGTTCCCTGCACTAAATCCCTGCCCTGTCTCCTGTGTGTGCCGTGTCCACATGCTTGAAACCCAGCTCCCAGAGAAGCCCCCAGGAAGGGCAGGCCCAGCGCTTTTCTCCTCATATTTGGGTTTCCTGCCACCCCGACCTTGCGCATTGTGTCGACGTGACACCCCCAAGCATCTCCATTTCAGGGTCAGACGACAGAAACGCGCCCCCTCTGGTACCCCGGGGGCATATGCAGGCTGAGGGCATACGCAGGCTGAGGGGCA       NA      False   LeftFlankSize,RightFlankSize,UnmapCoverNA,NoTEAlignment,NonRemappable,ShortIns

L1PA2 belongs to the L1 family, right? Or do you mean that the inserted TE here is a chimera of ALU and L1PA2? Other rows show the same kind of mismatch; the frequency table was attached as an image. I would appreciate an explanation. Thank you, and I look forward to your reply.

adamewing commented 3 years ago

These two examples are both filtered, i.e. they don't have "PASS" in the "Filter" column. I would recommend considering only insertions that pass filters, as those that don't can be bogus for a variety of reasons. Do you have examples of family/subfamily disagreement for insertions passing filters?
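As a rough illustration of that check, the sketch below keeps only rows marked "PASS" and then counts family/subfamily pairs, so any remaining disagreement would stand out. It assumes the table is tab-separated with a header row and that the columns are named `Family`, `Subfamily`, and `Filter`; those header names, the helper functions, and the tiny demo table are illustrative assumptions, not taken from the tool's documentation.

```python
import csv
import io
from collections import Counter


def passing_rows(handle, filter_col="Filter"):
    """Yield rows whose filter column is exactly 'PASS'.

    The column name is an assumption; adjust it to match the real header.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        if row.get(filter_col) == "PASS":
            yield row


def family_subfamily_counts(rows, family_col="Family", subfamily_col="Subfamily"):
    """Count how often each (family, subfamily) pair occurs."""
    return Counter((row[family_col], row[subfamily_col]) for row in rows)


# Made-up miniature table, for illustration only (not real output).
demo = (
    "UUID\tFamily\tSubfamily\tFilter\n"
    "a\tALU\tL1PA2\tNoTEAlignment,ShortIns\n"
    "b\tALU\tAluYa5\tPASS\n"
    "c\tL1\tL1HS\tPASS\n"
)

counts = family_subfamily_counts(passing_rows(io.StringIO(demo)))
for (family, subfamily), n in sorted(counts.items()):
    print(family, subfamily, n)
```

For a real run, the `io.StringIO(demo)` handle would be replaced by an opened table file. In the demo, the ALU/L1PA2 row is dropped because its filter column lists failure reasons rather than "PASS".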

Amz965 commented 3 years ago

Thank you for the quick reply. After filtering, these lines no longer appear, and I will use only the filtered results, as you advised. Thank you again!
