genome / pindel

Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data. It uses a pattern growth approach to identify the breakpoints of these variants from paired-end short reads.
GNU General Public License v3.0

Incorrect event coordinates near homopolymer runs #78

Open jmarshall opened 7 years ago

jmarshall commented 7 years ago

We have some data on which Pindel calls a deletion of an A with coordinates as follows:

8514    D 1     NT 0 "" ChrID chr6      BP 93257525     93257527        BP_range 93257525       93257530        Supports 38     38      + 33    33      - 5     5       S1 204  SUM_MS 2280     [snip]
TTTGGTACTTACCCTGAGAAATGCATCTAGGGCTCCATTTTCCATGAACTCTATTACTATCATGACTGGTTTCCCTaAAATTAAAAAAAAAAAGTTTCAGGAGTCGTAACCTGAGCTGGAGGGTCATTTCCTTTCTCATCCCCCAATAAAAC
                                                                   GGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAGTCGTAACCTGAGCTGGAGGGTCATTTCCTTTCTCATCCCCC                +       93257516        60      [snip]  
                                                                   GGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAGTCGTAACCTGAGCTGGAGGGTCATTTCCTTTCTCATCCCCC                +       93257417        60      [snip]  
                                                                   GGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAGTCGTAACCCGAGCTGGAGGGTCATTTCCTTTCTCATCCCC                 +       93257416        60      [snip]  
                                                               GACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAGTCGTAACCTGAGCTGGAGGGTCATTTCCTTTCTCAT                     +       93257467        60      [snip]  
                                                               GACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAGTCGTAACCTGAGCTGGAGGGTCATTTCCTTTCTCATC                    +       93257469        60      [snip]  
                                                            CATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAGTCGTAACCTGAGCTGGAGGGTCATTTCCTTTCT                        +       93257448        60      [snip]  
                                                          ATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAGTCGTAACCTGAGCTGGAGGGTCATTTCCTTTC                         -       93257750        60      [snip]  
                                                          ATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAGTCGTAACCTGAGCTGGAGGGTCATTTCCTTT                          +       93257471        60      [snip]  
                                                          ATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAGTCGTAACCTGAGCTGGAGGGTCATTTCCTTTC                         +       93257402        60      [snip]  
                                                      TACTATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAGTCGTAACCTGAGCTGGAGGGTCATTTC                              +       93257460        60      [snip]  
                                                      TACTATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAGTCGTAACCTGAGCTGGAGGGTCATTTC                              +       93257422        60      [snip]  
                                                    ATTACTATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAGTCGTAACCTGAGCTGGAGGGTCATT                                +       93257376        60      [snip]  
                                                    ATTACTATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAGTCGTAACCTGAGCTGGAGGGTCAT                                 +       93257464        60      [snip]  
                                                    ATTACTATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAGTCGTAACCTGAGCTGGAGGGTCAT                                 +       93257462        60      [snip]  
                                                    ATTACTATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAGTCGTAACCTGAGCTGGAGGGTCATTT                               +       93257427        60      [snip]  
                                              AACTCTATTACTATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAGTCGTAACCTGAGCTGGAGGG                                     +       93257478        60      [snip]  
                                           ATGAACTCTATTACTATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAGTCGTAACCTGAGCTGGA                                        +       93257461        60      [snip]  
                                           ATGAACTCTATTACTATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAGTCGTAACCTGAGCTGGA                                        +       93257451        60      [snip]  
                                      TTTCCATGAACTCTATTACTATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAGTCGTAACCTGA                                              +       93257435        60      [snip]  
                                   CATTTTCCATGAACTCTATTACTATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAGTCGTAACCT                                                +       93257414        60      [snip]  
                                   CATTTTCCATGAACTCTATTACTATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAGTCGTAACCT                                                +       93257389        60      [snip]  
                                  CCATTTTCCATGAACTCTATTACTATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAGTCGTAAC                                                  +       93257331        60      [snip]  
                               GCTCCATTTTCTATGAACTCTATTACTATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAGTCGT                                                     -       93257658        60      [snip]  
                               GCTCCATTTTCCATGAACTCTATTACTATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAGTCGTA                                                    +       93257329        60      [snip]  
                              GGCTCCATTTTCCATGAACTCTATTACTATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAGTCGT                                                     -       93257659        60      [snip]  
                              GGCTCCATTTTCCATGAACTCTATTACTATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAGTCGT                                                     +       93257365        60      [snip]  
                             GGGCTCCATTTTCCATGAACTCTATTACGATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAGTCG                                                      -       93257570        60      [snip]  
                             GGGCTCCATTTTCCATGAACTCTATTACTATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAGTCG                                                      +       93257409        60      [snip]  
                            AGGGCTCCATTTTCCATGAACTCTATTACTATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGG                                                           +       93257396        60      [snip]  
                            AGGGCTCCATTTTCCATGAACTCTATTACTATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAGTC                                                       +       93257349        60      [snip]  
                            AGGGCTCCATTTTCCATGAACTCTATTACTATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAGTC                                                       +       93257446        60      [snip]  
                            AGGGCTCCATTTTCCATGAACTCTATTACTATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAG                                                         +       93257337        60      [snip]  
                            AGGGCTCCATTTTCCATGAACTCTATTACTATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAG                                                         +       93257407        60      [snip]  
                            AGGGCTCCATTTTCCATGAACTCTATTACTATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCAGGAG                                                         +       93257427        60      [snip]  
                      GCATCTAGGGCTCCATTTTCCATGAACTCTATTACTATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTC                                                              -       93257581        60      [snip]  
                      GCATCTAGGGCTCCATTTTCCATGAACTCTATCACTATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGTTTCA                                                             +       93257465        60      [snip]  
                  AAATGCATCTAGGGCTCCATTTTCCATGAACTCTATTACTATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGT                                                                 +       93257414        60      [snip]  
                  AAATGCATCTAGGGCTCCATTTTCCATGAACTCTATTACTATCATGACTGGTTTCCCT AAAATTAAAAAAAAAAGT                                                                 +       93257467        60      [snip]  

However, it is clear from the pileup in the raw Pindel output that the deletion actually occurred within the longer AAAAAAAAAAA run, just to the right of the AAAA run where Pindel has called it.
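As a quick sanity check, here is a minimal Python sketch that collapses the reference window and a supporting read into homopolymer runs and reports which run differs in length. The helper names (`runs`, `differing_runs`) are made up for illustration and have nothing to do with Pindel's own code. The run lengths are transcribed by hand from the pileup above (4 and 11 As in the reference, 4 and 10 in the reads), so treat them as an assumption rather than verified input.

```python
from itertools import groupby


def runs(seq):
    """Collapse a sequence into (base, run_length) pairs."""
    return [(base, len(list(group))) for base, group in groupby(seq.upper())]


def differing_runs(ref, read):
    """Return (run_index, base, ref_len, read_len) for runs whose lengths differ.

    Only meaningful when the two sequences differ purely in homopolymer
    run lengths, i.e. they collapse to the same order of bases.
    """
    ref_runs, read_runs = runs(ref), runs(read)
    if [base for base, _ in ref_runs] != [base for base, _ in read_runs]:
        raise ValueError("sequences differ by more than homopolymer run lengths")
    return [
        (index, base, ref_len, read_len)
        for index, ((base, ref_len), (_, read_len)) in enumerate(zip(ref_runs, read_runs))
        if ref_len != read_len
    ]


# Assumption: run lengths read off the pileup by hand, not parsed from Pindel output.
ref_window = "CCCT" + "A" * 4 + "TT" + "A" * 11 + "GTTTC"
read_window = "CCCT" + "A" * 4 + "TT" + "A" * 10 + "GTTTC"

for index, base, ref_len, read_len in differing_runs(ref_window, read_window):
    print(f"run {index} ({base}): reference {ref_len}, read {read_len}")
```

Under those assumptions, the only run that changes length is the second A run (after the TT), which is where I would expect the event to be reported.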

Any thoughts on how this has happened? (Perhaps Pindel has treated this as one longer AA…AA homopolymer run with a couple of low-scoring T mismatches?) Or is there a more significant bug here?