Many germline events have microhomology at the breakpoint. When matching an event, this should be taken into account, as there are many ways to report the same event in VCF. Figure 9 of the VCF specifications outlines this.
Some callers report CIPOS (and in the case of GRIDSS, the non-standard HOMPOS field), which gives the caller's own estimate of the extent of the homology, but others do not.
Properly matching equivalent variants can be very messy. For example, in the HG002 truth set, a SINE duplication within a SINE repeat (ref=SINE-SINE-SINE, var=SINE-SINE-SINE-SINE) is reported as an INS event after the 3rd SINE, but the short-read callers report it as a DUP of the first SINE. Both calls result in the same sequence, but they're 600bp away from each other! Events such as these are a bit extreme and difficult to handle, but a basic check of homology and/or respecting the CIPOS reported by the variant caller will result in more accurate benchmarking results.
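As a rough illustration of respecting CIPOS when matching, the sketch below widens each breakpoint position by its confidence interval and treats two calls as matching if the resulting intervals overlap. The function names (`parse_cipos`, `positions_match`) and the optional `margin` parameter are hypothetical, and the INFO parsing is deliberately naive; a real benchmark would use a proper VCF parser and also compare event type and length.

```python
def parse_cipos(info):
    """Extract CIPOS from a raw VCF INFO string; (0, 0) if absent.

    Hypothetical helper: assumes a well-formed 'KEY=VALUE;KEY=VALUE' string.
    """
    for field in info.split(";"):
        key, _, value = field.partition("=")
        if key == "CIPOS":
            lo, hi = value.split(",")
            return int(lo), int(hi)
    return (0, 0)


def positions_match(truth_pos, call_pos, call_cipos=(0, 0),
                    truth_cipos=(0, 0), margin=0):
    """True if the two breakpoint intervals overlap.

    Each interval is POS widened by its CIPOS offsets; `margin` is an
    extra tolerance (an assumption of this sketch) applied to the call.
    """
    call_lo = call_pos + call_cipos[0] - margin
    call_hi = call_pos + call_cipos[1] + margin
    truth_lo = truth_pos + truth_cipos[0]
    truth_hi = truth_pos + truth_cipos[1]
    return call_lo <= truth_hi and truth_lo <= call_hi


# A call placed 10bp to the right of the truth position, whose caller
# reports 12bp of homology to the left of the called position.
cipos = parse_cipos("SVTYPE=DEL;CIPOS=-12,0;END=2000")
exact = positions_match(1000, 1010)            # ignores homology
aware = positions_match(1000, 1010, cipos)     # respects CIPOS
```

In this example the exact comparison rejects the call while the CIPOS-aware comparison accepts it. Cases like the 600bp SINE example would still need an explicit homology or sequence-level check.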
The delta between the caller event length and the actual event length for TPs is a good indicator of how well an SV caller gets the length correct. You should find that, in contrast to what the overall average lengths suggest, the event lengths predicted by BreakDancer do not match the actual event lengths very closely, something that has the potential to change someone's choice of caller.
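A minimal sketch of the per-event length delta follows, assuming the true positives have already been paired with their truth events. The function names and the toy numbers are illustrative only, not real caller output; they are chosen to show how average lengths can agree while individual lengths do not.

```python
from statistics import mean, median


def length_deltas(matched_pairs):
    """Signed delta (called length - actual length) for each matched TP."""
    return [called - actual for called, actual in matched_pairs]


def summarise(matched_pairs):
    """Compare average lengths with per-event deltas for the same TPs."""
    deltas = length_deltas(matched_pairs)
    abs_deltas = [abs(d) for d in deltas]
    return {
        "mean_called": mean(called for called, _ in matched_pairs),
        "mean_actual": mean(actual for _, actual in matched_pairs),
        "median_abs_delta": median(abs_deltas),
        "max_abs_delta": max(abs_deltas),
    }


# Toy data: (called length, actual length) for two matched events.
# The average lengths agree, yet each call is off by 200bp.
pairs = [(100, 300), (500, 300)]
summary = summarise(pairs)
```

Here `mean_called` and `mean_actual` are both 300, while `median_abs_delta` is 200, which is why the per-event delta is the more informative measure.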