fulcrumgenomics / fgsv

Tools to gather evidence for structural variation via breakpoint detection.
MIT License

`AggregateSvPileup` should account for inaccurate split-read breakpoint positions #13

Open pamelarussell opened 2 years ago

pamelarussell commented 2 years ago

Currently `AggregateSvPileup` merges breakpoints whose left and right positions are within a distance threshold of each other, regardless of the type of read evidence supporting them: split-read (the breakpoint occurs inside a sequenced read) or read-pair (the breakpoint occurs in the unsequenced insert between mates).

However, these two types of evidence locate the breakpoint with different precision and should use different distance thresholds. While split-read evidence is likely to point to a very precise position, the position inferred from a read-pair event can be off by as much as the inner distance (insert size minus the combined read lengths). Something similar to the following procedure should be used instead:

  1. "Seed" clusters by clustering only breakpoints that have split-read evidence
  2. "Seed" additional clusters with breakpoints that have read-pair evidence
  3. Use read-pair events to aggregate clusters when the distance is within the inner distance (computed empirically by sampling)
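
One possible reading of the steps above, as a minimal Python sketch rather than a proposal for the actual fgsv implementation. The `Breakpoint` class, `estimate_inner_distance`, `cluster_breakpoints`, and both threshold parameters are hypothetical names; contig and strand are ignored for brevity, and the inner distance is assumed to be the median sampled insert size minus two read lengths.

```python
from dataclasses import dataclass
from itertools import combinations
from statistics import median


@dataclass(frozen=True)
class Breakpoint:
    """Hypothetical, simplified breakpoint: positions only, no contig or strand."""
    left_pos: int
    right_pos: int
    evidence: str  # "split_read" or "read_pair"


def estimate_inner_distance(insert_sizes, read_length):
    """Assumed estimate: median sampled insert size minus both read lengths."""
    return max(0, int(median(insert_sizes)) - 2 * read_length)


def within(a, b, max_dist):
    """True if both the left and the right positions are within max_dist."""
    return (abs(a.left_pos - b.left_pos) <= max_dist
            and abs(a.right_pos - b.right_pos) <= max_dist)


def cluster_breakpoints(breakpoints, split_read_dist, inner_dist):
    """Single-linkage clustering with an evidence-dependent distance threshold."""
    parent = list(range(len(breakpoints)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    pairs = list(combinations(range(len(breakpoints)), 2))

    # Step 1: seed clusters from split-read breakpoints using the tight threshold.
    for i, j in pairs:
        a, b = breakpoints[i], breakpoints[j]
        if a.evidence == b.evidence == "split_read" and within(a, b, split_read_dist):
            union(i, j)

    # Steps 2 and 3: every read-pair breakpoint starts as its own cluster and
    # joins (and thereby merges) any clusters within the inner distance.
    for i, j in pairs:
        a, b = breakpoints[i], breakpoints[j]
        if "read_pair" in (a.evidence, b.evidence) and within(a, b, inner_dist):
            union(i, j)

    clusters = {}
    for i, bp in enumerate(breakpoints):
        clusters.setdefault(find(i), []).append(bp)
    return list(clusters.values())
```

With these assumptions, two split-read clusters that are too far apart to merge under the tight threshold can still be joined when a read-pair event lies within the inner distance of both, which is the behaviour step 3 describes.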

tfenne commented 1 year ago

Agreed. I think a multi-pass strategy would work, though I would suggest something different: