etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org
Other
502 stars 163 forks source link

segment, call: Implement GATK4-style segment merging #252

Open etal opened 6 years ago

etal commented 6 years ago

In GATK4's CNV caller, read depth bins and SNP allele frequencies are segmented separately (by CBS and an HMM, respectively), then intersected (like GenomicArray.cut) and a final pass compares adjacent bins to test if the segment breakpoint between them is still supported by the data. This is a simpler merging method than BIC-seq -- see docs.

Could be implemented in call and/or segment.

lima1 commented 6 years ago

Hi Eric, I would be interested in joining forces for that one if you're up for it. For quite a while now I was thinking of wrapper around DNAcopy::segment that does pretty much what you write plus automatic breakpoints at large gaps. Like PSCBS, but extremely lightweight, only 100-200 lines of code max. Let me know, and I'll draft a prototype API next week or so.

etal commented 6 years ago

Sounds great! The development version currently runs DNAcopy::segment on each chromosome arm separately -- see skgenome.gary.GenomicArray.by_arm() for the implementation, it just looks for each chromosome's largest gap above a certain threshold.

The segment command can take a VCF as input with -v, and that's passed to cnvlib.segmentation.do_segment as you'd expected. When that's used, first the .cnr read depth log2 ratios are segmented, then within each of those segments, the allele frequencies are segmented futher. The change proposed here is to run those segmentations independently, then combine them (with GenomicArray.cut() or .flatten()) and then use the GATK4 algorithm to delete the less-supported breakpoints that result.

lima1 commented 6 years ago

Cool, thanks! That looks great. I'll try to find some time for this in the next couple of days.

lima1 commented 4 years ago

Sorry for the radio silence here. PSCBS fixed some performance related bugs and it seems working well for me now. I added support for weights, which was my last remaining issue with it. In direct comparison with GATK4, it also looks pretty good.