Open etal opened 6 years ago
Hi Eric, I would be interested in joining forces for that one if you're up for it. For quite a while now I was thinking of wrapper around DNAcopy::segment that does pretty much what you write plus automatic breakpoints at large gaps. Like PSCBS, but extremely lightweight, only 100-200 lines of code max. Let me know, and I'll draft a prototype API next week or so.
Sounds great! The development version currently runs DNAcopy::segment on each chromosome arm separately -- see skgenome.gary.GenomicArray.by_arm() for the implementation, it just looks for each chromosome's largest gap above a certain threshold.
The segment
command can take a VCF as input with -v
, and that's passed to cnvlib.segmentation.do_segment as you'd expected. When that's used, first the .cnr read depth log2 ratios are segmented, then within each of those segments, the allele frequencies are segmented futher. The change proposed here is to run those segmentations independently, then combine them (with GenomicArray.cut() or .flatten()) and then use the GATK4 algorithm to delete the less-supported breakpoints that result.
Cool, thanks! That looks great. I'll try to find some time for this in the next couple of days.
Sorry for the radio silence here. PSCBS fixed some performance related bugs and it seems working well for me now. I added support for weights, which was my last remaining issue with it. In direct comparison with GATK4, it also looks pretty good.
In GATK4's CNV caller, read depth bins and SNP allele frequencies are segmented separately (by CBS and an HMM, respectively), then intersected (like
GenomicArray.cut
) and a final pass compares adjacent bins to test if the segment breakpoint between them is still supported by the data. This is a simpler merging method than BIC-seq -- see docs.Could be implemented in
call
and/orsegment
.