Open etal opened 9 years ago
The plan is to use PSCBS to perform segmentation of both log2 copy ratios and allele frequencies if a VCF file of SNVs is provided. If the VCF contains a paired normal/control sample in addition to the tumor/test sample, take advantage of that.
Emit a .cns file with extra columns indicating allele-specific values.
@chapmanb I saw you added a similar feature in bcbio for BubbleTree, but segmenting on allele frequencies instead of, rather than in addition to, total copy number. It looks like this would be generally useful to have in CNVkit. Am I reading that right? If so, mind if I borrow the code and expose it through CNVkit's segment command?
Eric; You're absolutely welcome to any of my code that's of use. However the BubbleTree work is not well tested right now and needs work to get up to date with the latest BubbleTree development version so it might not be the best time to grab it. I'm going to try and reboot the heterogeneity analysis work in bcbio and can give you a heads up when this is better tested and integrated.
Loosely relevant to this issue, here's a paper where PyClone was used with CNVkit to identify/examine subclones. I'll have to read this paper more carefully and maybe ask the lead author about the experience.
Would be very interested if segment could work with allele frequencies solely. Is there a way to do this?
The current implementation, which adds an extra column with the average b-allele frequency of each copy number segment, is worthless because copy-neutral loss of heterozygosity can drive many dramatic changes in BAF within a single copy number segment.
You could probably manage it through the cnvlib API. See the implementation of cnvlib.segment.__init__ where the variants array is used; it's just a few lines to run segmentation on the SNP allele frequencies once you have loaded them via cnvlib.cmdutil.load_het_snps.
Once you've segmented your read depth ratios and/or SNP b-allele frequencies, use the call command to infer integer total copy number and allele-specific copy number (the cn, cn1 and cn2 columns, respectively). Uniparental disomy shows up in the latter columns but not the former.
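To illustrate why uniparental disomy is visible only in the allele-specific columns, here is a minimal sketch of splitting a total copy number into two allele counts from a segment's BAF. It is not CNVkit's actual calculation: the function name allele_specific_cn is hypothetical, the sample is assumed to be 100% pure, and the mapping of the two returned values onto cn1 and cn2 is deliberately left unstated.

```python
def allele_specific_cn(total_cn, baf):
    """Split an integer total copy number into (major, minor) allele counts.

    Hypothetical helper, not part of cnvlib. Assumes a pure sample and a
    segment-level b-allele frequency `baf` in the range 0..1.
    """
    # Fold the BAF around 0.5 so it always describes the more abundant allele.
    mirrored = max(baf, 1.0 - baf)
    major = min(int(round(total_cn * mirrored)), total_cn)
    minor = total_cn - major
    return major, minor


# (total copy number, segment BAF) for a few illustrative cases
segments = {
    "normal diploid": (2, 0.50),
    "copy-neutral LOH / UPD": (2, 0.98),
    "single-copy gain": (3, 0.67),
    "single-copy loss": (1, 0.99),
}

results = {label: allele_specific_cn(cn, baf)
           for label, (cn, baf) in segments.items()}
for label, (major, minor) in results.items():
    print(f"{label}: total={major + minor} major={major} minor={minor}")
```

The normal diploid and UPD segments both have a total of 2, so they are indistinguishable by total copy number alone; only the major/minor split (1 and 1 versus 2 and 0) separates them.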
@etal OK thanks. I got the variants loaded through the API, but can you give another hint on which method in segment/__init__ to call, and how, to segment on just BAF?
Extract heterozygous allele frequencies from the tumor sample VCF, and segment those values to generate a .cns file. CBS, Haar wavelet and fused lasso are all fine algorithms for this.
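As a rough sketch of segmenting on allele frequencies alone, the example below mirrors simulated heterozygous SNP BAFs around 0.5 and splits them with a plain recursive binary segmentation on squared error. This is a stand-in for illustration only: it is not CBS, Haar wavelet, or fused lasso, it does not use cnvlib, and the function names, the simulated data, and the min_size and min_gain thresholds are all assumptions.

```python
import random


def mirror(baf):
    """Fold a b-allele frequency around 0.5 so both alleles look alike."""
    return abs(baf - 0.5) + 0.5


def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(values, min_size):
    """Return (gain, index) of the split that most reduces squared error."""
    total = sse(values)
    best = (0.0, None)
    for i in range(min_size, len(values) - min_size + 1):
        gain = total - sse(values[:i]) - sse(values[i:])
        if gain > best[0]:
            best = (gain, i)
    return best


def segment(values, start=0, min_size=5, min_gain=0.5):
    """Recursive binary segmentation; returns a list of (start, end, mean)."""
    gain, idx = (0.0, None)
    if len(values) >= 2 * min_size:
        gain, idx = best_split(values, min_size)
    if idx is None or gain < min_gain:
        return [(start, start + len(values), sum(values) / len(values))]
    left = segment(values[:idx], start, min_size, min_gain)
    right = segment(values[idx:], start + idx, min_size, min_gain)
    return left + right


def clamp(x):
    return min(1.0, max(0.0, x))


# Simulated het SNPs within ONE copy number segment (total copy number 2):
# 40 balanced SNPs, then 40 SNPs in a copy-neutral LOH region.
rng = random.Random(0)
balanced = [clamp(rng.gauss(0.5, 0.03)) for _ in range(40)]
loh = [clamp(rng.gauss(rng.choice([0.05, 0.95]), 0.03)) for _ in range(40)]
mirrored = [mirror(b) for b in balanced + loh]

overall_mean = sum(mirrored) / len(mirrored)
segs = segment(mirrored)

print(f"single average over the segment: {overall_mean:.3f}")
for seg_start, seg_end, seg_mean in segs:
    print(f"SNPs {seg_start}-{seg_end}: mean mirrored BAF {seg_mean:.3f}")
```

The single average lands between the two states and describes neither, while segmenting the mirrored values recovers the breakpoint inside the copy number segment. In a real run the SNP indices would be mapped back to genomic coordinates before writing out segments.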