etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org

Segment allele frequencies directly #34

Open etal opened 9 years ago

etal commented 9 years ago

Extract heterozygous allele frequencies from the tumor sample VCF, and segment those values to generate a .cns file. CBS, Haar wavelet, and fused lasso are all fine algorithms for this.
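As a rough illustration of the extraction step, here is a minimal sketch in plain Python. It does not use CNVkit; the function name `het_allele_freqs`, the `min_depth` cutoff, and the assumption that the VCF carries `GT` and `AD` FORMAT fields are all illustrative choices, not part of the proposal above.

```python
def het_allele_freqs(vcf_text, sample_index=0, min_depth=20):
    """Return (chrom, pos, alt_freq) for heterozygous SNVs in one sample.

    Assumes each record has GT and AD in its FORMAT column; records that
    lack them, or that are not simple biallelic SNVs, are skipped.
    """
    results = []
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        if len(ref) != 1 or len(alt) != 1:
            continue  # skip indels and multi-allelic sites
        keys = fields[8].split(":")
        values = fields[9 + sample_index].split(":")
        sample = dict(zip(keys, values))
        genotype = sample.get("GT", "").replace("|", "/")
        if genotype not in ("0/1", "1/0"):
            continue  # keep heterozygous calls only
        try:
            ref_count, alt_count = (int(x) for x in sample.get("AD", "").split(","))
        except ValueError:
            continue  # AD missing or malformed
        depth = ref_count + alt_count
        if depth < min_depth:
            continue  # too shallow for a stable frequency estimate
        results.append((chrom, pos, alt_count / depth))
    return results


# Made-up records for illustration only.
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ttumor",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD\t0/1:30,30",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:AD\t1/1:0,40",
    "chr1\t300\t.\tG\tA\t50\tPASS\t.\tGT:AD\t0/1:45,15",
    "chr1\t400\t.\tT\tC\t50\tPASS\t.\tGT:AD\t0/1:3,2",
])

print(het_allele_freqs(EXAMPLE_VCF))
```

Filtering on the tumor genotype alone will tend to drop sites inside regions of complete loss of heterozygosity, since they are typically genotyped as homozygous there.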

etal commented 9 years ago

The plan is to use PSCBS to perform segmentation of both log2 copy ratios and allele frequencies if a VCF file of SNVs is provided. If the VCF contains a paired normal/control sample in addition to the tumor/test sample, take advantage of that.

Emit a .cns file with extra columns indicating allele-specific values.

etal commented 9 years ago

@chapmanb I saw you added a similar feature in bcbio for BubbleTree, but segmenting on allele frequencies instead of, rather than in addition to, total copy number. It looks like this would be generally useful to have in CNVkit. Am I reading that right? If so, mind if I borrow the code and expose it through CNVkit's segment command?

chapmanb commented 9 years ago

Eric; You're absolutely welcome to any of my code that's of use. However, the BubbleTree work is not well tested right now and needs updating to match the latest BubbleTree development version, so it might not be the best time to grab it. I'm going to try to reboot the heterogeneity analysis work in bcbio and can give you a heads up when this is better tested and integrated.

etal commented 9 years ago

Loosely relevant to this issue, here's a paper where PyClone was used with CNVkit to identify/examine subclones. I'll have to read this paper more carefully and maybe ask the lead author about the experience.

mheskett commented 6 years ago

Would be very interested if segment could work with allele frequencies solely. Is there a way to do this?

The current implementation, which adds an extra column with the average b-allele frequency of each copy number segment, is worthless because copy-neutral loss of heterozygosity can drive dramatic changes in BAF within a single copy number segment.
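A toy illustration of the problem, using made-up BAF values rather than real data: within one copy-number-neutral segment, the first six SNPs are ordinary heterozygous sites and the last six sit in a region of copy-neutral LOH. The variable names are hypothetical.

```python
from statistics import mean

# Made-up BAFs within ONE copy number segment (total copy number 2 throughout).
normal_part = [0.48, 0.52, 0.50, 0.49, 0.51, 0.50]   # balanced heterozygous SNPs
loh_part = [0.10, 0.90, 0.12, 0.88, 0.09, 0.91]      # copy-neutral LOH in most cells


def fold(baf):
    """Mirror a BAF around 0.5 so that 0.1 and 0.9 both become 0.9."""
    return abs(baf - 0.5) + 0.5


raw_mean = mean(normal_part + loh_part)                           # one number per segment
folded_mean = mean(fold(b) for b in normal_part + loh_part)       # still one number
folded_normal = mean(fold(b) for b in normal_part)                # per-region values
folded_loh = mean(fold(b) for b in loh_part)

print(f"raw mean over segment:    {raw_mean:.3f}")
print(f"folded mean over segment: {folded_mean:.3f}")
print(f"folded mean, normal part: {folded_normal:.3f}")
print(f"folded mean, LOH part:    {folded_loh:.3f}")
```

The raw average looks perfectly balanced, and even the mirrored average blends the two regions into a value that describes neither of them.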

etal commented 6 years ago

You could probably manage it through the cnvlib API. See the implementation of cnvlib.segment.__init__ where the variants array is used; once you have loaded the SNP allele frequencies via cnvlib.cmdutil.load_het_snps, it takes just a few lines to run segmentation on them.
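To make the idea concrete without guessing at cnvlib internals, here is a self-contained sketch that segments mirrored BAF values with a simple recursive binary segmentation. This is a stand-in technique, not CBS and not what cnvlib does; `segment_baf`, `min_size`, and `min_gain` are hypothetical names and thresholds, and the input is assumed to be the BAFs of one chromosome, sorted by position.

```python
def _sse(values):
    """Sum of squared deviations from the mean."""
    center = sum(values) / len(values)
    return sum((v - center) ** 2 for v in values)


def segment_baf(bafs, min_size=3, min_gain=0.05):
    """Split position-sorted BAFs into runs with a roughly constant mirrored BAF.

    Returns (start, end, mean_mirrored_baf) tuples over half-open index ranges.
    A split is accepted only if it reduces the squared error by at least
    min_gain and leaves at least min_size SNPs on each side.
    """
    if not bafs:
        return []
    # Mirror around 0.5 so that 0.1 and 0.9 are treated as the same imbalance.
    folded = [abs(b - 0.5) + 0.5 for b in bafs]

    def split(lo, hi):
        chunk = folded[lo:hi]
        total = _sse(chunk)
        best_gain, best_k = 0.0, None
        for k in range(lo + min_size, hi - min_size + 1):
            gain = total - _sse(folded[lo:k]) - _sse(folded[k:hi])
            if gain > best_gain:
                best_gain, best_k = gain, k
        if best_k is None or best_gain < min_gain:
            return [(lo, hi, sum(chunk) / len(chunk))]
        return split(lo, best_k) + split(best_k, hi)

    return split(0, len(folded))


# Made-up values: six balanced SNPs followed by six SNPs in an LOH region.
example = [0.48, 0.52, 0.50, 0.49, 0.51, 0.50,
           0.10, 0.90, 0.12, 0.88, 0.09, 0.91]
for start, end, level in segment_baf(example):
    print(start, end, round(level, 3))
```

In practice the index ranges would be mapped back to genomic coordinates of the first and last SNP in each run, and a properly validated segmentation method would replace the toy splitter.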

Once you've segmented your read depth ratios and/or SNP b-allele frequencies, use the call command to infer integer total copy number (the cn column) and allele-specific copy number (the cn1 and cn2 columns). Uniparental disomy shows up in the allele-specific columns but not in the total.
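The relationship between the total and allele-specific values can be sketched as follows. This is a simplified model assuming a pure tumor sample, not necessarily the exact formula the call command uses; the function name and the `major`/`minor` labels are hypothetical and are not claimed to map onto cn1 and cn2 in any particular order.

```python
import math


def allele_specific_cn(total_cn, baf):
    """Split an integer total copy number into (major, minor) allele counts.

    Assumes 100% tumor purity: the mirrored BAF is taken as the fraction of
    copies that belong to the more abundant allele.
    """
    folded = abs(baf - 0.5) + 0.5
    # Round half up explicitly; Python's round() rounds halves to even.
    major = min(total_cn, math.floor(total_cn * folded + 0.5))
    minor = total_cn - major
    return major, minor


print(allele_specific_cn(2, 0.50))  # balanced diploid
print(allele_specific_cn(2, 0.95))  # total is still 2, but one allele is lost
print(allele_specific_cn(3, 0.33))  # single-copy gain
print(allele_specific_cn(1, 0.98))  # hemizygous loss
```

The second case is the uniparental disomy situation described above: the total copy number is indistinguishable from the balanced diploid case, and only the allele-specific pair reveals the change.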

mheskett commented 6 years ago

@etal OK, thanks. I got the variants loaded through the API, but can you give another hint on which method in segment/__init__ to call, and how, to segment on just BAF?