etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org

Format for theta export #522

Open DominikGlodzik opened 4 years ago

DominikGlodzik commented 4 years ago

Hello CNVkit team

I wonder if you could advise on the VCF format expected by the Theta export command:

cnvkit.py export theta Sample_Tumor.cns reference.cnn -v Sample_Paired.vcf

What information is required in the VCF file, and how should it be formatted (e.g. INFO vs. FORMAT fields)?

I could not find this information in the documentation, and my trial and error failed.

Best wishes, Dominik

etal commented 4 years ago

The VCF file here is not specific to Theta; CNVkit uses it to extract b-allele frequencies. The format is the same as for other CNVkit commands: https://cnvkit.readthedocs.io/en/stable/fileformats.html#vcf

HyunjunNam commented 4 years ago

Hi, I have a similar question. We are running CNVkit with a panel of normals (PoN) because we don't have matched tumor-normal samples. In this case, can we create Sample_Paired.vcf by combining the VCF from one tumor sample with the VCFs from all the normal samples used to build the PoN, and adding a PEDIGREE tag to the VCF header?

Best, Hyunjun

etal commented 4 years ago

If you don't have a matched normal for each tumor sample, you can use an unmatched normal instead -- you only need 1 normal. The intent is to identify the likely germline-heterozygous SNPs that are present in the tumor sample, because these can be used to determine tumor fraction, whereas somatic mutations' allele frequencies are too noisy/heterogeneous to use here.
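To illustrate the idea of keeping only likely germline-heterozygous sites, here is a minimal sketch in plain Python. It is not CNVkit's own code. It assumes a tab-separated, uncompressed VCF in which the normal sample has its own column and a GT entry in the FORMAT field; the function names `is_het` and `select_germline_het` and the sample names `Tumor` and `Normal` are hypothetical.

```python
def is_het(gt):
    """True for a diploid genotype with two different called alleles, e.g. 0/1 or 1|0."""
    alleles = gt.replace("|", "/").split("/")
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


def select_germline_het(vcf_lines, normal_id):
    """Yield header lines plus records where the normal sample is heterozygous.

    Assumes the #CHROM header line appears before any record and that
    normal_id is one of the sample column names.
    """
    sample_col = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line
        elif line.startswith("#CHROM"):
            # Locate the column that holds the normal sample's genotype.
            sample_col = line.split("\t").index(normal_id)
            yield line
        elif line:
            fields = line.split("\t")
            keys = fields[8].split(":")            # FORMAT keys, e.g. GT:AD
            values = fields[sample_col].split(":")
            gt = dict(zip(keys, values)).get("GT", ".")
            if is_het(gt):
                yield line


# Tiny made-up example: the second record is homozygous-reference in the normal,
# so it looks somatic and is dropped.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tTumor\tNormal",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD\t0/1:30,10\t0/1:20,18",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:AD\t0/1:35,5\t0/0:40,0",
]
kept = list(select_germline_het(example, "Normal"))
```

In this toy input only the chr1:100 record survives, because it is the only one that is heterozygous in the normal column.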

You can get the same effect independently by using the PoN or even dbSNP or other population genetics databases to filter the tumor-only VCF down to population SNP sites.
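As a rough sketch of that filtering step, the following plain-Python example keeps only tumor-only VCF records whose position and alleles appear in a set of known SNP sites. It assumes both inputs are uncompressed, tab-separated VCF text with matching chromosome naming (for example both using "chr1" rather than "1"); the function names `load_sites` and `filter_to_known_sites` are hypothetical, and a dedicated VCF tool would be a reasonable alternative for large files.

```python
def load_sites(vcf_lines):
    """Collect (chrom, pos, ref, alt) tuples from a sites VCF (e.g. dbSNP or PoN calls)."""
    sites = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):      # split multi-allelic records
            sites.add((chrom, int(pos), ref, allele))
    return sites


def filter_to_known_sites(tumor_vcf_lines, known_sites):
    """Yield header lines plus tumor records that match a known site and allele."""
    for line in tumor_vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            yield line
            continue
        if not line:
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        if any((chrom, int(pos), ref, a) in known_sites for a in alt.split(",")):
            yield line


# Made-up inputs: only chr1:100 A>G is a known population SNP.
population = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\trs0000001\tA\tG,T\t.\t.\t.",
]
tumor_only = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tTumor",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD\t0/1:30,10",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:AD\t0/1:35,5",
]
filtered = list(filter_to_known_sites(tumor_only, load_sites(population)))
```

Matching on the alternate allele as well as the position avoids keeping a somatic mutation that merely coincides with a known SNP position.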