vymao opened this issue 3 years ago.
Hi @vymao! To be honest, I've never used CNVkit for tumour CNV calling in practice. The following advice is theoretical and based on reading the source code, so perhaps take it with a grain of salt.
When both `SAMPLE_ID` and `NORMAL_ID` are provided, all VCF records are used for both of them. CNVkit looks at certain FORMAT fields (DP, AD, GT) and deduces from them whether a given variant is present in the tumour sample, the normal sample, or both.
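To make the idea concrete, here is a simplified sketch of that kind of per-record reasoning. It is not CNVkit's code: the helper names `zygosity` and `classify`, the dict-based record format and the depth threshold of 20 are hypothetical. It uses GT to decide whether a variant is present in each sample and DP as a depth filter; AD, which would supply allele frequencies, is left out for brevity.

```python
# Simplified illustration of using per-sample GT/DP values to separate
# germline heterozygous SNPs from somatic variants. NOT CNVkit's own code;
# helper names and the depth threshold are hypothetical.
from typing import Dict, Optional

Sample = Dict[str, str]  # one sample's FORMAT values, e.g. {"GT": "0/1", "DP": "30"}


def zygosity(gt: str) -> Optional[float]:
    """Fraction of non-reference alleles in a GT string such as '0/1' or '1|1'."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None  # missing genotype call
    return sum(a != "0" for a in alleles) / len(alleles)


def classify(tumor: Sample, normal: Optional[Sample] = None,
             min_depth: int = 20) -> str:
    """Return 'skip', 'somatic', 'het' or 'hom' (hom-ref or hom-alt) for one record."""
    if int(tumor["DP"]) < min_depth:
        return "skip"  # too shallow to trust (assumed threshold)
    t_zyg = zygosity(tumor["GT"])
    if normal is not None:
        n_zyg = zygosity(normal["GT"])
        if t_zyg and n_zyg == 0.0:
            # Non-reference in the tumour but 0/0 in the normal: somatic.
            return "somatic"
        zyg = n_zyg  # with a normal, judge heterozygosity from the normal
    else:
        zyg = t_zyg  # tumour-only: the tumour genotype is all we have
    if zyg is None:
        return "skip"
    return "het" if 0.0 < zyg < 1.0 else "hom"


# Example records (made-up values)
print(classify({"GT": "0/1", "DP": "30"}))
print(classify({"GT": "0/1", "DP": "30"}, {"GT": "0/0"}))
print(classify({"GT": "1/1", "DP": "30"}, {"GT": "0/1"}))
```

In this simplified model, supplying only a tumour sample means heterozygosity can only be judged from the tumour genotype, whereas adding a normal lets records that are 0/0 in the normal be set aside as somatic and lets the normal's genotype decide which SNPs count as heterozygous.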
The way I understand it, the intended use for those parameters is to have two separate samples sequenced (a tumour and a matched normal from the same patient) and to supply CNVkit with a joint callset.
If you are certain that your filtered Mutect2 callset is a reasonable substitute for an actual germline sample (but don't rely on me here, as I'm not an expert in tumour genetics), then you should combine your data into one VCF with two samples. For variants that are present only in the somatic callset, fill in a GT of 0/0 and an AD of 0 for the germline sample.
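One possible reading of "combine your data into one VCF with two samples" is sketched below. It assumes a tumour-only, uncompressed Mutect2 VCF with a single sample column, plus a set of (CHROM, POS, REF, ALT) keys for the germline sites you trust; `add_normal_column` and the `NORMAL` sample name are hypothetical. Trusted germline sites reuse the tumour column's values in the new column, and somatic-only sites get a GT of 0/0 and zero allele depths, with every other FORMAT field set to missing. How CNVkit handles such zero-depth normal entries has not been checked here.

```python
# Hypothetical sketch: add a pseudo-normal sample column to a single-sample,
# uncompressed Mutect2 VCF, using a set of germline sites you trust.
from typing import Iterable, Iterator, Set, Tuple

Key = Tuple[str, str, str, str]  # (CHROM, POS, REF, ALT) as written in the VCF


def add_normal_column(lines: Iterable[str], germline: Set[Key],
                      normal_name: str = "NORMAL") -> Iterator[str]:
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("##"):
            yield line  # meta-information lines pass through unchanged
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            yield "\t".join(fields + [normal_name])  # declare the new sample
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        tumor_value = fields[9]
        if (chrom, pos, ref, alt) in germline:
            # Trusted germline site: copy the tumour values (assumption).
            normal_value = tumor_value
        else:
            # Somatic-only site: GT 0/0, all allele depths 0, other fields missing.
            n_alleles = 1 + len(alt.split(","))
            values = []
            for key in fields[8].split(":"):
                if key == "GT":
                    values.append("0/0")
                elif key == "AD":
                    values.append(",".join(["0"] * n_alleles))
                else:
                    values.append(".")
            normal_value = ":".join(values)
        yield "\t".join(fields[:10] + [normal_value])


# Made-up example: chr1:100 is a trusted germline site, chr1:200 is not.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tTUMOR",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT:AD:DP\t0/1:10,12:22",
    "chr1\t200\t.\tC\tT\t.\tPASS\t.\tGT:AD:DP\t0/1:20,5:25",
]
for out_line in add_normal_column(example, {("chr1", "100", "A", "G")}):
    print(out_line)
```

The resulting file could then be passed with both IDs, for example `cnvkit.py call sample.cns --vcf combined.vcf --sample-id TUMOR --normal-id NORMAL`, where `sample.cns`, `combined.vcf` and the two sample names are placeholders.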
Please let me know if you have any more questions and I'll try to help!
Thanks, though I am still a bit confused. What happens if I provide only one ID, or none? How does the filtering differ from when I provide both IDs?
Also, when you say "you should combine your data into one VCF with two samples", what does this mean in practice if I have a callset that I am confident represents germline variants?
Hi,
I am trying to call allele-specific CN values. I have a Mutect2 VCF, and I also have a curated list of germline SNVs that I obtained by filtering the Mutect2 VCF. I am wondering how to use either of them properly when calling.
For example, I see that `cnvkit.py call` has these parameters:

Are we meant to use both flags? I am confused because it seems that using `--sample-id` will use all the SNVs in a file, whereas `--normal-id` will selectively use some SNVs, but I am not sure how. Some clarification would be very helpful.