Open weishwu opened 5 years ago
I agree that doesn't seem correct. Could you share the relevant portion of this .cnr file?
One possibility is that the SNPs in this segment were rejected by CNVkit for some reason, based on the contents of the VCF records there.
Thanks for replying quickly.
The .cnr, .cns and the vcf files can be downloaded here: https://usegalaxy.org/u/mapped%20reads%20in%20illumina%2011/h/cnvkitdata
The vcf contains the variants called by GATK HaplotypeCaller from tumor and normal samples together. Mutect2 was used to call somatic variants and I compared these two and removed somatic variants.
My command-line is:
# batch processing with all samples
cnvkit batch *_Tumor.bam --normal *_Normal.bam --method wgs -p 1 --fasta ${genome_fasta} --annotate ${gene_anno} --output-reference C57BL6NJ_reference.cnn --scatter --diagram --output-dir cnvkit_call/
# redo segmentation with sample 6665, by using a vcf for LOH detection
cnvkit call Sample_6665_recal_Tumor.cns -y -v Rhim_CU44_HC_SnpIndel_PASS_DP8GQ20_sample_6665_nonSomatic.vcf -i sample_6665 -n C57BL_6NJ -z 0.25 -o Sample_6665_recal_Tumor_withHCnonSomatic_recall.cns
The output file is "Sample_6665_recal_Tumor_withHCnonSomatic_recall_fixed.cns", in which chr1 is mostly called as cn=0 with baf=0.
Hello,
I am also getting baf values of 0 when using varscan2 germline/loh calls. It happens for every tumor/normal pair in every segment. I do not get any errors or warnings
>cnvkit.py segment T448-03.cnr -v varscan2.snps.clean.vcf -i T448-03 -n 10111 -o T448-03.cns
Selected test sample T448-03 and control sample 10111
Loaded 27024 records; skipped: 0 somatic, 5839 depth
Skipping 10 additional somatic records based on T/N genotypes
Kept 15987 heterozygous of 27014 VCF records
Segmenting with method 'cbs', significance threshold 0.0001, in 1 processes
Re-segmenting on variant allele frequency
Done, now finalizing
Re-segmenting on variant allele frequency
Done, now finalizing
Re-segmenting on variant allele frequency
...
Seems like it is working, but the results do not carry to loh calls.
chromosome start end log2 baf depth probes weight
1 12145 121884666 -0.0538616 0 74.6717 17653 13987.6
1 143279634 145608570 -0.0667195 0 71.8267 369 252.614
1 145609070 146065793 0.164872 0 89.5717 185 149.673
1 146066293 148586512 -0.0188062 0 76.2323 549 387.678
1 148587956 248945922 0.114365 0 75.6463 14322 11366.8
2 10500 92110323 0.000133892 0 53.6863 9954 7901.54
Happy to share vcf information too, but let me just say that they are all PASS snps with lots and lots of 0/1 positions in the normal. And all of the tumor and normal names match exactly.
Is there a specific FORMAT column that is required? GT is present, but there is not a AF column.
Thank you.
Oh wow, it is the missing AF column I think. I tried a different caller, VarDict, which has an AF column. The stdout changed to reflect breakpoints and the baf column is populated.
I concatenated the germline and LOH variants called by VarScan and used that in cnvkit call. A 43Mb region was called as a deletion with BAF=0. However, by looking at the SNPs that are 0/1 in normal sample in this region, 60% changed to 0/0 or 1/1, while 40% remained as 0/1, based on the genotype calling from VarScan. How can the average BAF become 0 in this region?