Illumina / canvas

Canvas - Copy number variant (CNV) calling from DNA sequencing data

/canis_familiaris.vcf' should contain one genotypes column corresponding to sample #125

Open osowiecki opened 5 years ago

osowiecki commented 5 years ago

I'm using a dog dbSNP file in the same format as the files supplied by the Canvas team. How should I prepare my dbSNP file or ploidy.vcf so that Canvas stops complaining about the missing genotype?

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT CF_6942

X 4028 . N . PASS END=123869066 CN 1

########################################

' Job error message: System.ArgumentException: File '/home/mobit/DOG/data/canis_familiaris.vcf' should contain one genotypes column corresponding to sample CF_6942
   at CanvasSNV.SNVReviewer.LoadVariants(String vcfPath, Boolean isSomatic) in D:\TeamCity\buildAgent\work\a29a190a11771d97\Src\Canvas\CanvasSNV\SNVReviewer.cs:line 88
   at CanvasSNV.SNVReviewer.Run() in D:\TeamCity\buildAgent\work\a29a190a11771d97\Src\Canvas\CanvasSNV\SNVReviewer.cs:line 63
   at CanvasSNV.Program.Run(String[] args) in D:\TeamCity\buildAgent\work\a29a190a11771d97\Src\Canvas\CanvasSNV\Program.cs:line 109
   at CanvasSNV.Program.Main(String[] args) in D:\TeamCity\buildAgent\work\a29a190a11771d97\Src\Canvas\CanvasSNV\Program.cs:line 26
2019-06-25T12:51:06+02:00,Launching process for job CanvasSNV-'CF_6942'-'24': '
########################################

dbsnp.vcf looks like this :

########################################

##fileformat=VCFv4.1
##fileDate=20180316
##source=ensembl;version=92;url=http://e92.ensembl.org/Canis_lupus_familiaris
##reference=ftp://ftp.ensembl.org/pub/release-92/fasta/Canis_lupus_familiaris/dna/
##INFO=
##INFO=<ID=TSA,Number=1,Type=String,Description="Type of sequence alteration. Child of term sequence_alteration as defined by the sequence ontology project.">
##INFO=<ID=E_Cited,Number=0,Type=Flag,Description="Cited.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_Multiple_observations,Number=0,Type=Flag,Description="Multiple_observations.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_Freq,Number=0,Type=Flag,Description="Frequency.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_Hapmap,Number=0,Type=Flag,Description="HapMap.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_Phenotype_or_Disease,Number=0,Type=Flag,Description="Phenotype_or_Disease.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_ESP,Number=0,Type=Flag,Description="ESP.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_1000G,Number=0,Type=Flag,Description="1000Genomes.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
##INFO=<ID=E_ExAC,Number=0,Type=Flag,Description="ExAC.http://www.ensembl.org/info/docs/variation/data_description.html#evidence_status">
#CHROM POS ID REF ALT QUAL FILTER INFO
1 112 rs850979046 A G . . dbSNP_151;TSA=SNV
1 132 rs851217143 C A . . dbSNP_151;TSA=SNV
1 147 rs853028708 G A . . dbSNP_151;TSA=SNV
1 194 rs850921736 G T . . dbSNP_151;TSA=SNV
1 208 rs851402391 T C . . dbSNP_151;TSA=SNV
1 237 rs852954153 T C . . dbSNP_151;TSA=SNV
...

Full command :

Canvas SmallPedigree-WGS -b ./bam/CF_6942.bam --sample-b-allele-vcf=./data/canis_familiaris.vcf -o ./CNV_TEST/CF_6942 -r ./data/kmers.fasta -g ./data/canFam3/ --filter-bed=./data/filter.bed --ploidy-vcf=./data/ploidy.vcf

eroller commented 5 years ago

When using --sample-b-allele-vcf, you need to have a GT field, since this is the sample's VCF. If you want to use a dbSNP VCF without GT, you need to provide it via the --population-b-allele-vcf option.
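As a rough illustration of the distinction above, the sketch below inspects the `#CHROM` header line of an uncompressed, tab-delimited VCF and suggests which of the two options mentioned in this thread fits. The helper names `sample_columns` and `b_allele_option` are hypothetical and not part of Canvas; the check only looks at whether any sample columns exist after FORMAT, which is an assumption about what Canvas needs rather than its actual validation logic.

```python
FIXED_COLUMNS = 8  # CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO


def sample_columns(vcf_path):
    """Return the sample column names from the #CHROM header line of a VCF."""
    with open(vcf_path) as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                fields = line.rstrip("\r\n").split("\t")
                # Column 9 is FORMAT; anything after it is one column per sample.
                return fields[FIXED_COLUMNS + 1:]
    raise ValueError(f"{vcf_path}: no #CHROM header line found")


def b_allele_option(vcf_path):
    """Suggest a b-allele option name (hypothetical helper, not Canvas code)."""
    if sample_columns(vcf_path):
        # Per-sample genotype columns are present: treat as the sample's VCF.
        return "--sample-b-allele-vcf"
    # No sample columns (e.g. a dbSNP file): treat as a population VCF.
    return "--population-b-allele-vcf"
```

For a dbSNP file like the one shown earlier in the thread, whose header stops at INFO, `b_allele_option` would return `--population-b-allele-vcf`.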

osowiecki commented 5 years ago

Using --population-b-allele-vcf= with dbsnp.vcf, and --ploidy-vcf= with a header identical to the one in dbsnp.vcf, still crashed the application for the same reason. Should it work like that? I wanted to use dbsnp.vcf and still mark chromosome X in my ploidy.vcf.

Edit: OK, I can see that ploidy.vcf can keep its proper structure with the GT column. I thought all VCF files had to have the same header. My mistake.

eroller commented 5 years ago

Correct, the ploidy VCF is sample-specific, so it must contain a GT field. dbSNP is a population VCF, so GT is not used.
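To catch this kind of mismatch before launching a run, one could check that a sample-specific file such as the ploidy VCF has a column named after the sample. The sketch below mirrors the wording of the error message ("one genotypes column corresponding to sample"). The function name `has_one_sample_column` is hypothetical, the file is assumed to be an uncompressed, tab-delimited VCF, and the real check inside Canvas may differ.

```python
def has_one_sample_column(vcf_path, sample):
    """Return True if exactly one sample column in the VCF is named `sample`."""
    with open(vcf_path) as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                columns = line.rstrip("\r\n").split("\t")
                # The first 9 columns are CHROM..INFO plus FORMAT;
                # the remaining columns are per-sample.
                return columns[9:].count(sample) == 1
    # No #CHROM header line at all: the file cannot satisfy the check.
    return False
```

With the files from this thread, `has_one_sample_column("./data/ploidy.vcf", "CF_6942")` would be expected to hold for a correctly structured ploidy file, while the same call on the dbSNP file would return `False` because its header ends at INFO.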