Closed ahy1221 closed 6 years ago
@chapmanb, could you do me a favor and test my VCF files and tell me what is wrong with them? Since these files are generated entirely by the bcbio pipeline, I cannot figure out what is wrong with them. Thanks.
Hi @chapmanb, @brentp helped me find out that this error is caused by D20180108 occurring twice in the ped file. After removing the repeated D20180108 line from the ped file, the pipeline continues to work fine. But the question now is how this incorrect ped file was generated by the bcbio pipeline. The template YAML file I fed to bcbio is shown below again:
details:
  - analysis: variant2
    genome_build: GRCh37
    # In order to do paired variant calling, samples should belong to the
    # same batch ("batch" under "metadata" below") and have a "phenotype"
    # field stating either "normal" or "tumor". For each batch there
    # should be a sample with "tumor" phenotype and a sample with "normal"
    # phenotype (no more than two samples per batch)
    metadata:
      batch: your-batch-name
      phenotype: normal # or "normal"
    algorithm:
      aligner: bwa
      variantcaller:
        somatic: [vardict, mutect2, strelka2]
        germline: [gatk-haplotype]
      ensemble:
        numpass: 2
      # for targetted projects, set the region
      variant_regions: /WPSnew/heyao/reference/exomCapture/S04380110_SureSelect_Human_All_Exon_V5_Covered.chr.bed
      svcaller: [cnvkit]
      #hlacaller: [optitype]
And the sample metadata used to generate the real config YAML is:
samplename,description,batch,phenotype
PE1Z_ZZM_20171109-normal-DNA,D20171109_normal,D20171109,normal
PE1Z_ZZM_20171109-tumor-DNA,D20171109_tumor,D20171109,tumor
PE1Z_ZZM_20171215-normal-DNA,D20171215_normal,D20171215,normal
PE1Z_ZZM_20171215-tumor-DNA,D20171215_tumor,D20171215,tumor
PE1Z_ZZM_20180108-normal-DNA,D20180108_normal,D20180108,normal
PE1Z_ZZM_20180108-tumor-DNA,D20180108_tumor,D20180108,tumor
PE1Z_ZZM_20180110-normal-DNA,D20180110_normal,D20180110,normal
PE1Z_ZZM_20180110-tumor-DNA,D20180110_tumor,D20180110,tumor
PE1Z_ZZM_20180116-normal-DNA,D20180116_normal,D20180116,normal
PE1Z_ZZM_20180116-tumor-DNA,D20180116_tumor,D20180116,tumor
Is there anything wrong with it?
Yao; Thanks much for the detailed bug report and work identifying the underlying issue. I pushed a fix to bcbio to avoid double generating samples in the created ped file. They should now match the samples in the VCF file and hopefully avoid this issue. Sorry about the problem, and please let us know if you run into any other issues at all. Thanks again.
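For anyone hitting the same symptom on an older bcbio version, here is a minimal sketch of a check for repeated samples in a ped file. It assumes the standard PED layout (whitespace-separated, family ID in column 1, sample ID in column 2, optional header lines starting with "#"); the function name and the example lines are made up for illustration and are not taken from bcbio or from the files in this issue.

```python
from collections import Counter


def duplicate_ped_samples(ped_lines):
    """Return sample IDs (column 2) that appear on more than one PED line.

    Assumes whitespace-separated PED records; blank lines and lines
    starting with '#' are treated as headers or comments and skipped.
    """
    counts = Counter()
    for line in ped_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # malformed record, nothing to count
        counts[fields[1]] += 1
    return sorted(sample for sample, n in counts.items() if n > 1)


# Hypothetical PED content with one record repeated.
example_ped = [
    "#family_id\tsample_id\tpaternal_id\tmaternal_id\tsex\tphenotype",
    "D20180108\tD20180108_normal\t-9\t-9\t0\t1",
    "D20180108\tD20180108_normal\t-9\t-9\t0\t1",
    "D20180110\tD20180110_normal\t-9\t-9\t0\t1",
]
print(duplicate_ped_samples(example_ped))
```

To run it on a real file, pass an open file handle instead of the list, for example `duplicate_ped_samples(open("samples.ped"))`, where the file name is a placeholder.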
Hi, I am using the bcbio cancer germline and somatic mutation pipeline for my own exome data. The config for my pipeline is
The bcbio pipeline worked well until the annotation step using gemini-vcf2db. I am getting an error from the vcf2db command:
The gatk-haplotype VCF file and the ped file were both generated by the bcbio pipeline, so I am not sure what is wrong with my VCF file for vcf2db.
The VCF and ped files for reproducing this error can be downloaded from this link: debug files
Thanks in advance!
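When a VCF and a ped file are both produced automatically, one quick sanity check is to compare the sample names in the VCF header with the sample IDs in the ped file. The sketch below assumes the standard VCF layout (sample names follow the nine fixed columns on the "#CHROM" line) and the standard PED layout (sample ID in column 2). The helper names and example lines are hypothetical and are not part of bcbio or vcf2db.

```python
def vcf_sample_names(vcf_lines):
    """Return the sample names from the '#CHROM' header line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # Nine fixed columns (CHROM..FORMAT) precede the sample names.
            return line.rstrip("\n").split("\t")[9:]
    return []


def ped_sample_ids(ped_lines):
    """Return the sample IDs (column 2) of every non-comment PED record."""
    ids = []
    for line in ped_lines:
        fields = line.split()
        if len(fields) >= 2 and not line.startswith("#"):
            ids.append(fields[1])
    return ids


def compare_samples(vcf_lines, ped_lines):
    """Return (only_in_vcf, only_in_ped) as sorted lists of sample names."""
    vcf_set = set(vcf_sample_names(vcf_lines))
    ped_set = set(ped_sample_ids(ped_lines))
    return sorted(vcf_set - ped_set), sorted(ped_set - vcf_set)


# Hypothetical inputs: the PED lists one sample the VCF does not contain.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1_normal\tS2_normal",
]
example_ped = [
    "#family_id\tsample_id\tpaternal_id\tmaternal_id\tsex\tphenotype",
    "B1\tS1_normal\t-9\t-9\t0\t1",
    "B2\tS2_normal\t-9\t-9\t0\t1",
    "B3\tS3_normal\t-9\t-9\t0\t1",
]
print(compare_samples(example_vcf, example_ped))
```

For a bgzipped VCF, the lines can be read with `gzip.open(path, "rt")` from the Python standard library. Note that set comparison does not reveal repeated ped records, so a duplicate check is still needed separately.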