bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Request: skip GenomicsDBImport and GenotypeGVCFs, but still get a GVCF for a single sample #3487

Open WimSpee opened 3 years ago

WimSpee commented 3 years ago

Version info

To Reproduce

Exact bcbio command you have used:

bcbio_nextgen.py ../config/single_test_sample-merged.yaml  -t ipython -n 101 -s sge -q main -r vf=5G,h_rss=50G

Your sample configuration file:

details:
- analysis: variant2
  genome_build: my_genome
  description:
  # to do multi-sample variant calling, assign samples the same metadata / batch
  metadata:
    batch: DA_test
  algorithm:
    aligner: bwa
    mark_duplicates: true
    recalibrate: false
    realign: false
    variantcaller: gatk-haplotype
    #jointcaller: gatk-haplotype-joint
    ploidy:
      default: 2
    nomap_split_targets: 1000
    effects: false
    tools_off:
    - gemini
    # for targetted projects, set the region
    # variant_regions: /path/to/your.bed

Observed behavior

A BAM file and a single-sample VCF file are created. No GVCF file is created.

Expected behavior

A BAM file and a single-sample GVCF file are created. No VCF file is created.

Additional context

If jointcaller: gatk-haplotype-joint is in the yaml config, then GATK HaplotypeCaller is used with the -ERC GVCF argument followed by GenomicsDBImport and GenotypeGVCFs.

But we plan to run GenomicsDBImport and GenotypeGVCFs in a later bcbio run on multiple GVCF files.
GenomicsDBImport and GenotypeGVCFs also make more sense when combining multiple GVCF files.

Also, even for just one sample, GenomicsDBImport and GenotypeGVCFs take around 50% of the wall time when analyzing that sample starting from FASTQ files.

Therefore we would like to be able to skip GenomicsDBImport and GenotypeGVCFs, but still get a GVCF for the single sample.

But just commenting out jointcaller: gatk-haplotype-joint removes the -ERC GVCF argument for GATK HaplotypeCaller. Is there a way to skip GenomicsDBImport and GenotypeGVCFs for a single sample analysis, but still get a GVCF file for that sample?

Thank you.

matthdsm commented 3 years ago

You can use tools_on: gvcf. See https://bcbio-nextgen.readthedocs.io/en/latest/contents/configuration.html#changing-bcbio-defaults

WimSpee commented 3 years ago

@matthdsm Thank you for the tip. I will try tools_on: gvcf without jointcaller: gatk-haplotype-joint to see whether that creates the GVCF file without running GenomicsDBImport and GenotypeGVCFs for the single-sample analysis.
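
For reference, a minimal sketch of how the algorithm section from the sample configuration above might look with this suggestion applied. It assumes the option is spelled tools_on in the YAML (parallel to the existing tools_off key) and that every other setting stays as posted. Whether this actually skips GenomicsDBImport and GenotypeGVCFs has not been confirmed in this thread.

```yaml
# Fragment of the algorithm section only; all other keys are assumed
# unchanged from the sample configuration posted above.
algorithm:
  variantcaller: gatk-haplotype
  # jointcaller is left out on purpose so that joint genotyping is not requested
  tools_on:
  - gvcf        # ask for gVCF output from the caller
  tools_off:
  - gemini
```

If this works as hoped, the resulting per-sample GVCF files could be combined in a later multi-sample run.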