Open: WimSpee opened this issue 4 years ago.
Hi @WimSpee, sorry for the delay in getting back to you; I've been trying to fix our tests and Docker builds before making a bunch of changes.
Thanks, this is a good idea. Does your tweak actually make bcbio use your file correctly? Looking at the code, if it works, it will only work for joint calling, but it would be good to know whether it at least works there.
Hi Rory, no worries. Thank you for having a look.
In my test, the pre-defined `callable_regions` BED file was picked up and used by bcbio to run the joint variant calling in parallel.
We only set this pre-defined `callable_regions` BED file for joint analysis.
For analyzing new samples from FASTQ to BAM and gVCF, we use the default `nomap_split_targets` functionality in bcbio.
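bcbio's own region-splitting code is not shown in this thread. Purely as an illustration, the following hypothetical sketch shows one way a pre-defined callable-regions BED file could be divided into roughly equal-sized batches of work for parallel joint calling. `read_bed` and `batch_regions` are illustrative names, not bcbio functions, and the greedy largest-first balancing is an assumption rather than bcbio's actual strategy.

```python
from typing import Iterable, List, Tuple

# (chrom, start, end) using BED's 0-based, half-open coordinates
Region = Tuple[str, int, int]


def read_bed(lines: Iterable[str]) -> List[Region]:
    """Parse BED lines into (chrom, start, end) tuples, skipping blank/header lines."""
    regions: List[Region] = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def batch_regions(regions: List[Region], n_batches: int) -> List[List[Region]]:
    """Greedily assign regions to n_batches so each batch covers a similar number of bases."""
    batches: List[List[Region]] = [[] for _ in range(n_batches)]
    sizes = [0] * n_batches
    # Placing the largest regions first gives the greedy heuristic a better balance.
    for region in sorted(regions, key=lambda r: r[2] - r[1], reverse=True):
        smallest = sizes.index(min(sizes))  # batch with the fewest bases so far
        batches[smallest].append(region)
        sizes[smallest] += region[2] - region[1]
    return batches
```

For example, parsing a (hypothetical) `callable.bed` with `read_bed` and calling `batch_regions(regions, 16)` yields 16 lists of regions, each of which could be handed to a separate joint-calling job.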
I don't think there is much risk in adding `callable_regions` to `ALGORITHM_KEYS`.
The main risk is people defining a `callable_regions` BED file that does not make sense, but this could be pointed out in the documentation.
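Assuming `ALGORITHM_KEYS` acts as a whitelist that only controls which configuration keys bcbio accepts, adding `callable_regions` there would not check the file's contents. A hypothetical pre-flight check like the one below could catch obviously nonsensical BED files before a long joint-calling run. `read_fai_lengths` and `validate_callable_bed` are illustrative names, and the rules (known contig, non-empty interval, within contig bounds, sorted and non-overlapping per contig) are assumptions about what "makes sense", not documented bcbio requirements.

```python
from typing import Dict, Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chrom, 0-based start, end), as in BED


def read_fai_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Read contig lengths from a samtools .fai index (column 1 = name, column 2 = length)."""
    lengths: Dict[str, int] = {}
    for line in lines:
        if line.strip():
            name, length = line.split("\t")[:2]
            lengths[name] = int(length)
    return lengths


def validate_callable_bed(regions: Iterable[Region], contig_lengths: Dict[str, int]) -> List[str]:
    """Return human-readable problems found in the regions; an empty list means none found."""
    problems: List[str] = []
    prev_end: Dict[str, int] = {}  # furthest end seen so far on each contig
    for chrom, start, end in regions:
        label = f"{chrom}:{start}-{end}"
        if chrom not in contig_lengths:
            problems.append(f"{label}: contig not in reference")
            continue
        if start < 0 or start >= end:
            problems.append(f"{label}: empty or negative interval")
        if end > contig_lengths[chrom]:
            problems.append(f"{label}: extends past contig length {contig_lengths[chrom]}")
        if start < prev_end.get(chrom, 0):
            problems.append(f"{label}: unsorted or overlaps a previous region")
        prev_end[chrom] = max(prev_end.get(chrom, 0), end)
    return problems
```

Reading contig lengths from the reference's `.fai` index keeps the check tied to the same genome build that bcbio is configured with.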
We will run GATK/Picard `GenotypeConcordance` to confirm that the results produced this way are identical for samples that were already part of a previous joint variant calling.
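GATK/Picard `GenotypeConcordance` is the appropriate tool for this check. Purely to illustrate the idea behind the comparison, here is a heavily simplified, hypothetical Python stand-in that compares one sample's per-site genotypes from two runs. It assumes the genotypes have already been extracted into dictionaries keyed by `(chrom, pos, ref, alt)` and treats phased and unphased genotypes as equal; it is not a substitute for the real tool's metrics.

```python
from typing import Dict, Tuple

Site = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def _normalize(gt: str) -> Tuple[str, ...]:
    """Make '1|0', '0|1' and '0/1' compare equal by ignoring phase and allele order."""
    return tuple(sorted(gt.replace("|", "/").split("/")))


def genotype_concordance(truth: Dict[Site, str], call: Dict[Site, str]) -> Dict[str, int]:
    """Count concordant/discordant genotypes at shared sites and sites unique to each run."""
    shared = truth.keys() & call.keys()
    concordant = sum(1 for site in shared if _normalize(truth[site]) == _normalize(call[site]))
    return {
        "concordant": concordant,
        "discordant": len(shared) - concordant,
        "only_in_truth": len(truth.keys() - call.keys()),
        "only_in_call": len(call.keys() - truth.keys()),
    }
```

Under these assumptions, "identical results" would mean `discordant`, `only_in_truth` and `only_in_call` are all zero.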
Version info
`bcbio_nextgen.py --version`: 1.1.5

To Reproduce
Exact bcbio command you have used:
Your sample configuration file: Repeated for many samples
Observed behavior
Error message or bcbio output:
Expected behavior
The callable BED file should be picked up and used for joint analysis / gVCF square-off.
Adding `callable_regions` to `ALGORITHM_KEYS` (line 543 of https://github.com/bcbio/bcbio-nextgen/blob/master/bcbio/pipeline/run_info.py) results in the expected behavior.

Additional context
It is computationally expensive to recalculate `callable_regions` for recurrent, incremental joint analysis of many gVCF files. Allowing a pre-calculated `callable_regions` BED file to be specified removes the need to provide input BAM files for joint analysis, along with the associated computational cost and time. At the same time, it allows more callable regions to be analyzed in parallel than the current per-chromosome default in bcbio for this scenario.

The pre-calculated `callable_regions` BED file can be reused from a previous bcbio (joint) analysis, or calculated from the reference genome, for example via Picard `ScatterIntervalsByNs`. An easy conversion from the Picard interval list format to BED is still needed when using Picard.
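Picard interval lists start with a SAM-style `@` header and then list one interval per line as tab-separated contig, start, end, strand and name, using 1-based, fully closed coordinates, whereas BED uses 0-based, half-open coordinates. A minimal conversion therefore only needs to drop the header and subtract 1 from each start. The sketch below assumes that layout; `interval_list_to_bed` and its optional `keep_names` filter on the name column are hypothetical helpers, not part of Picard or bcbio.

```python
from typing import Iterable, List, Optional, Set


def interval_list_to_bed(lines: Iterable[str], keep_names: Optional[Set[str]] = None) -> List[str]:
    """Convert Picard interval_list lines into BED lines (contig, start, end, name)."""
    bed: List[str] = []
    for line in lines:
        # Skip blank lines and the SAM-style header (@HD, @SQ, ...).
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        contig, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[4] if len(fields) > 4 else "."
        # Optional filter on the interval-name column.
        if keep_names is not None and name not in keep_names:
            continue
        # interval_list is 1-based and closed; BED is 0-based and half-open,
        # so only the start coordinate changes.
        bed.append(f"{contig}\t{start - 1}\t{end}\t{name}")
    return bed
```

If the Picard output contains intervals that should not become callable regions, inspect the name column of your own output and pass the names to keep via `keep_names`; then write the returned lines to a `.bed` file, one per line.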