Closed matthdsm closed 1 year ago
Hi @matthdsm !
My understanding of callable regions from the test run and the code:
Cleaning of the bed file (variant_regions: kit.bed) - remove headers and sort: https://github.com/bcbio/bcbio-nextgen/blob/master/bcbio/variation/bedutils.py#L86
merge the overlapping regions
bgzip & tabix
Calculate coverage with mosdepth: https://github.com/bcbio/bcbio-nextgen/blob/master/bcbio/variation/coverage.py#L74 the output for callable regions is sample_variant_regions.quantized.bed.gz https://github.com/bcbio/bcbio-nextgen/blob/master/bcbio/variation/coverage.py#L108
The mosdepth command:
export MOSDEPTH_Q0=NO_COVERAGE && \
export MOSDEPTH_Q1=LOW_COVERAGE && \
export MOSDEPTH_Q2=CALLABLE && \
/path/to/anaconda/envs/htslib1.10/bin/mosdepth -t 16 -F 1804 -Q 1 --no-per-base \
--by /path/to/NA12878-exome-eval/work/bedprep/cleaned-Exome-NGv3-merged.bed \
--quantize 0:1:4: /path/to/NA12878-variant_regions \
/path/to/work/align/NA12878/NA12878-prealign-dedup.bam
The thresholds for CALLABLE are being passed through quantize argument: https://github.com/brentp/mosdepth#callable-regions-example
Meaning that in bcbio:
Then bcbio calculates an intersection between coverage-based mosdepth-calculated callable and variant_regions.bed: https://github.com/bcbio/bcbio-nextgen/blob/master/bcbio/variation/coverage.py#L111 I.e. the subset of the callable.bed is returned which overlaps with variant_regions.bed. The result is quantized-vrubset.bed, which contains CALLABLE, LOW_COVERAGE, NO_COVERAGE
preparing for parallelization stage: bcbio determines the blocks of the genome which could be safely called by different threads, the breakpoints are inserted in non-callable regions: https://github.com/bcbio/bcbio-nextgen/blob/master/bcbio/bam/callable.py#L71 and see more logic there on how parallelization block are built
In the actual calling HaplotypeCaller is run in parallel against a small BED file, for example a block for chr:114,137,346-143,402,081 which contains many callable regions.
How the regions are created: https://github.com/bcbio/bcbio-nextgen/blob/master/bcbio/variation/joint.py#L94
Concatenation of variants https://github.com/bcbio/bcbio-nextgen/blob/master/bcbio/variation/vcfutils.py#L390
Filtration.
SN
Awesome, thanks for clarifying this @naumenko-sa! Magnificent breakdown.
Hi,
Could you explain how the callable regions bed is made precisely? I've tried figuring it out from the code, but I've got the feeling I'm missing something.
Thanks Matthias