Closed hliang closed 6 years ago
Thanks much for the report and apologies about the issue. You're right that the proper behavior is to avoid an additional regional filter here if variant_regions was not specified in the configuration. The logic bcbio uses is to joint/batch call over the regions with the most coverage, but there is no need to apply extra selection afterwards. If you update to the latest development version:
bcbio_nextgen.py upgrade -u development
it should now do the right thing here.
More generally, if this is a targeted experiment you should specify the target regions with variant_regions in the input. This avoids calling in off-target regions, where calls are generally noise, and also avoids the necessarily imperfect logic of bcbio trying to figure out which set of regions over multiple samples you really want to call on.
Thanks again for the report and hope this helps.
Great. It works. Thank you for the fix.
I'm running joint variant calling for a batch of 80 exome samples, using gatk-haplotypecaller. The yaml file looks like below; as you can see, no target region is specified.
At the post-processing stage, some messages in bcbio-nextgen.log caught my eye:
The commands.log shows that bcftools filter is run with the -T option using a bed file from one sample (not the first or last sample, more like a random one). This behavior would result in some variants being absent from the result. When a poorly covered sample is selected for the target region, variants in well-covered samples might be filtered out and excluded if those sites are not covered in the poorly covered sample.
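To illustrate the concern, here is a minimal Python sketch of what restricting joint calls to a single sample's regions does. It is a toy model, not bcbio or bcftools code: the intervals, the variant positions, and the helper names (in_regions, filter_by_regions) are all made up for the example, and it only assumes that the region filter keeps a variant when its position falls inside one of the BED intervals.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open as in BED
Variant = Tuple[str, int]        # (chrom, 0-based position)


def in_regions(variant: Variant, regions: List[Interval]) -> bool:
    """Return True if the variant position falls inside any interval."""
    chrom, pos = variant
    return any(c == chrom and start <= pos < end for c, start, end in regions)


def filter_by_regions(variants: List[Variant], regions: List[Interval]) -> List[Variant]:
    """Keep only variants inside the given regions (toy stand-in for a targets filter)."""
    return [v for v in variants if in_regions(v, regions)]


# Hypothetical callable regions for two samples in the same batch.
well_covered = [("chr1", 100, 200), ("chr1", 500, 600)]
poorly_covered = [("chr1", 100, 200)]  # second region has no coverage here

# Hypothetical joint calls across the batch.
joint_calls = [("chr1", 150), ("chr1", 550)]

# Filtering with the poorly covered sample's regions drops the second call,
# even though the well-covered sample supports it.
kept = filter_by_regions(joint_calls, poorly_covered)
lost = [v for v in joint_calls if v not in kept]
print("kept:", kept)
print("lost:", lost)
```

In this toy case the call at position 550 is dropped only because of which sample's bed file was picked, which is the behavior I am worried about.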