cio-abcd / variantinterpretation

Collaborative Interpretation-Pipeline workflow based on nf-core pipeline structure
MIT License

VCF preprocessing: optional prefiltering to regions of interest (ROI) using a BED file #20

Open · biolancer opened this issue 1 year ago


Description of feature

Depending on the sequencing context and the variant caller, mutations can be called outside the regions that a panel or WES kit is designed to cover, as a result of misalignment or ambiguous alignment. In most cases, such mutations outside the region of interest (ROI) can be considered technical artifacts or otherwise unwanted calls.

I therefore propose an optional prefiltering step for the VCF data that uses bcftools filter together with a provided BED file to exclude mutations outside the ROI. If VCFs from different panels or WES kits are supplied, the BED file could be given per sample as an additional column in the samplesheet.
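To illustrate the ROI test itself, here is a minimal Python sketch that assumes plain (uncompressed) BED and VCF text lines. In the pipeline the filtering would be done by bcftools, so load_bed, in_roi and prefilter_vcf are hypothetical helpers, not part of any existing module. The detail that matters is the coordinate conversion: BED intervals are 0-based and half-open, while VCF POS is 1-based, so a position lies inside [start, end) exactly when start < POS <= end. This sketch only checks POS and ignores how far a deletion's REF allele extends.

```python
import bisect
from collections import defaultdict

SKIP_PREFIXES = ("#", "track", "browser")  # BED comment/header lines


def load_bed(lines):
    """Parse BED lines into per-chromosome sorted, merged (starts, ends) lists.

    BED coordinates are 0-based and half-open: [start, end).
    """
    raw = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(SKIP_PREFIXES):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        raw[chrom].append((int(start), int(end)))

    merged = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:          # overlapping or adjacent: extend
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def in_roi(roi, chrom, pos):
    """True if the 1-based VCF POS lies inside a BED interval on chrom."""
    if chrom not in roi:
        return False
    starts, ends = roi[chrom]
    i = bisect.bisect_left(starts, pos) - 1  # last interval with start < pos
    return i >= 0 and pos <= ends[i]


def prefilter_vcf(vcf_lines, roi):
    """Yield header lines unchanged and only the records whose POS is in the ROI."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if in_roi(roi, chrom, int(pos)):
            yield line
```

For example, the BED line "chr1\t100\t300" covers VCF positions 101 to 300 on chr1, so a record at POS 100 would be dropped. Merging overlapping intervals up front keeps each lookup to a single binary search.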

To avoid losing information through prefiltering, mutations outside the ROI could instead be kept and flagged as "outside_ROI" in the VCF FILTER column.
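A minimal Python sketch of the flagging variant, assuming plain VCF text lines and a caller-supplied predicate is_in_roi(chrom, pos) that answers the BED lookup. The names flag_outside_roi and add_filter_flag, as well as the header description text, are hypothetical. Per the VCF specification, FILTER holds PASS, "." (missing) or a semicolon-separated list of filter IDs, and each ID used should be declared in a ##FILTER header line. The sketch therefore replaces PASS or "." with outside_ROI, appends it to existing filters, and inserts the header line before #CHROM if it is not already there.

```python
from typing import Callable, Iterable, Iterator

FILTER_ID = "outside_ROI"
# Hypothetical description text for the new FILTER header line.
FILTER_HEADER = (
    f'##FILTER=<ID={FILTER_ID},'
    'Description="Variant lies outside the provided region of interest (BED)">\n'
)


def add_filter_flag(filter_field: str) -> str:
    """Add FILTER_ID to a VCF FILTER value without dropping existing filters."""
    if filter_field in (".", "PASS"):
        return FILTER_ID
    if FILTER_ID in filter_field.split(";"):
        return filter_field
    return f"{filter_field};{FILTER_ID}"


def flag_outside_roi(
    vcf_lines: Iterable[str], is_in_roi: Callable[[str, int], bool]
) -> Iterator[str]:
    """Keep every record, but flag those whose POS is outside the ROI."""
    have_header = False
    for line in vcf_lines:
        if line.startswith("##"):
            if line.startswith(f"##FILTER=<ID={FILTER_ID},"):
                have_header = True
            yield line
        elif line.startswith("#CHROM"):
            if not have_header:        # declare the new filter once
                yield FILTER_HEADER
            yield line
        else:
            fields = line.rstrip("\n").split("\t")
            chrom, pos = fields[0], int(fields[1])
            if not is_in_roi(chrom, pos):
                fields[6] = add_filter_flag(fields[6])  # column 7 is FILTER
            yield "\t".join(fields) + "\n"
```

Downstream steps can then decide whether to drop flagged records or keep them for inspection, so no call is lost at the preprocessing stage.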

The ROI prefiltering would be accompanied by a BED check that verifies the provided BED file is well-formed for filtering. If the check fails, the step is skipped: the workflow should not fail, but only issue a warning.
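The proposed BED check could look roughly like the following Python sketch. Which checks count as "well-structured" is an assumption here: at least three tab-separated columns, a non-empty chromosome name, integer coordinates with 0 <= start < end, and at least one interval. Zero-length intervals are rejected, which is stricter than the BED format itself allows. The helpers check_bed and roi_step_enabled are hypothetical; the point is that a failed check produces a warning and disables the step instead of raising an error.

```python
import warnings

COMMENT_PREFIXES = ("#", "track", "browser")


def check_bed(lines):
    """Return a list of problems; an empty list means the BED looks usable."""
    problems = []
    n_intervals = 0
    for lineno, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith(COMMENT_PREFIXES):
            continue
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) < 3:
            problems.append(
                f"line {lineno}: expected at least 3 tab-separated columns, found {len(cols)}"
            )
            continue
        chrom, start, end = cols[:3]
        if not chrom.strip():
            problems.append(f"line {lineno}: empty chromosome name")
            continue
        try:
            start_i, end_i = int(start), int(end)
        except ValueError:
            problems.append(f"line {lineno}: start/end are not integers ({start!r}, {end!r})")
            continue
        if start_i < 0 or start_i >= end_i:
            problems.append(f"line {lineno}: expected 0 <= start < end, got {start_i}-{end_i}")
            continue
        n_intervals += 1
    if n_intervals == 0 and not problems:
        problems.append("no intervals found")
    return problems


def roi_step_enabled(bed_lines) -> bool:
    """Warn and return False instead of failing when the BED check does not pass."""
    problems = check_bed(bed_lines)
    if problems:
        warnings.warn(
            "BED file failed validation, skipping ROI prefiltering:\n  "
            + "\n  ".join(problems)
        )
        return False
    return True
```

Collecting all problems before deciding, rather than stopping at the first bad line, gives the user a complete list to fix in a single warning.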