cio-abcd / variantinterpretation

Collaborative Interpretation-Pipeline workflow based on nf-core pipeline structure
MIT License

QC control and coverage visualization #4

Open biolancer opened 1 year ago

biolancer commented 1 year ago

Description of feature

When sample quality is low, for example, read and coverage statistics help decide whether to include or exclude potentially clinically relevant mutations. The data needed for this assessment should be generated during upstream preprocessing of alignments and variant calls.

Visualization could be achieved by generating IGV-compatible data formats (ROI-subsampled BAM or CRAM) during preprocessing. IGV can import sessions based on HTML and XML data formats. If the regions of interest for a specific panel or tumor entity are known from readily accessible reference data, a report could be generated that can be (re)loaded into IGV for a quick lookup of recurring problematic regions.
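To make the session idea concrete, here is a minimal Python sketch that writes an IGV session XML listing one alignment file and a set of regions of interest. The function name `build_igv_session`, the file name, and the example region are hypothetical. The element and attribute names (`Session`, `Resources`, `Resource`, `Regions`, `Region`) follow the layout of session files as IGV desktop saves them, to the best of my knowledge, so please check them and the coordinate convention against a session exported from the IGV version in use.

```python
import xml.etree.ElementTree as ET


def build_igv_session(genome, alignment_path, regions):
    """Return an IGV session XML string.

    regions: list of (chrom, start, end, label) tuples (hypothetical input shape).
    """
    regions = list(regions)
    session = ET.Element("Session", genome=genome, version="8")
    if regions:
        # Open the session at the first region of interest.
        chrom, start, end, _ = regions[0]
        session.set("locus", f"{chrom}:{start}-{end}")

    # One resource entry per track; here only the ROI-subsampled alignment.
    resources = ET.SubElement(session, "Resources")
    ET.SubElement(resources, "Resource", path=alignment_path)

    # Regions of interest show up in IGV's region navigator.
    regions_el = ET.SubElement(session, "Regions")
    for chrom, start, end, label in regions:
        ET.SubElement(
            regions_el,
            "Region",
            chromosome=chrom,
            start=str(start),
            end=str(end),
            description=label,
        )
    return ET.tostring(session, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_igv_session("hg38", "sample.roi.bam", [("chr1", 1000, 2000, "example_roi")]))
```

The resulting file could be written next to the subsampled BAM/CRAM and opened in IGV via File > Open Session.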

sci-kai commented 1 year ago

We discussed today that coverage calculations for target regions should also be included. This would require BAM and BED files as input and should ideally report the coverage of each individual region (often a single exon) plus a summary across all regions. This could be done using mosdepth: https://github.com/brentp/mosdepth
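As a rough sketch of the per-region report plus summary: running `mosdepth --by targets.bed <prefix> sample.bam` writes `<prefix>.regions.bed.gz` with the mean depth of each BED region in the last column. The Python below parses that file and flags regions under a threshold. The function names and the `min_depth=100.0` cutoff are assumptions for illustration, not something decided in this thread.

```python
import gzip


def parse_region_coverage(lines):
    """Yield (chrom, start, end, name, mean_depth) from mosdepth-style region lines.

    Handles both 4-column (no name) and 5-column (with name) output.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name = fields[3] if len(fields) > 4 else ""
        yield fields[0], int(fields[1]), int(fields[2]), name, float(fields[-1])


def summarize(regions, min_depth=100.0):
    """Summarize region coverage; min_depth is a hypothetical QC threshold."""
    regions = list(regions)
    total_len = sum(end - start for _, start, end, _, _ in regions)
    weighted = sum((end - start) * depth for _, start, end, _, depth in regions)
    return {
        "n_regions": len(regions),
        # Length-weighted mean, so long exons count more than short ones.
        "mean_depth": weighted / total_len if total_len else 0.0,
        "low_regions": [r for r in regions if r[4] < min_depth],
    }


def read_mosdepth_regions(path):
    """Read a gzipped <prefix>.regions.bed.gz file into a list of tuples."""
    with gzip.open(path, "rt") as handle:
        return list(parse_region_coverage(handle))
```

Usage would be along the lines of `summarize(read_mosdepth_regions("sample.regions.bed.gz"))`; the `low_regions` list is what a report would highlight for manual review.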

biolancer commented 1 year ago

As discussed today, a check of VCF file integrity (e.g. that genomic coordinates are concordant with the given reference genome version) could further enhance the pipeline. This would potentially require BAM files as additional input.
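One lightweight form of such a check, sketched in Python under the assumption that a FASTA index (`.fai`) of the reference is available: compare the `##contig` header lines and record positions of the VCF against the contig names and lengths in the index. The function names are hypothetical. This only catches mismatched contig naming or out-of-range positions; it does not compare REF alleles with the reference sequence, and the same lengths could alternatively be taken from the `@SQ` lines of a BAM header.

```python
import re


def read_fai(lines):
    """Map contig name -> length from the first two columns of a .fai index."""
    lengths = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths


def check_vcf_against_reference(vcf_lines, ref_lengths):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    for number, line in enumerate(vcf_lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##contig="):
            id_match = re.search(r"ID=([^,>]+)", line)
            len_match = re.search(r"length=(\d+)", line)
            if not id_match:
                continue
            name = id_match.group(1)
            if name not in ref_lengths:
                problems.append(f"line {number}: header contig {name} not in reference")
            elif len_match and int(len_match.group(1)) != ref_lengths[name]:
                problems.append(f"line {number}: length mismatch for contig {name}")
        elif not line or line.startswith("#"):
            continue
        else:
            fields = line.split("\t")
            if len(fields) < 4:
                problems.append(f"line {number}: fewer than 4 columns")
                continue
            chrom, pos, ref = fields[0], int(fields[1]), fields[3]
            if chrom not in ref_lengths:
                problems.append(f"line {number}: contig {chrom} not in reference")
            elif pos < 1 or pos + len(ref) - 1 > ref_lengths[chrom]:
                problems.append(f"line {number}: position {pos} outside contig {chrom}")
    return problems
```

A mismatch such as `1` versus `chr1` contig naming, or a position beyond the contig end, would then surface before annotation rather than as silent misannotation downstream.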