epi2me-labs / wf-human-variation


add option to force re-alignment #178

Open flokraft85 opened 1 month ago

flokraft85 commented 1 month ago

Is your feature related to a problem?

When using an aligned BAM file as input, wf-human-variation takes the alignment as it is, without re-aligning. Of course, this makes a lot of sense in most cases because it saves CPU time. However, if I want to re-analyze old data with a newly patched or modified version of a genome build, it would be good to re-align the data.

Describe the solution you'd like

An option to force re-alignment of already aligned input data would be great. As minimap2 only accepts FASTQ (or FASTA) files as input, the pipeline would have to use samtools fastq or a similar tool to generate FASTQ files from a BAM.
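For illustration, the manual chain a "force re-alignment" option would have to perform (BAM to FASTQ, minimap2 against the new reference, sort, index) can be sketched as below. This is a minimal sketch, not what wf-human-variation does internally. build_realign_commands, run and the output file names are hypothetical, and it assumes single-end nanopore reads (hence minimap2's map-ont preset) with samtools and minimap2 on PATH.

```python
import shlex
import subprocess
from typing import List


def build_realign_commands(bam: str, ref: str, prefix: str, threads: int = 4) -> List[List[str]]:
    """Hypothetical helper: argv lists for BAM -> FASTQ -> minimap2 -> sorted, indexed BAM.

    Assumes single-end (e.g. ONT) reads.
    """
    fastq = f"{prefix}.fastq"
    sam = f"{prefix}.sam"
    out_bam = f"{prefix}.bam"
    return [
        # -0 writes reads with neither READ1 nor READ2 set (single-end reads);
        # samtools fastq skips secondary/supplementary records by default (-F 0x900)
        ["samtools", "fastq", "-0", fastq, bam],
        # align to the new reference; the map-ont preset is an assumption for nanopore data
        ["minimap2", "-a", "-x", "map-ont", "-t", str(threads), "-o", sam, ref, fastq],
        ["samtools", "sort", "-o", out_bam, sam],
        ["samtools", "index", out_bam],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(build_realign_commands("old.bam", "new_ref.fa", "realigned", threads=8))
```

The commands are kept as argument lists rather than a shell string so that file names with spaces need no quoting, and dry_run defaults to True so the plan can be inspected before anything is executed.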

Describe alternatives you've considered

I can do the BAM-to-FASTQ conversion myself and use the resulting FASTQ files as input for wf-human-variation.

Additional context

No response

SamStudio8 commented 1 month ago

however, if I want to analyze old data again with a new patched or modified version of a genome build, it would be good to re-align the data.

The workflow will realign data to a reference if it detects that the reference names and lengths in the --ref do not match those in the --bam header. Your related issue with the masked reference is an interesting one (https://github.com/epi2me-labs/wf-human-variation/issues/177) and presents a use-case that we've not encountered before (where the ref looks the same in terms of name/length, but not content). We'll have a think about how we might approach this.
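A minimal sketch of the kind of check described above: compare the sequence names and lengths of the reference (here read from a samtools faidx .fai index) with the @SQ lines (SN = name, LN = length) of the BAM header. This is a hypothetical re-implementation for illustration, not the workflow's code. The function names are made up, and treating any difference in the name-to-length mapping as a mismatch is an assumption; the workflow's exact rule may differ.

```python
from typing import Dict


def fai_lengths(fai_text: str) -> Dict[str, int]:
    """Parse a .fai index: NAME<TAB>LENGTH<TAB>OFFSET<TAB>LINEBASES<TAB>LINEWIDTH."""
    lengths = {}
    for line in fai_text.splitlines():
        if line.strip():
            fields = line.split("\t")
            lengths[fields[0]] = int(fields[1])
    return lengths


def bam_header_lengths(header_text: str) -> Dict[str, int]:
    """Parse @SQ lines from a SAM/BAM header into {SN: LN}."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # split on the first ':' only, so values such as UR:file:/path survive
        tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
        lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def needs_realignment(fai_text: str, header_text: str) -> bool:
    """Hypothetical rule: realign if the name->length mappings differ at all."""
    return fai_lengths(fai_text) != bam_header_lengths(header_text)


if __name__ == "__main__":
    fai = "chr1\t1000\t6\t60\t61\nchr2\t500\t1030\t60\t61\n"
    header = "@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chr2\tLN:500\n"
    print(needs_realignment(fai, header))  # same names/lengths -> False
    print(needs_realignment(fai, header.replace("SN:chr1", "SN:1")))  # renamed contig -> True
```

The toy example also shows the gap raised in the issue: a masked copy of the same reference has an identical .fai (same names and lengths), so a check of this kind alone reports no mismatch even though the sequence content differs.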

In the meantime, you could use wf-alignment to handle alignment and then provide the output from that workflow to wf-human-variation. Alternatively, you could strip alignment information with samtools reset and provide the unaligned BAM to wf-human-variation, which will trigger alignment to your chosen reference.
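As a conceptual model of what "stripping alignment information" means at the record level, the sketch below drops secondary and supplementary records, undoes the reverse-complementing that aligners apply to reverse-strand reads, clears the alignment fields and marks each read unmapped. It is an illustration only, not samtools's implementation. Record and reset_records are hypothetical, it assumes single-end reads (all other flag bits are simply dropped), and the real samtools reset handles further details, such as auxiliary tags and header lines, that this sketch ignores.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# SAM FLAG bits used below
UNMAPPED = 0x4
REVERSE = 0x10
SECONDARY = 0x100
SUPPLEMENTARY = 0x800

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


@dataclass(frozen=True)
class Record:
    """Hypothetical, minimal stand-in for a SAM/BAM record."""
    name: str
    flag: int
    seq: str
    qual: str
    ref: Optional[str] = None    # None = no reference (RNAME '*')
    pos: Optional[int] = None    # None = no position
    cigar: Optional[str] = None  # None = no alignment (CIGAR '*')
    mapq: Optional[int] = None


def reset_records(records: Iterable[Record]) -> List[Record]:
    """Turn aligned single-end records back into unaligned reads."""
    out = []
    for rec in records:
        if rec.flag & (SECONDARY | SUPPLEMENTARY):
            continue  # assume the primary record holds the full read
        seq, qual = rec.seq, rec.qual
        if rec.flag & REVERSE:
            # reverse-strand reads are stored reverse-complemented; restore the original orientation
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        # alignment fields fall back to their "unaligned" defaults
        out.append(Record(name=rec.name, flag=UNMAPPED, seq=seq, qual=qual))
    return out
```

Restoring the original read orientation matters because the aligner will otherwise see the reverse complement of what was sequenced; for alignment itself this is harmless, but downstream steps that rely on read orientation (for example base-modification tags) would not line up.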