for minimap2 remove secondary alignments by default

MHH-RCUG / nf_wochenende

A nextflow version of the Wochenende reference metagenome binning and visualization pipeline

MIT License


for minimap2 remove secondary alignments by default #80

Open colindaven opened 2 years ago

colindaven commented 2 years ago

Tests with simulated data show that removing secondary alignments by default brings the read counts much closer to the composition of the original FASTQ.

This seems to be specific to minimap2, especially with long reads.

samtools view -F 256 -bo filt.bam orig.bam
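To illustrate what the `-F 256` filter does: bit 256 (0x100) in the SAM FLAG column marks a secondary alignment, bit 2048 (0x800) marks a supplementary one, and `-F` drops every record that has any of the masked bits set. The sketch below reproduces that test on plain SAM text with awk. The function names `drop_flagged` and `toy_sam` and the three toy records are made up for this example; on real BAM files samtools itself is the tool to use.

```shell
# drop_flagged MASK: keep header lines and records whose FLAG (column 2)
# has none of the bits in MASK set, mimicking `samtools view -h -F MASK` on SAM text.
drop_flagged() {
  awk -F '\t' -v mask="$1" '
    # Return 1 if flag and m share at least one set bit (portable, no gawk and()).
    function shares_bit(flag, m,    bit) {
      for (bit = 1; bit <= m; bit *= 2)
        if (int(flag / bit) % 2 == 1 && int(m / bit) % 2 == 1) return 1
      return 0
    }
    /^@/ { print; next }                 # header lines pass through
    !shares_bit($2, mask) { print }      # keep records without masked bits
  '
}

# Toy records (hypothetical): primary (0), secondary (256), supplementary (2048).
toy_sam() {
  printf 'r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n'
  printf 'r1\t256\tchr2\t500\t0\t4M\t*\t0\t0\tACGT\tIIII\n'
  printf 'r2\t2048\tchr1\t900\t60\t2M2H\t*\t0\t0\tAC\tII\n'
}

toy_sam | drop_flagged 256  | cut -f 1,2   # secondary record removed
toy_sam | drop_flagged 2304 | cut -f 1,2   # 256 + 2048: secondary and supplementary removed
```

With mask 256 the primary and supplementary records remain; with mask 2304 only the primary record is left.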

colindaven commented 2 years ago

[x] Implemented; tests still to do

colindaven commented 2 years ago

@irosenboom FYI, in case you test this version and notice the new .nosec.bam part in the filenames.

Basically, this is why aligning long reads with minimap2 (or short reads, although we mostly use bwa mem for those) led to highly inflated numbers of reported aligned reads, e.g. in the mock communities.

This is an optional but recommended flag when using minimap2.
Do you think I should just set it to run if --longread is set? Or if minimap2long or minimap2short is set? Otherwise it is not really necessary for bwa mem.

irosenboom commented 2 years ago

Hi @colindaven, thanks for this interesting update. I would set it to run if minimap2long or minimap2short is set, just in case someone wants to use minimap2 instead of bwa mem for short reads.
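The rule suggested here could be sketched roughly as follows. This is plain shell for illustration only, not the pipeline's Nextflow code; the function name `needs_alignment_filter`, the `aligner` variable, and the spelling `bwamem` are assumptions made for the example.

```shell
# needs_alignment_filter ALIGNER: succeed (exit 0) when the aligner is one of the
# minimap2 modes named in this thread, fail otherwise (e.g. bwa mem).
needs_alignment_filter() {
  case "$1" in
    minimap2long|minimap2short) return 0 ;;
    *) return 1 ;;
  esac
}

aligner="minimap2short"   # hypothetical variable; the pipeline's real parameter may differ
if needs_alignment_filter "$aligner"; then
  echo "filter secondary alignments for $aligner"
else
  echo "skip filtering for $aligner"
fi
```

Keying the decision on the aligner rather than on --longread covers the case where minimap2 is used for short reads.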

colindaven commented 2 years ago

Also added a section to the pipeline that removes supplementary alignments. I changed .nosec.bam to .ns.bam; the .ns part is added once for each filter, so with both filters the filename contains .ns.ns

In my experience, these seem to be necessary only for long reads aligned with minimap2long.

Also, the setting is configurable in nextflow.config, but I would always recommend it for quantitative use such as metagenomics.

colindaven commented 2 years ago

It seems the aligner bwa mem still produces some supplementary alignments despite this switch (a significant number when less filtering is applied).

samtools flagstat SRR13594152_200k_R1.fastp.ns.fix.s.dup.mm.mq30.calmd.bam
23831 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
57 + 0 supplementary
0 + 0 duplicates

It seems minimap2 does too ...

samtools flagstat tmp_sample1_R1.trm.ns.fix.s.bam
41498 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
304 + 0 supplementary
0 + 0 duplicates
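To put these counts in proportion, the flagstat text can be reduced to a single percentage. The sketch below is only a convenience for reading such reports; the function name `supplementary_share` is made up, and it assumes the flagstat layout shown in this thread (QC-passed count in the first column).

```shell
# supplementary_share: read `samtools flagstat` text on stdin and print the
# QC-passed supplementary count as a percentage of the QC-passed total.
supplementary_share() {
  awk '
    / in total /      { total = $1 }
    / supplementary/  { supp = $1 }
    END { if (total > 0) printf "%d/%d = %.2f%%\n", supp, total, 100 * supp / total }
  '
}

# First report from this thread (bwa mem sample), pasted as a here-document.
supplementary_share <<'EOF'
23831 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
57 + 0 supplementary
0 + 0 duplicates
EOF
# prints: 57/23831 = 0.24%
```

Applied to the minimap2 report, the same function gives 304/41498 = 0.73%.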