Open colindaven opened 2 years ago
@irosenboom FYI, in case you test this version and find the new `.nosec.bam` part in the filenames.
Basically, this is why aligning long reads with minimap2 (or short reads, though we mostly use bwa mem for those) led to highly inflated numbers of reported aligned reads, e.g. in the mock communities.
Should this run only if `--longread` is set? Or if `minimap2long` or `minimap2short` is set? Otherwise it is not really necessary for bwa mem.
Hi @colindaven, thanks for this interesting update. I would set it to run if `minimap2long` or `minimap2short` is set, just in case someone wants to use minimap2 instead of bwa mem for short reads.
I also added a section to the pipeline that removes supplementary alignments. I changed the `.nosec.bam` suffix to `.ns.bam`; it occurs once for each filter, so `.ns.ns`.
In my experience, these filters seem to be necessary only for long reads aligned with `minimap2long`.
Also, the setting is configurable in nextflow.config, but I would always recommend enabling it for quantitative usage such as metagenomics.
It seems the aligner bwa mem still produces some supplementary alignments despite this switch (significant number with less filtering).
samtools flagstat SRR13594152_200k_R1.fastp.ns.fix.s.dup.mm.mq30.calmd.bam
23831 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
57 + 0 supplementary
0 + 0 duplicates
It seems minimap2 does too ...
samtools flagstat tmp_sample1_R1.trm.ns.fix.s.bam
41498 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
304 + 0 supplementary
0 + 0 duplicates
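To compare runs like the two above, the supplementary count can be pulled out of the flagstat text and expressed as a share of all records. This is a minimal sketch, not part of the pipeline: the variable names are made up for illustration, and the sample text is pasted from the minimap2 output above so the snippet runs without a BAM file. In practice the text would come from `samtools flagstat <file>.bam`.

```shell
# Sample flagstat lines copied from the minimap2 run above (illustrative input).
flagstat_output='41498 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
304 + 0 supplementary
0 + 0 duplicates'

# Field 1 is the QC-passed count; fields 4 and 5 name the category.
total=$(printf '%s\n' "$flagstat_output" | awk '$4 == "in" && $5 == "total" {print $1}')
supplementary=$(printf '%s\n' "$flagstat_output" | awk '$4 == "supplementary" {print $1}')

# Share of supplementary records among all records, as a percentage.
pct=$(awk -v s="$supplementary" -v t="$total" 'BEGIN {printf "%.2f", 100 * s / t}')
echo "supplementary: $supplementary of $total records ($pct%)"
```

The "in total" line counts alignment records, so the percentage is relative to records rather than to input reads.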
Tests with simulated data show that removing secondary alignments by default brings the read numbers much closer to the original fastq composition.
Seems to be specific to minimap2 and esp long reads.
samtools view -F 256 -bo filt.bam orig.bam
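Since supplementary alignments are discussed above as well, one possible way to drop both kinds in a single pass is to combine the two SAM flag bits (256 for secondary, 2048 for supplementary) into one exclude mask. This is a sketch under assumptions and not necessarily how the pipeline implements it: the variable names are hypothetical, and `orig.bam` / `filt.bam` are the placeholder filenames from the command above. The samtools step only runs if samtools and the input file are present.

```shell
SECONDARY=256        # 0x100 in the SAM FLAG field
SUPPLEMENTARY=2048   # 0x800 in the SAM FLAG field

# Bitwise OR gives a mask that excludes records with either bit set.
MASK=$((SECONDARY | SUPPLEMENTARY))
echo "exclude mask: $MASK"

# Placeholder filenames; skipped when samtools or the input is missing.
if command -v samtools >/dev/null 2>&1 && [ -f orig.bam ]; then
    samtools view -F "$MASK" -b -o filt.bam orig.bam
    samtools flagstat filt.bam   # secondary and supplementary should both be 0
fi
```

Running the two filters separately, as the `.ns.ns` naming suggests, should give the same set of records; the combined mask just saves one pass over the file.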