BenLangmead / bowtie2

A fast and sensitive gapped read aligner
GNU General Public License v3.0
638 stars 160 forks source link

SAMflags to removes mapped concordantly >1 times #469

Closed Pompamm closed 2 months ago

Pompamm commented 2 months ago

Hi all expertise,

I am working on a shotgun metagenome, and the problem is that after I assembly all the metagenome contigs for binning, the part of mapping contigs back to short reads using bowtie2 to select only mapped reads to calculated coverages before constructing MAGs by SAMtools defines SAMflags.

bowtie2 -x assembly -q -1 ../data/"$sample"_1.fastq.gz \
           -2 ../data/"$sample"_2.fastq.gz -p 4 \
           -S "$sample".sam

And i got the result shown as image

According to this result, I want to include:

To select these mapped reads, I have been searching for the solution in every platform, and I tried several SAMflags, but I still did not get what I wanted.

For example, -F 2308: 57,247,551 (Exclude ReadUnmapped, SecondaryAlign, SupplementartyAlign) -F 256: 78,379,286 (Exclude SecondaryAlign) -f 3: 45,304,840 (Include ReadPair,ReadMapped) -f 8: 21,131,735 (Include MateUnmapped)

In conclusion, I can not exclude those "align concordantly >1 time", which I thought were considered secondary alignment; we can see surely that they are not because the SAMflags did not exclude them. So I can not find the right SAMflags to exclude them and get only what I want. I want to know the mapped concordantly >1 times generated if it's not the secondary alignment.

If anyone has experience with potential solutions to this issue, I would greatly appreciate your guidance.

ch4rr0 commented 2 months ago

Hello,

I pushed a few changes to the bug_fixes branch the should make this possible. Using a combination of the following flags should get you the desired result:

-M <n>: causes bowtie2 to add a YP:i opt flag for reads with more than n alignments --sam-opt-config: toggle any SAM Opt flags supported by bowtie2 (we're interested in turning on the YP:i flag in this case)

Here is an example of how this all comes together using the e_coli reference and sample reads from bowtie as an example:

./bowtie2 -x e_coli -1 ../bowtie/reads/e_coli_1000_1.fq -2 ../bowtie/reads/e_coli_1000_2.fq -M1 -S out.sam --sam-opt-config yp

Warning: -M is deprecated.  Use -D and -R to adjust effort instead.
1000 reads; of these:
  1000 (100.00%) were paired; of these:
    0 (0.00%) aligned concordantly 0 times
    967 (96.70%) aligned concordantly exactly 1 time
    33 (3.30%) aligned concordantly >1 times
    ----
    0 pairs aligned concordantly 0 times; of these:
      0 (0.00%) aligned discordantly 1 time
    ----
    0 pairs aligned 0 times concordantly or discordantly; of these:
      0 mates make up the pairs; of these:
        0 (0.00%) aligned 0 times
        0 (0.00%) aligned exactly 1 time
        0 (0.00%) aligned >1 times
100.00% overall alignment rate

We can see from the summary that there are 33 pairs (66 mates) that aligned concordantly more than one time. The aformentioned reads will have the YP:i:1 opt flag. We can use grep to confirm that there are indeed 66 mates with this flag set.

grep 'YP:i:1' out.sam | wc -l
    66

DISCLAIMER: this is a beta feature and maybe subject to change.

Let me know your thoughts.

Pompamm commented 2 months ago

Hello again,

Thank you so much for your response, which helped me a lot about this problem. To make sure that my understanding is correctly

This step that you provided means you force them to show reads that pairs are aligned more than n alignment (or more than 1 times)

But if I don't want those aligned >1 times (aligned concordantly >1 times and mates aligned >1 times) What should I do to keep only the pairs that aligned exactly, aligned discordantly, and aligned mates exactly one time for binning?

Do I have to tell bowtie2 to add a YM:i opt flag for mates (unpaired) with more than 1 alignments and collect results concatenated with -M 1 (YP:i of aligned concordantly exactly 1 time) from your answer together instead of using SAMflags? and grep -v 'YP:i:1' | grep -v 'YM:i:1' | grep 'YT:Z:DP' out.sam | wc -l Because I want to get rid of those >1 times, I need only exactly 1 time of paired mapped and mates mapped and discordantly 1 time. OR samtools view -F 2308 input.sam | grep -v 'YP:i:1 | grep -v 'YM:i:1' > filter_SAMPLE.sam which I don't know how to add a YM:i opt flag

I'm uncertain whether my thoughts are accurate. and I appreciate exploring more straightforward alternatives. Could you please provide guidance?

ch4rr0 commented 2 months ago

Hello,

Using grep YP:i:0 out.sam should give you all the other reads.

Pompamm commented 2 months ago

That might work for me! Thank you for helping me. Looking forward to the next version updates.