BenLangmead / bowtie2

A fast and sensitive gapped read aligner
GNU General Public License v3.0
664 stars 158 forks source link

Output only unmapped read of pair to --un-mates/conc, when only one maps #417

Open ohickl opened 1 year ago

ohickl commented 1 year ago

Hi I was wondering if it was possible to write only the unmapped read of a read pair to the unmapped output. It would be fine for me if the pairs map discordantly or just one does. At the moment I would 'lose' the entire pair with --un-conc, whereas --un-mates from #254 doesn't seem to work:

Not a HASH reference at .../bowtie2/bin/bowtie2 line 614, <BT> line 134.

This is my approach for fw and rv, respectively, atm:

bowtie2 -q \
        --time \
        --un-conc ${un_path}/r<1,2>_unmapped.fastq.gz \
        --threads ${threads} \
        -x ${out}/bt2_index \
        --sam-no-qname-trunc \
        --no-unal \
        <--norc,--nofw> \
        -1 ${fw} \
        -2 ${rv} > /dev/null 2>> ${log}

Or would it work fine doing something like, e.g.:

for i in {fw,rv,se1,se2}; do
  echo "Getting ${i} unaligned reads." >> ${log} 2>&1
  bowtie2 -q \
          --time \
          --un ${un_path}/${i}_unmapped.fastq.gz \
          --threads ${threads} \
          -x ${out}/bt2_index \
          --sam-no-qname-trunc \
          --no-unal \
          -U ${!i} > /dev/null 2>> ${log}
done

Not sure if this would create any problems, since the pair information is lost.

Best

Oskar

ch4rr0 commented 1 year ago

Hello,

I pushed an issue to bug_fixes branch that should address this. Let me know if the changes works for you. Here's how I tested:

$ cat r1.fq
@r1
TGAATGCGAACTCCGGGACGCTCAGTAATGTGACGATAGCTGAAAACTGTACGATAAACNGTACGCTGAGGGCAGAAAAAATCGTCGGGGACATTNTAAAGGCGGCGAGCGCGGCTTTTCCG
+
+"@6<:27(F&5)9)"B:%B+A-%5A?2$HCB0B+0=D<7E/<.03#!.F77@6B==?C"7>;))%;,3-$.A06+<-1/@@?,26">=?*@'0;$:;??G+:#+(A?9+10!8!?()?7C>

$ cat r2.fq
@r1
GAATACTGGCGGATTACCGGGGAAGCTGGAGC
+
EDCCCBAAAA@@@@?>===<;;9:99987776

$ ./bowtie2 -x example/index/lambda_virus -1 r1.fq -2 r2.fq --un-mates out
1 reads; of these:
  1 (100.00%) were paired; of these:
    1 (100.00%) aligned concordantly 0 times
    0 (0.00%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times
    ----
    1 pairs aligned concordantly 0 times; of these:
      0 (0.00%) aligned discordantly 1 time
    ----
    1 pairs aligned 0 times concordantly or discordantly; of these:
      2 mates make up the pairs; of these:
        1 (50.00%) aligned 0 times
        1 (50.00%) aligned exactly 1 time
        0 (0.00%) aligned >1 times
50.00% overall alignment rate
@HD VN:1.5  SO:unsorted GO:query
@SQ SN:gi|9626243|ref|NC_001416.1|  LN:48502
@PG ID:bowtie2  PN:bowtie2  VN:2.5.0    CL:"/bowtie2/bowtie2-align-s --wrapper basic-0 -x example/index/lambda_virus --passthrough -1 r1.fq -2 r2.fq"
r1  73  gi|9626243|ref|NC_001416.1| 18401   42  122M    =   18401   0   TGAATGCGAACTCCGGGACGCTCAGTAATGTGACGATAGCTGAAAACTGTACGATAAACNGTACGCTGAGGGCAGAAAAAATCGTCGGGGACATTNTAAAGGCGGCGAGCGCGGCTTTTCCG  +"@6<:27(F&5)9)"B:%B+A-%5A?2$HCB0B+0=D<7E/<.03#!.F77@6B==?C"7>;))%;,3-$.A06+<-1/@@?,26">=?*@'0;$:;??G+:#+(A?9+10!8!?()?7C>  AS:i:-5 XN:i:0  XM:i:3  XO:i:0  XG:i:0  NM:i:3  MD:Z:59G13G21G26    YT:Z:UP
r0  133 gi|9626243|ref|NC_001416.1| 18401   0   *   =   18401   0   GAATACTGGCGGATTACCGGGGAAGCTGGAGC    EDCCCBAAAA@@@@?>===<;;9:99987776    YT:Z:UP

$ samtools flags 73
0x49    73      PAIRED,MUNMAP,READ1

$ samtools flags 133
0x85    133     PAIRED,UNMAP,READ2

$ cat out.1

$ cat out.2
@r1
GAATACTGGCGGATTACCGGGGAAGCTGGAGC
+
EDCCCBAAAA@@@@?>===<;;9:99987776