NVlabs / nvbio

NVBIO is a library of reusable components designed to accelerate bioinformatics applications using CUDA.
BSD 3-Clause "New" or "Revised" License

nvBowtie unmapped R2 replaced by reverse-complement of R1 #47

Open tsp-kucbd opened 2 years ago

tsp-kucbd commented 2 years ago

When extracting the unmapped reads from an nvBowtie BAM file, the unmapped R2 is always replaced by the reverse complement of R1. This is very similar to an issue raised in 2015: https://groups.google.com/g/nvbio-users/c/is28EEvm2QE

Here is an example that reproduces the error. A tar file with a minimal reference file and FASTQ files is available at https://sid.erda.dk/share_redirect/hMPcixT4XT

wget "https://sid.erda.dk/share_redirect/hMPcixT4XT" -O dbg.tgz
tar xvzf dbg.tgz
cd dbg
nvBWT ref.fna ref.nvBWT.index
nvBowtie -1 test_1.fq -2 test_2.fq -S test_nvbio.bam -x ref.nvBWT.index

samtools view test_nvbio.bam | grep A00627:307:H22MGDSX3:1:1101:1325 | cut -f10
cat test_*.fq | grep -A2 A00627:307:H22MGDSX3:1:1101:1325 | grep AA

This results in:

# R1 read from bam file
CAAATGTATCTCTCTCTCTCACACACAGTATCCAGATAACTGATTACTGGAATGTGTGATAGAATAATACTACTGCAGCCACGAATGGTATCTATTTGAAAAGTCTTCCTTGAATAGAAGTCTAATGCCGTCTACAGGATGTAGTAGATG
# R2 read from bam file, which is reverse_complement of R1
CATCTACTACATCCTGTAGACGGCATTAGACTTCTATTCAAGGAAGACTTTTCAAATAGATACCATTCGTGGCTGCAGTAGTATTATTCTATCACACATTCCAGTAATCAGTTATCTGGATACTGTGTGTGAGAGAGAGAGATACATTTG
# R1 read from test_1.fq
CAAATGTATCTCTCTCTCTCACACACAGTATCCAGATAACTGATTACTGGAATGTGTGATAGAATAATACTACTGCAGCCACGAATGGTATCTATTTGAAAAGTCTTCCTTGAATAGAAGTCTAATGCCGTCTACAGGATGTAGTAGATG
# R2 read from test_2.fq
AACTTAGAGTTCACTCTGTACAGATAGATAGATACAAGTTACCACAGAGATCATACTACATCTACTACATACTGTAAACGGCATTAGACTTCTATTCAAGGACGACGATTCAAATAGATACCATTCGTGGCTGCAGAACTATAATTCTAA
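
The reverse-complement relationship between the two BAM records can be checked mechanically rather than by eye. The following is a minimal sketch, not part of nvBowtie or samtools: `revcomp` and `is_revcomp` are hypothetical helper names introduced here, and it assumes the `rev` and `tr` utilities are available. It uses the first 20 bases of R1 and the last 20 bases of the BAM "R2" from the output above.

```shell
#!/bin/sh
# Hypothetical helper: reverse-complement a DNA sequence by reversing the
# string and then swapping A<->T and C<->G (N is left unchanged).
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTNacgtn' 'TGCANtgcan'
}

# Hypothetical helper: exit status 0 when the second sequence is the
# reverse complement of the first, non-zero otherwise.
is_revcomp() {
    [ "$(revcomp "$1")" = "$2" ]
}

# First 20 bases of R1 and last 20 bases of the BAM "R2" shown above.
r1_head=CAAATGTATCTCTCTCTCTC
r2_tail=GAGAGAGAGAGATACATTTG

if is_revcomp "$r1_head" "$r2_tail"; then
    echo "R2 looks like the reverse complement of R1"
else
    echo "R2 is not the reverse complement of R1"
fi
```

To compare whole reads, the two full sequences printed by the `samtools view ... | cut -f10` command can be passed to `is_revcomp` in the same way. Note that SAM/BAM stores the sequence of a read aligned to the reverse strand in reverse-complemented form, so the orientation of the mapped mate should be taken into account when interpreting the result.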