arq5x / bedtools2

bedtools - the swiss army knife for genome arithmetic
MIT License

bamToFastq v2.31.0 records repeated #1058

Open brianjohnhaas opened 1 year ago

brianjohnhaas commented 1 year ago

Hi,

I was converting a PacBio uBAM file to FASTQ like so:

bamToFastq -i input.bam -fq test.fastq

and noticed that every record appears twice in the FASTQ.

for example:

shows twice in fastq

cat test.fastq | grep -n m64386e_230530_201532/77006631/ccs
1641865:@m64386e_230530_201532/77006631/ccs
5538629:@m64386e_230530_201532/77006631/ccs

shows once in the ubam

samtools view input.bam | grep -n m64386e_230530_201532/77006631/ccs
410467:m64386e_230530_201532/77006631/ccs 4 0 255 * 0 0 (myseq) (myquals) (rest)

(I can't share the sequence data here.)
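To check whether the duplication affects the whole file rather than a single read, something like the following could be used. It is only a rough sketch: the function name `fastq_dup_names` is made up for illustration, and it assumes strict 4-line FASTQ records (no wrapped sequence lines).

```shell
# Hypothetical helper (not part of bedtools or samtools).
# Prints "<count> <header>" for every read name that occurs more than once.
# Assumes each FASTQ record is exactly 4 lines, so headers are lines 1, 5, 9, ...
fastq_dup_names() {
    awk 'NR % 4 == 1 { seen[$1]++ }
         END { for (name in seen) if (seen[name] > 1) print seen[name], name }' "$1" |
        sort -k2,2
}

# Example usage with the file name from this report:
#   fastq_dup_names test.fastq | head
```

If this prints nothing, no read name is repeated; if every read is listed with a count of 2, the whole file has been written twice.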

mbeavitt-bh commented 8 months ago

Same problem here. Right now bedtools bamToFastq is not suitable for converting PacBio .bam files; you need to use pbtk bam2fastq instead.

Thomieh73 commented 6 months ago

I am using version v2.31.1 and I encountered this error too. I only noticed it when I tried to do an assembly and reads were found to be duplicated.

brianjohnhaas commented 6 months ago

I've resorted to using 'samtools fastq' instead.
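For anyone wanting to confirm that the workaround gives one FASTQ record per BAM record, a sanity check along these lines may help. This is a sketch under assumptions: `fastq_record_count` is a hypothetical helper name, the FASTQ is assumed to use strict 4-line records, and the input is assumed to be an unaligned BAM with no secondary or supplementary records (which `samtools fastq` would skip by default).

```shell
# Hypothetical helper (not part of samtools).
# Counts FASTQ records by dividing the line count by 4.
fastq_record_count() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Example usage with the file names from this thread:
#   samtools fastq input.bam > test.fastq
#   [ "$(samtools view -c input.bam)" -eq "$(fastq_record_count test.fastq)" ] \
#       && echo "record counts match"
```

A FASTQ count that is exactly double the BAM count would point to the duplication described above.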