BenLangmead / bowtie

An ultrafast memory-efficient short read aligner

Re-open issue: invalid fastq files produced using --un #8 #88

Open elenichri opened 5 years ago

elenichri commented 5 years ago

Hello, I am re-opening this issue. I am mapping paired-end reads with bowtie2 using the --un option, so I get two output FASTQ files, one for each mate. I then use the STAR aligner to map these FASTQ files to the human genome. STAR starts running, but I get the error: ReadAlignChunk_processChunks.cpp:115:processChunks EXITING because of FATAL ERROR in input reads: unknown file format: the read ID should start with @ or >

I ran the fastQValidator program (https://genome.sph.umich.edu/wiki/FastQValidator) to check whether the FASTQ files that bowtie2 returns are valid:

./fastQValidator --file xxx.trimmed.2.fastq

Here is the output:

ERROR on Line 10414: Invalid character ('J') in base sequence.
ERROR on Line 10414: Invalid character ('J') in base sequence.
ERROR on Line 10414: Invalid character ('J') in base sequence.
ERROR on Line 10414: Invalid character ('J') in base sequence.
ERROR on Line 10414: Invalid character ('J') in base sequence.
ERROR on Line 10414: Invalid character ('J') in base sequence.
ERROR on Line 10414: Invalid character ('J') in base sequence.
ERROR on Line 10414: Invalid character ('J') in base sequence.
ERROR on Line 10414: Invalid character ('J') in base sequence.
ERROR on Line 10414: Invalid character ('J') in base sequence.
ERROR on Line 10414: Invalid character ('J') in base sequence.
ERROR on Line 10414: Invalid character ('J') in base sequence.
ERROR on Line 10414: Invalid character ('J') in base sequence.
ERROR on Line 10414: Invalid character ('J') in base sequence.
ERROR on Line 10414: Invalid character ('J') in base sequence.
ERROR on Line 10414: Invalid character ('J') in base sequence.
ERROR on Line 10414: Invalid character ('J') in base sequence.
ERROR on Line 10414: Invalid character ('J') in base sequence.
ERROR on Line 10414: Invalid character ('J') in base sequence.
ERROR on Line 10414: Invalid character ('J') in base sequence.
Finished processing xxx.trimmed.2.fastq with 90418286 lines containing 22604486 sequences.
There were a total of 12073 errors.
Returning: 1 : FASTQ_INVALID
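One way to see what the file actually looks like around the reported line is to walk it in strict 4-line records and print the first records that break the layout. 'J' is a common Phred+33 quality character, and the reported line count (90418286) is not a multiple of 4, so one possibility is that a stray or missing line has shifted the record framing; that is a guess, not a diagnosis. The sketch below is not part of bowtie2 or fastQValidator. The function name `check_fastq` and the allowed base alphabet are assumptions made for illustration.

```python
import itertools

# Assumption: plain nucleotide alphabet; adjust if your data uses other symbols.
VALID_BASES = set("ACGTNacgtn.")


def check_fastq(lines, max_problems=20):
    """Return (line_number, message) pairs for records that break the
    4-line FASTQ layout. Hypothetical helper, for illustration only.

    The check assumes strict 4-line framing and does not try to resync,
    so everything after a frame shift will also be reported.
    """
    problems = []
    it = iter(lines)
    line_no = 0
    # Stop reading once roughly max_problems problems have been collected.
    while len(problems) < max_problems:
        record = [l.rstrip("\r\n") for l in itertools.islice(it, 4)]
        if not record:
            break  # clean end of input
        first = line_no + 1
        line_no += len(record)
        if len(record) < 4:
            problems.append(
                (first, "truncated record (%d of 4 lines)" % len(record)))
            break
        header, seq, sep, qual = record
        if not header.startswith("@"):
            problems.append((first, "header does not start with '@'"))
        bad = set(seq) - VALID_BASES
        if bad:
            problems.append(
                (first + 1,
                 "invalid base characters: " + "".join(sorted(bad))))
        if not sep.startswith("+"):
            problems.append((first + 2, "separator does not start with '+'"))
        if len(seq) != len(qual):
            problems.append((first + 3, "sequence/quality length mismatch"))
    return problems
```

To use it, open the file (for example `xxx.trimmed.2.fastq`), pass the file handle to `check_fastq`, and print the returned pairs. If the first reported line is close to 10414, looking at the few raw lines just before it should show whether a record gained or lost a line there.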

So it seems that bowtie2 generates invalid FASTQ files in my case. Do you have any idea how I can fix this problem? My inputs (var2 and var3) are trimmed FASTQ files, and I would prefer not to fall back to the untrimmed files. I use 8 cores to run bowtie2 on 12 samples. My run command is:

bowtie2 --dovetail --no-discordant -I 20 -p 8 -x _my reference sequence_ --un-conc "$var1" -1 "$var2" -2 "$var3" -S "$var4"

where var1 to var4 are taken from a parameters file.

Thank you very much in advance! Eleni

mschilli87 commented 5 years ago

original issue: https://github.com/BenLangmead/bowtie/issues/8

ch4rr0 commented 5 years ago

How often does this happen? Every run, or sporadically? I am asking because I am trying to figure out whether this is a multithreading-related issue or whether the wrapper script is just not processing "trimmed" input correctly.

elenichri commented 5 years ago

Dear ch4rr0, thank you for your reply. It happens for all the FASTQ files of one dataset with 12 samples. All 12 FASTQ files are invalid. I run my bowtie2 command multithreaded (12 threads), but I don't think that is the issue: the exact same command, also with threads, works perfectly fine for another dataset. I am certain that the 'trimmed.fastq' input files are correct, because I have also mapped them with STAR and had no problem at all.

ch4rr0 commented 5 years ago

I am looking into this one. I will update the thread if and when I am able to recreate the issue.