Error: reads file does not look like a FASTQ file

amitjavilaventura commented 2 years ago

Dear Dr. Langmead,

I am trying to use Bowtie in a pipeline for small RNA-seq. I have been using it for months, but now the same command fails with an error saying that the "reads file does not look like a FASTQ file":

Time loading forward index: 00:00:09
Time loading mirror index: 00:00:09
Error: reads file does not look like a FASTQ file
terminate called after throwing an instance of 'int'

The command is this one:

bowtie -v 1  -M 1  --seed 666 --best --strata  --quiet  --threads 8 --chunkmbs 1024 --time --sam <refGenomeIndexPrefix> <fastq.gz>

The only difference between now and before is that I am now running this on a cluster using a Singularity image, whereas before I was running bowtie locally using conda. The previous steps in the pipeline are adapter trimming with cutadapt and quality filtering with fastq_quality_filter.

I looked at the FASTQ.gz files and they look normal:

@7001450:617:CD9F5ANXX:4:2309:3398:1995 1:N:0:TGACCA
TCTCAGNTTGTCATTTGGAGACTCCCCA
+
BBBCCE#>?FGGGGGGGGGGGGGGGGGG
@7001450:617:CD9F5ANXX:4:2309:3690:1999 1:N:0:TGACCA
TGAACGGAGAATAGAGTACATTGAAGCGA
+
CBBBBGGGGGGGGGGGGGEEGGGGGGGGC

I used three different approaches to validate them:

- Comparing the sequence string and the quality string of every read and counting the reads where the two differ in length (0 reads had a sequence and quality string of different lengths).

- Using `fastq_info` from `fastq_utils`:


fastq_utils 0.25.1
DEFAULT_HASHSIZE=39000001
Scanning and indexing all reads from results/01_fastq/caroli1.filt.fastq.gz
CASAVA=1.8
43600000
Scanning complete.

Reads processed: 43600732
Memory used in indexing: ~3346 MB

Number of reads: 43600732
Quality encoding range: 35 71
Quality encoding: 33
Read length: 19 36 30
OK


- Using `validatefastq` from [`biopet`](https://github.com/biopet/validatefastq):

INFO [2022-02-07 17:18:52,605] [ValidateFastq$] - Start
INFO [2022-02-07 17:18:52,969] [ValidateFastq$$anonfun$main$1] - 100000 reads processed
INFO [2022-02-07 17:18:53,156] [ValidateFastq$$anonfun$main$1] - 200000 reads processed
...
...
INFO [2022-02-07 17:20:16,953] [ValidateFastq$$anonfun$main$1] - 43600000 reads processed
INFO [2022-02-07 17:20:16,955] [ValidateFastq$] - Possible quality encodings found: Sanger, Illumina 1.8+
INFO [2022-02-07 17:20:16,955] [ValidateFastq$] - Done processing 43600732 fastq records, no errors found
INFO [2022-02-07 17:20:16,956] [ValidateFastq$] - Done
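
For reference, the first check above (comparing sequence and quality string lengths) can be sketched roughly as follows. This is only an illustration, not the exact command that was run: `count_len_mismatches` is a hypothetical helper name, and it assumes strict 4-line FASTQ records with no wrapped sequence lines.

```shell
# Hypothetical helper: count FASTQ records whose sequence and quality
# strings differ in length. Assumes gzip-compressed input and exactly
# four lines per record (header, sequence, "+", quality).
count_len_mismatches() {
    gzip -dc "$1" | awk '
        NR % 4 == 2 { seqlen = length($0) }               # sequence line
        NR % 4 == 0 && length($0) != seqlen { bad++ }     # quality line
        END { print bad + 0 }                             # print 0 if none
    '
}
```

Calling `count_len_mismatches results/01_fastq/caroli1.filt.fastq.gz` prints the number of mismatching records; a result of 0 would be consistent with what is reported here.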



None of the approaches flagged the FASTQ file as invalid.

Why might this be happening?

Thank you.

Best regards,
Adrià.

ch4rr0 commented 2 years ago

Hello,

What version of bowtie are you using on the server? It's possible that that version does not support compressed (GZIP) input.

amitjavilaventura commented 2 years ago

Hi,

Thanks for the response. I have just noticed that the versions are different.

The version used in the Singularity image on the cluster is:

/opt/miniconda3/bin/bowtie version 1.0.0
64-bit
Built on 4d87110594ec
Wed Mar 23 19:06:59 UTC 2016
Compiler: gcc version 4.8.2 20140120 (Red Hat 4.8.2-15) (GCC) 
Options: -O3 -m64  -Wl,--hash-style=both  
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}

And the version I have locally in the conda environment is:

bowtie-align version 1.2
64-bit
Built on testing-gce-ab28e1d1-a823-4ae9-9c55-f53e1e445058
Sat May  6 18:08:00 UTC 2017
Compiler: gcc version 4.8.5 (GCC) 
Options: -O3 -m64 -I/home/amitjavila/anaconda3/envs/smallRNA/include -L/home/amitjavila/anaconda3/envs/smallRNA/lib -Wl,--hash-style=both -DWITH_TBB -DPOPCNT_CAPABILITY -DNO_SPINLOCK -DWITH_QUEUELOCK=1  
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}

So I understand that version 1.0.0 does not support gzip-compressed (.gz) input.

Thanks.

Adrià.
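
If upgrading bowtie inside the image is not an option, one possible workaround is to decompress the reads on the fly and pass them to bowtie on standard input. The sketch below is an assumption-laden illustration rather than a tested fix: `fastq_stream` is a hypothetical helper name, and the commented bowtie call relies on the manual's statement that `-` means "read from standard input", which has not been verified here against version 1.0.0.

```shell
# Hypothetical helper: write plain-text FASTQ to stdout whether or not the
# input file is gzip-compressed. gzip files start with the bytes 0x1f 0x8b.
fastq_stream() {
    if [ "$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]; then
        gzip -dc "$1"    # compressed: decompress to stdout
    else
        cat "$1"         # already plain text: pass through unchanged
    fi
}

# Sketch of use (assumes bowtie accepts "-" for reads on standard input):
# fastq_stream <fastq.gz> | bowtie -v 1 -M 1 --best --strata --sam <refGenomeIndexPrefix> - > out.sam
```

The magic-byte check also makes it easy to confirm that a file named `.fastq.gz` is really compressed before blaming the aligner.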

BenLangmead / bowtie

Error: reads file does not look like a FASTQ file #129
