alexdobin / STAR

RNA-seq aligner
MIT License
EXITING because of FATAL ERROR in reads input: short read sequence line: 0 #979

Closed aarbduarte closed 4 years ago

aarbduarte commented 4 years ago

Hi,

Thank you for all the effort, STAR is a fantastic tool.

I've been running STAR on 92 paired-end RNA-seq samples, and one particular sample always fails with the same error:

Jul 27 01:00:28 ..... started STAR run
Jul 27 01:00:28 ..... loading genome
Jul 27 01:00:49 ..... started 1st pass mapping

EXITING because of FATAL ERROR in reads input: short read sequence line: 0
Read Name=@SRR1070260.47317756.1
Read Sequence====
DEF_readNameLengthMax=50000
DEF_readSeqLengthMax=650

Jul 27 10:49:24 ...... FATAL ERROR, exiting

I've tried running it on both gzipped and unzipped files. The command I use is:

$STAR --runThreadN 60 --genomeDir /home/aduarte/STAR_refe_files --readFilesIn ./bams/SRR1070260_1.fastq ./bams/SRR1070260_2.fastq --twopassMode Basic --outSAMstrandField intronMotif --outSAMtype BAM Unsorted --outFileNamePrefix ./bams/SRR1070260.

(with --readFilesCommand zcat when dealing with .gz files)

The reads in question are, from the first mate file:

@SRR1070260.47317755.1 47317755 length=75
GCCTACTACTTCGAGAGGGACATCAAGGGCGAGTCTCTATTCCAGGGCCGCGGCGGCCTGGACTTGCGCGTGCGC
+
=<=A7<A7<<A<<+22773)<A=7ABA3)1?80=A49?BB?7/)8=:A;A#########################
@SRR1070260.47317756.1 47317756 length=75
ATTTTACCAATACCTAACCATGTTTACCAGAATGGCGGATTGCTTCTTCCAGCGTGGAGTGGCCAACTATCAGCT
+
###########################################################################
@SRR1070260.47317757.1 47317757 length=75
CTTGGCGGGAAGCTTCCCCATTAAACACAAAAGAAAAAAAAAAATAAAAAACTACAAACAAACCGGCATCACGCA
+
??+=;DDDD0C6;DD9?DD)9D?D?8BB=CCD;5@D)=AD@?A'+8AA###########################

and, from the second mate file:

@SRR1070260.47317755.2 47317755 length=75
TGCCCCGCCGGGGACCCACAAGCTCGGATCCTTTCTCAACTCCCCCAGTTCCTTGATCTCCACCTTCTTGTACTT
+
+1:<?A0?A70))00?8=A;(=BBA0''(57=777)===>@>>@@6'5;=??#######################
@SRR1070260.47317756.2 47317756 length=75
GGGTAAGAGGAGGAAGTGGATGAAGAGGTTGGAGGAGGAGGAACAGGCCTGGAATCCGTATGAATACAAAGGGCC
+
###########################################################################
@SRR1070260.47317757.2 47317757 length=75
CCGGGCTGTGCGTGATGCCGGGCTGTGCGTGATGCCGGGCTGGGGGTGATGCCGGGCTGTGCGTGATGCCGGGCT
+
@@@DDDDDHFFHDGEHBGIIHG8FFDGG<F@DHIIIEG<8BH6;/<,8<CCCA2-78B#################

Any idea?

Thank you for your time

alexdobin commented 4 years ago

Hi @aarbduarte

there is another similar problem, #975. The reads look fine, so it's puzzling. Did you get the fastq files from SRA, without any postprocessing? The only thing I can think of is that there is some sort of invisible character that creates an empty line right after @SRR1070260.47317756.1

Can you extract these 3+3 reads to files (rather than copy-pasting from the screen) and map them?

Cheers Alex
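
One way to look for such an invisible character is to print the suspect record with non-printing bytes made visible. The following is a minimal sketch, not something from STAR itself: `show_record` is a hypothetical helper name, and it assumes a plain 4-line-per-record FASTQ file and a `cat` that accepts `-vet`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one FASTQ record with non-printing bytes visible.
# Usage: show_record FILE READ_NAME
show_record() {
  local file=$1 name=$2
  # -F      : treat the read name as a fixed string, not a regex
  # -m 1    : stop after the first matching header
  # -A 3    : also print the 3 lines that complete a 4-line FASTQ record
  # cat -vet: show control characters (a carriage return appears as ^M),
  #           tabs as ^I, and mark every line end with $
  grep -F -m 1 -A 3 -- "$name" "$file" | cat -vet
}
```

For example, `show_record ./bams/SRR1070260_1.fastq '@SRR1070260.47317756.1'` should print four lines ending in `$`. A sequence line that shows only `$`, or `^M$`, would point to an empty or carriage-return-only line.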

aarbduarte commented 4 years ago

I used cutadapt to trim the last base of each read prior to alignment.

Using grep -A 7 -B 8 '@SRR1070260.47317756.1' SRR1070260_1.fastq > test1.fastq (and the same with .2, SRR1070260_2.fastq and test2.fastq for the second mate), I extracted the files below:

test2.txt test1.txt

alexdobin commented 4 years ago

Hi @aarbduarte

these files map without any problem. Do you see the error if you map them? If not, the next test I would recommend is mapping the original files, without trimming. This will test whether the trimming somehow corrupted the fastq files.

Cheers Alex
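
A quick structural check of the fastq files, before and after trimming, can also narrow this down. The sketch below is an assumption-laden illustration rather than part of STAR: `check_fastq` is a hypothetical helper, it assumes strict 4-line records, and it only reports the first problem it finds.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the first malformed record in a 4-line FASTQ file.
# Reads the file given as $1, or stdin when no argument is given.
# Exit status: 0 = no problem found, 1 = problem found.
check_fastq() {
  awk '
    NR % 4 == 1 {                       # header line
      name = $0
      if (substr($0, 1, 1) != "@") {
        printf "line %d: header does not start with @: %s\n", NR, $0
        bad = 1; exit
      }
    }
    NR % 4 == 2 {                       # sequence line
      seq = $0
      if (length(seq) == 0) {
        printf "line %d: empty sequence for %s\n", NR, name
        bad = 1; exit
      }
    }
    NR % 4 == 3 && substr($0, 1, 1) != "+" {   # separator line
      printf "line %d: separator does not start with + for %s\n", NR, name
      bad = 1; exit
    }
    NR % 4 == 0 && length($0) != length(seq) { # quality line
      printf "line %d: sequence/quality length mismatch for %s\n", NR, name
      bad = 1; exit
    }
    END {
      if (!bad && NR % 4 != 0) {
        printf "file ends mid-record after %d lines\n", NR
        bad = 1
      }
      exit bad
    }
  ' "${1:--}"
}
```

Usage would be along the lines of `check_fastq ./bams/SRR1070260_1.fastq` for unzipped input, or `zcat ./bams/SRR1070260_1.fastq.gz | check_fastq` for gzipped input. If both mates pass, the files are at least well-formed at the record level, which would suggest the problem lies elsewhere.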

aarbduarte commented 4 years ago

Hi @alexdobin , thank you for the quick reply.

Running the pre-cutadapt files produces the same error:

```
$STAR --runThreadN 60 \
  --genomeDir /home/aduarte/STAR_refe_files \
  --readFilesIn ./bams/SRR1070260_1.fastq.gz ./bams/SRR1070260_2.fastq.gz \
  --twopassMode Basic \
  --outSAMstrandField intronMotif \
  --readFilesCommand zcat \
  --outSAMtype BAM Unsorted \
  --outFileNamePrefix ./bams/SRR1070260.

Aug 04 15:14:28 ..... started STAR run
Aug 04 15:14:28 ..... loading genome
Aug 04 15:16:24 ..... started 1st pass mapping

EXITING because of FATAL ERROR in reads input: short read sequence line: 0
Read Name=@SRR1070260.47317756.1
Read Sequence====
DEF_readNameLengthMax=50000
DEF_readSeqLengthMax=650

Aug 04 15:19:53 ...... FATAL ERROR, exiting

gzip: stdout: Broken pipe
```

Thank you for your time

alexdobin commented 4 years ago

Hi @aarbduarte

weird, not sure what's going on. Which STAR version are you using? Please try the latest, 2.7.5b: both the static and dynamic executables, and a build compiled from source. I will download this sample to see if I can reproduce the problem.

Cheers Alex

alexdobin commented 4 years ago

Aah, I cannot get this sample, it's dbGapped. So you would need to try different things on your side. A few more suggestions:

  1. Run it with fewer threads, say 10 or 20. There is usually no gain in mapping speed above 20-30 threads.
  2. Unzip the fastq files.

aarbduarte commented 4 years ago

I reran it using the recommended settings; it still produces the same error.

alexdobin commented 4 years ago

Hi @aarbduarte

this is going to be a hard one to debug... :( A couple more things to try:

  1. Map the few reads in the test1.txt and test2.txt files that you sent me. When I ran them, there was no problem.
  2. Remove the offending read from the full fastq files, e.g. with
    grep -A3 -v ^@SRR1070260.47317756.1 fastq1
    grep -A3 -v ^@SRR1070260.47317756.2 fastq2

Cheers Alex
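
A caveat on the grep approach: as far as I can tell, with GNU grep the `-A3` context of the preceding selected lines brings the offending header back into the output, so `-v -A3` may not actually drop the record. An awk-based alternative is sketched below. `drop_read` is a hypothetical helper name, the output file names are placeholders, and the sketch assumes strict 4-line FASTQ records.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a 4-line FASTQ file to stdout, skipping the record
# whose read name (first whitespace-delimited field of the header) equals NAME.
# Usage: drop_read NAME FILE
drop_read() {
  local name=$1 file=$2
  awk -v name="$name" '
    # Only header lines (lines 1, 5, 9, ...) are compared, so a quality string
    # that happens to begin with "@" cannot trigger a match.
    NR % 4 == 1 { skip = ($1 == name) }
    # Print every line that does not belong to the skipped record.
    !skip
  ' "$file"
}
```

Applied to this sample, it could be run on both mates so that the pairs stay in sync:

```shell
drop_read '@SRR1070260.47317756.1' SRR1070260_1.fastq > SRR1070260_1.filtered.fastq
drop_read '@SRR1070260.47317756.2' SRR1070260_2.fastq > SRR1070260_2.filtered.fastq
```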

aarbduarte commented 4 years ago

I isolated the reads and they were mapped correctly. Weird issue but solved. Thanks!