alexdobin / STAR

RNA-seq aligner
MIT License

FATAL ERROR in reads input: quality string length is not equal to sequence length #1992

Open bugandong opened 8 months ago

bugandong commented 8 months ago

Hi, the STAR version I used is 2.7.11a and the command is:

STAR --genomeDir $REF_DIR \
    --readFilesIn $OUT_DIR/${READS}_1_trimmed.fq $OUT_DIR/${READS}_2_trimmed.fq \
    --outSAMtype BAM SortedByCoordinate \
    --outFileNamePrefix $OUT_DIR/STAR_output \
    --runThreadN 8

It gave me an error (screenshot). I then checked the FASTQ files: for the read ID named in the error, the sequence length and the quality string length are the same in both files (screenshot). I then deleted this read from both files and ran STAR again, and it gave me a similar error (screenshot). I checked the reported ID again and still found no problem: the sequence length and the quality string length are the same.

and this is the script

#!/usr/bin/bash

# xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
cd /home1/chenxiyang/Project_scChIP/liuyvxin/
READS="SRR21047109"  
ZNF318_DIR="/home1/chenxiyang/Project_scChIP/liuyvxin/ZNF318"
READS_DIR="$ZNF318_DIR/reads"
REF_DIR="/home1/chenxiyang/Project_scChIP/liuyvxin/ref"
OUT_DIR="$ZNF318_DIR/output"

mkdir -p "$ZNF318_DIR"
mkdir -p "$READS_DIR"
mkdir -p "$OUT_DIR"

prefetch "$READS" 
mv "/home1/chenxiyang/Project_scChIP/liuyvxin/$READS/$READS.sra" "$READS_DIR"
if [ ! -f "$READS_DIR/$READS.sra" ]; then
    echo "Sequencing data not found in the expected directory"
    exit 1
fi

fastq-dump "$READS_DIR/$READS.sra" --split-3 --outdir "$OUT_DIR" 
if [ $? -ne 0 ]; then
    echo "fastq-dump failed"
    exit 1
fi

fastqc "$OUT_DIR/${READS}_1.fastq" -o "$OUT_DIR"
fastqc "$OUT_DIR/${READS}_2.fastq" -o "$OUT_DIR"
# Filtering
trim_galore --quality 20 --Illumina --length 20 -o "$OUT_DIR" "$OUT_DIR/${READS}_1.fastq" "$OUT_DIR/${READS}_2.fastq"

if [ $? -ne 0 ]; then
    echo "trim_galore failed"
    exit 1
fi

STAR --runMode genomeGenerate \
    --genomeDir $REF_DIR \
    --genomeFastaFiles $REF_DIR/GRCh38.p14.genome.fa \
    --sjdbGTFfile $REF_DIR/gencode.v44.annotation.gtf \
    --sjdbOverhang 99 \
    --runThreadN 8 \
    --limitGenomeGenerateRAM 540000000000

if [ $? -ne 0 ]; then
    echo "STAR-index failed"
    exit 1
fi

cd "$OUT_DIR"

STAR --genomeDir $REF_DIR \
    --readFilesIn $OUT_DIR/${READS}_1_trimmed.fq $OUT_DIR/${READS}_2_trimmed.fq \
    --outSAMtype BAM SortedByCoordinate \
    --outFileNamePrefix $OUT_DIR/STAR_output \
    --runThreadN 8
featureCounts -T 8 \
        -a $REF_DIR/gencode.v44.annotation.gtf \
        -o $OUT_DIR/G-counts.txt \
        -g gene_id \
        -p \
        $OUT_DIR/STAR_outputAligned.sortedByCoord.out.bam
grep -E "ENSG00000171467" $OUT_DIR/G-counts.txt > $OUT_DIR/G-filtered_counts.txt
grep -E "ENSG00000171467.16" $OUT_DIR/G-counts.txt > $OUT_DIR/G-filtered_counts.16.txt

featureCounts -T 8 \
        -a $REF_DIR/gencode.v44.annotation.gtf \
        -o $OUT_DIR/T-counts.txt \
        -g transcript_id \
        -p \
        $OUT_DIR/STAR_outputAligned.sortedByCoord.out.bam
grep -E "ENST00000361428.3|ENST00000606599.1|ENST00000605935.5|ENST00000607252.5" $OUT_DIR/T-counts.txt > $OUT_DIR/T-filtered_counts.txt

featureCounts -T 8 \
        -a $REF_DIR/gencode.v44.annotation.gtf \
        -o $OUT_DIR/E-counts.txt \
        -g exon_id \
        -p \
        $OUT_DIR/STAR_outputAligned.sortedByCoord.out.bam
grep -E "ENSE00001612696.2|ENSE00001681665.1|ENSE00001137783.1|ENSE00001173021.1|ENSE00001173011.1|ENSE00001173003.1|ENSE00001172995.1|ENSE00001172987.1|ENSE00001172979.1|ENSE00001137777.4" $OUT_DIR/E-counts.txt > $OUT_DIR/E-filtered_counts.txt
echo "Pipeline completed successfully!"

Is this a software compatibility issue? I have hit the same error with STAR before. That time, after searching, I found that the read named in the error really was malformed, and after deleting it I could continue running STAR. This time I cannot see what is wrong with the read. How can I solve this error? Thank you!

alexdobin commented 7 months ago

Hi @bugandong

This is a problem with the FASTQ file formatting. You need to look at the read mentioned in the error message, in both files.
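
If inspecting reads one at a time keeps turning up new offenders, it may be quicker to scan each file once for structural problems. The following is a minimal sketch, not part of STAR: `check_fastq` is a hypothetical helper name, and it assumes plain, uncompressed FASTQ with exactly four lines per record (no wrapped sequence lines).

```shell
#!/usr/bin/env bash
# check_fastq FILE
# Hypothetical helper: report every record whose 4-line structure is broken.
# Assumes exactly four lines per record (header, sequence, "+", quality).
check_fastq() {
    awk '
        NR % 4 == 1 {
            name = $0
            if (substr($0, 1, 1) != "@")
                printf "line %d: header does not start with @: %s\n", NR, $0
        }
        NR % 4 == 2 { seq = $0 }
        NR % 4 == 3 {
            if (substr($0, 1, 1) != "+")
                printf "line %d: separator does not start with +: %s\n", NR, $0
        }
        NR % 4 == 0 {
            if (length(seq) != length($0))
                printf "line %d: %s: sequence length %d, quality length %d\n", NR, name, length(seq), length($0)
        }
        END {
            if (NR % 4 != 0)
                printf "file ends mid-record: %d lines is not a multiple of 4\n", NR
        }
    ' "$1"
}
```

For the files in this thread it could be called as `check_fastq "$OUT_DIR/${READS}_1_trimmed.fq"` and again for the `_2` file. No output means no structural problem was found under the stated assumptions. A missing or extra line earlier in the file shifts every later record, so the first reported line is usually the one to look at, rather than the read STAR happens to name.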

ylve commented 1 week ago

Hello! I am currently using STAR version star/2.7.11a and also get the sequence length error:

EXITING because of FATAL ERROR in reads input: quality string length is not equal to sequence length
@LH00179:46:22HWMNLT3:8:1107:51049:1208
+
AAAAAAAAAAAAAAAAACCCCCCC
SOLUTION: fix your fastq file

I'm quite new to this analysis. I tried using the untrimmed files instead, and the run with the .zip files is going now, but I don't have high hopes that it will work. I am using the following command (all in one line):

STAR --runThreadN 16 --genomeDir /pathway/index --readFilesCommand unzip -p --readFilesIn /pathway/R1_001_fastqc.zip /pathway/R2_001_fastqc.zip --outSAMtype BAM SortedByCoordinate --outFileNamePrefix aligned2/GiWB-wt --outSAMunmapped Within --outSAMattributes Standard --outFilterMultimapNmax 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.05
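
Before a long run, it can help to confirm that whatever `--readFilesCommand` streams to STAR actually looks like FASTQ. This is an assumption about the files above, not something confirmed in the thread: names ending in `_fastqc.zip` are what FastQC gives its report archives, and if that is what these are, `unzip -p` would stream the report contents rather than reads. The sketch below uses a hypothetical helper name, `looks_like_fastq`, and only inspects the first four lines of its input.

```shell
#!/usr/bin/env bash
# looks_like_fastq: read a stream on stdin and succeed only if the first
# record has the FASTQ shape (line 1 starts with "@", line 3 with "+").
# Hypothetical helper; it is a quick sanity check, not a full validator.
looks_like_fastq() {
    head -n 4 | awk '
        NR == 1 && substr($0, 1, 1) != "@" { bad = 1 }
        NR == 3 && substr($0, 1, 1) != "+" { bad = 1 }
        END { exit (bad || NR < 4) }
    '
}
```

For example, `unzip -p /pathway/R1_001_fastqc.zip | looks_like_fastq && echo ok` tests the same stream STAR would receive. For gzip-compressed reads (`.fastq.gz`), the STAR manual's suggested setting is `--readFilesCommand zcat`, and the equivalent check would be `zcat reads.fastq.gz | looks_like_fastq`, where `reads.fastq.gz` is a placeholder name.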

I also tried deleting everything from the directory, and this is the only script I am running. Also, I don't get any output from the grep command for the reverse reads (?)


Thank you so much in advance!
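
For looking up the read named in the error in both mate files, a plain `grep` on the ID treats `.` as a wildcard and shows only the header line. A minimal sketch that prints the whole four-line record instead is below; `show_read` is a hypothetical helper name, and it assumes uncompressed four-line FASTQ in which the ID is the first whitespace-delimited token of the header.

```shell
#!/usr/bin/env bash
# show_read ID FILE
# Hypothetical helper: print the full 4-line record whose header's first
# token is exactly "@ID". The ID is compared as a literal string.
show_read() {
    local id="$1" file="$2"
    awk -v id="@$id" '
        NR % 4 == 1 && $1 == id { hit = 4 }   # header line of the wanted read
        hit > 0 { print; hit-- }              # print it and the next 3 lines
    ' "$file"
}
```

Running it once per mate file, for example `show_read "LH00179:46:22HWMNLT3:8:1107:51049:1208" R1.fastq` and the same for `R2.fastq` (placeholder file names), makes it easy to compare the two records side by side. Empty output for one file means no header on a record boundary has that ID there, which may point to a missing read or to records that have shifted off the four-line grid.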