alexdobin / STAR

RNA-seq aligner
MIT License

[2.7.10b] Read length in FATAL ERROR message does not match actual read length #2200

Status: Open. sbresnahan opened this issue 3 months ago.

sbresnahan commented 3 months ago

STAR 2.7.10b (compiled from source) exits with the following message:

EXITING because of FATAL ERROR in reads input: Lread>=18446744073709551615   while DEF_readSeqLengthMax=650
Read Name=@UNC11-SN627:348:C3KRYACXX:7:2211:11331:18281
SOLUTION: increase DEF_readSeqLengthMax in IncludeDefine.h and re-compile STAR
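The Lread value in this message looks suspicious in its own right. It is exactly the largest value a 64-bit unsigned integer can hold, which is what an unsigned length of zero wraps around to when 1 is subtracted from it. That suggests, though it does not prove (this is inferred from the number alone, not from STAR's source), that a length of 0 was computed somewhere and then decremented, rather than a 650+ character sequence actually being read. If that is what happened, increasing DEF_readSeqLengthMax as the message suggests would likely not help.

$$
2^{64} - 1 = 18\,446\,744\,073\,709\,551\,615,
\qquad
(0 - 1) \bmod 2^{64} = 2^{64} - 1.
$$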

However, the read in question is only 48 bp long:

@UNC11-SN627:348:C3KRYACXX:7:2211:11331:18281
GCGTCAGGGACTTCACCATCCCCACCACAGAGAAGCTGGCCTTGGTCC
+
@@@DDDFFHHHHHGGJIIJJJIJJJJJJJJJJJICGHJIJIJJIIBHH

I'm not sure how to proceed with debugging this.

sbresnahan commented 3 months ago

Note: the corresponding R2 read is the same length.
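Since this is paired-end data, one thing that may be worth ruling out is that the R1 and R2 files have drifted out of step somewhere (different record counts, or names in a different order). Below is a minimal sketch of such a check. `fastq_names` and `check_pairs` are hypothetical helpers, not part of STAR, and the sketch assumes gzipped FASTQ with 4-line records whose mate names match once anything after the first whitespace and a trailing `/1` or `/2` are removed.

```shell
# fastq_names / check_pairs: hypothetical helpers, not part of STAR.
# Assumes gzipped FASTQ with 4-line records, and that mate names match once
# anything after the first whitespace and a trailing /1 or /2 are removed.

# Print one read name per record: the header's first word, minus '@' and /1 or /2.
fastq_names() {
    gunzip -c "$1" | LC_ALL=C awk 'NR % 4 == 1 {
        name = $1
        sub(/^@/, "", name)
        sub(/\/[12]$/, "", name)
        print name
    }'
}

# Compare the two name lists record by record and report the first mismatch,
# or the first record where one file has run out. Exits 0 if they agree.
check_pairs() {
    t1=$(mktemp) || return 2
    t2=$(mktemp) || { rm -f "$t1"; return 2; }
    fastq_names "$1" > "$t1"
    fastq_names "$2" > "$t2"
    rc=0
    LC_ALL=C awk -v f2="$t2" '
        {
            # Read the name at the same position in file 2.
            if ((getline other < f2) <= 0) {
                printf "record %d: %s has no mate in file 2\n", NR, $0; bad = 1; exit
            }
            if ($0 != other) {
                printf "record %d: %s (file 1) vs %s (file 2)\n", NR, $0, other; bad = 1; exit
            }
        }
        END {
            # File 1 is exhausted; anything left in file 2 is unpaired.
            if (!bad && (getline other < f2) > 0) {
                printf "file 2 has extra records from record %d: %s\n", NR + 1, other; bad = 1
            }
            exit bad
        }
    ' "$t1" || rc=$?
    rm -f "$t1" "$t2"
    return "$rc"
}
```

Usage, with `R1.fastq.gz` and `R2.fastq.gz` standing in for the actual file names: `check_pairs R1.fastq.gz R2.fastq.gz`. The comparison stops at the first disagreement and keeps only one name per file in memory, at the cost of writing both name lists to temporary files.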

sbresnahan commented 3 months ago

Here are hex dumps of the read in question together with the reads immediately before and after it:

## This read + following 
gunzip -c 52a9ef0c-dd29-4449-9a85-00f13d56dbab.rna_seq.transcriptome.gdc_realn.bam_R1_001.fastq.gz | grep -A 7 "@UNC11-SN627:348:C3KRYACXX:7:2211:11331:18281" -

@UNC11-SN627:348:C3KRYACXX:7:2211:11331:18281
GCGTCAGGGACTTCACCATCCCCACCACAGAGAAGCTGGCCTTGGTCC
+
@@@DDDFFHHHHHGGJIIJJJIJJJJJJJJJJJICGHJIJIJJIIBHH
@UNC11-SN627:348:C3KRYACXX:7:1213:11798:37426
CGCGTCAGGGACTTCACCATCCCCACCACAGAGAAGCTGGCCTTGGTC
+
B@CFDFFFHHHHHJJJJJIJIJJJGIJJJIIJIIJIIJIJJJJJJJHJ

gunzip -c 52a9ef0c-dd29-4449-9a85-00f13d56dbab.rna_seq.transcriptome.gdc_realn.bam_R1_001.fastq.gz | grep -A 7 "@UNC11-SN627:348:C3KRYACXX:7:2211:11331:18281" - | hexdump -c

0000000   @   U   N   C   1   1   -   S   N   6   2   7   :   3   4   8
0000010   :   C   3   K   R   Y   A   C   X   X   :   7   :   2   2   1
0000020   1   :   1   1   3   3   1   :   1   8   2   8   1  \n   G   C
0000030   G   T   C   A   G   G   G   A   C   T   T   C   A   C   C   A
0000040   T   C   C   C   C   A   C   C   A   C   A   G   A   G   A   A
0000050   G   C   T   G   G   C   C   T   T   G   G   T   C   C  \n   +
0000060  \n   @   @   @   D   D   D   F   F   H   H   H   H   H   G   G
0000070   J   I   I   J   J   J   I   J   J   J   J   J   J   J   J   J
0000080   J   J   I   C   G   H   J   I   J   I   J   J   I   I   B   H
0000090   H  \n   @   U   N   C   1   1   -   S   N   6   2   7   :   3
00000a0   4   8   :   C   3   K   R   Y   A   C   X   X   :   7   :   1
00000b0   2   1   3   :   1   1   7   9   8   :   3   7   4   2   6  \n
00000c0   C   G   C   G   T   C   A   G   G   G   A   C   T   T   C   A
00000d0   C   C   A   T   C   C   C   C   A   C   C   A   C   A   G   A
00000e0   G   A   A   G   C   T   G   G   C   C   T   T   G   G   T   C
00000f0  \n   +  \n   B   @   C   F   D   F   F   F   H   H   H   H   H
0000100   J   J   J   J   J   I   J   I   J   J   J   G   I   J   J   J
0000110   I   I   J   I   I   J   I   I   J   I   J   J   J   J   J   J
0000120   J   H   J  \n                                                
0000124

## Previous read
gunzip -c 52a9ef0c-dd29-4449-9a85-00f13d56dbab.rna_seq.transcriptome.gdc_realn.bam_R1_001.fastq.gz | grep -B 4 "@UNC11-SN627:348:C3KRYACXX:7:2211:11331:18281" - 

@UNC11-SN627:348:C3KRYACXX:7:2112:3200:10717
GGACTTCACCATCCCCACCACAGAGAAGCTGGCCTTGGTCCACCAGCG
+
??<ADD?:D<CDDDA1CE)CEDEABEI3?@B?;@DC9D?DDEECDDDA

gunzip -c 52a9ef0c-dd29-4449-9a85-00f13d56dbab.rna_seq.transcriptome.gdc_realn.bam_R1_001.fastq.gz | grep -B 4 "@UNC11-SN627:348:C3KRYACXX:7:2211:11331:18281" - | hexdump -c

0000000   @   U   N   C   1   1   -   S   N   6   2   7   :   3   4   8
0000010   :   C   3   K   R   Y   A   C   X   X   :   7   :   2   1   1
0000020   2   :   3   2   0   0   :   1   0   7   1   7  \n   G   G   A
0000030   C   T   T   C   A   C   C   A   T   C   C   C   C   A   C   C
0000040   A   C   A   G   A   G   A   A   G   C   T   G   G   C   C   T
0000050   T   G   G   T   C   C   A   C   C   A   G   C   G  \n   +  \n
0000060   ?   ?   <   A   D   D   ?   :   D   <   C   D   D   D   A   1
0000070   C   E   )   C   E   D   E   A   B   E   I   3   ?   @   B   ?
0000080   ;   @   D   C   9   D   ?   D   D   E   E   C   D   D   D   A
0000090  \n   @   U   N   C   1   1   -   S   N   6   2   7   :   3   4
00000a0   8   :   C   3   K   R   Y   A   C   X   X   :   7   :   2   2
00000b0   1   1   :   1   1   3   3   1   :   1   8   2   8   1  \n    
00000bf

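They look properly formatted to me: each complete record has four lines, every sequence line and its quality line are both 48 characters long, and every line ends in a plain \n with no carriage returns or other unexpected bytes.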
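The hex dumps only cover three records. To apply the same checks to every record in a file, a short awk script can test the four-line structure, compare each sequence length with its quality length, and flag carriage returns. This is a minimal sketch assuming uncompressed FASTQ on stdin with no line wrapping; `validate_fastq` is a hypothetical name, not a STAR option.

```shell
# validate_fastq: hypothetical helper, not part of STAR.
# Reads uncompressed FASTQ on stdin (assumes 4-line records, no wrapping),
# prints one message per problem found, and exits 1 if any were found.
validate_fastq() {
    LC_ALL=C awk '
        {
            rec = int((NR - 1) / 4) + 1   # 1-based record number
            pos = (NR - 1) % 4            # 0 header, 1 sequence, 2 separator, 3 quality
        }
        /\r/ {
            printf "record %d: carriage return on line %d\n", rec, NR; bad = 1
        }
        pos == 0 && substr($0, 1, 1) != "@" {
            printf "record %d: header does not start with @\n", rec; bad = 1
        }
        pos == 1 { seqlen = length($0) }
        pos == 2 && substr($0, 1, 1) != "+" {
            printf "record %d: separator does not start with +\n", rec; bad = 1
        }
        pos == 3 && length($0) != seqlen {
            printf "record %d: sequence length %d, quality length %d\n", rec, seqlen, length($0); bad = 1
        }
        END {
            # A line count that is not a multiple of 4 means the input ends mid-record.
            if (NR % 4 != 0) { printf "input ends inside record %d\n", int(NR / 4) + 1; bad = 1 }
            exit bad
        }
    '
}
```

For example, `gunzip -c 52a9ef0c-dd29-4449-9a85-00f13d56dbab.rna_seq.transcriptome.gdc_realn.bam_R1_001.fastq.gz | validate_fastq`, and likewise for the R2 file. No output and exit status 0 means every record passed these particular checks; it does not validate the sequence alphabet or quality encoding.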