alexdobin / STAR

RNA-seq aligner
MIT License

STAR_2.7.2a --> ReadAlignChunk_processChunks.cpp:171:processChunks EXITING because of FATAL ERROR in input reads: unknown file format: the read ID should start with @ or > #1043

Closed tinu-t closed 4 years ago

tinu-t commented 4 years ago

I am trying to run STAR 2.7.2a to produce the Chimeric.out.junction file, which is used as input by the STAR-Fusion program. I am using multi-lane FASTQ files: sampleA_L001_R1.fastq.gz and sampleA_L002_R1.fastq.gz for read 1, and sampleA_L001_R2.fastq.gz and sampleA_L002_R2.fastq.gz for read 2.

Command executed:

  STAR --genomeDir STAR_2.7.2a_genome_index \
      --readFilesIn sampleA_L001_R1.fastq.gz,sampleA_L002_R1.fastq.gz sampleA_L001_R2.fastq.gz,sampleA_L002_R2.fastq.gz
      --runThreadN 8 \
      --outReadsUnmapped None \
      --twopassMode Basic \
      --readFilesCommand zcat \
      --outSAMstrandField intronMotif \
      --outSAMunmapped Within \
      --chimSegmentMin 12 \  # ** essential to invoke chimeric read detection & reporting **
      --chimJunctionOverhangMin 8 \
      --chimOutJunctionFormat 1 \   # **essential** includes required metadata in Chimeric.junction.out file.
      --alignSJDBoverhangMin 10 \
      --alignMatesGapMax 100000 \   # avoid readthru fusions within 100k
      --alignIntronMax 100000 \
      --alignSJstitchMismatchNmax 5 -1 5 5 \   # settings improved certain chimera detections
      --outSAMattrRGline ID:GRPundef \
      --chimMultimapScoreRange 3 \
      --chimScoreJunctionNonGTAG -4 \
      --chimMultimapNmax 20 \
      --chimNonchimScoreDropMin 10 \
      --peOverlapNbasesMin 12 \
      --peOverlapMMp 0.1 \
      --alignInsertionFlush Right \
      --alignSplicedMateMapLminOverLmate 0 \
      --alignSplicedMateMapLmin 30

Command exit status:
  104

Command output:
  Sep 28 11:35:53 ..... started STAR run
  Sep 28 11:35:53 ..... loading genome
  Sep 28 11:36:32 ..... started mapping

Command error:

  ReadAlignChunk_processChunks.cpp:171:processChunks EXITING because of FATAL ERROR in input reads: unknown file format: the read ID should start with @ or > 

  Sep 28 11:36:33 ...... FATAL ERROR, exiting

alexdobin commented 4 years ago

Hi Thomas

Please post the Log.out file.

Cheers Alex

tinu-t commented 4 years ago

Here is the output from Log.out. It looks like STAR is only taking the FASTQ files from the first lane (L001) and is not considering lane L002.

##### Command Line:
STAR --genomeDir STAR_2.7.2a_genome_index --readFilesIn sampleA_L001_R1.fastq.gz sampleA_L001_R2.fastq.gz --readMapNumber 1
##### Initial USER parameters from Command Line:
###### All USER parameters from Command Line:
genomeDir                     STAR_2.7.2a_genome_index     ~RE-DEFINED
readFilesIn                   sampleA_L001_R1.fastq.gz   sampleA_L001_R2.fastq.gz        ~RE-DEFINED
readMapNumber                 1     ~RE-DEFINED
##### Finished reading parameters from all sources

##### Final user re-defined parameters-----------------:
genomeDir                         STAR_2.7.2a_genome_index
readFilesIn                       sampleA_L001_R1.fastq.gz   sampleA_L001_R2.fastq.gz   
readMapNumber                     1

-------------------------------
##### Final effective command line:
STAR   --genomeDir STAR_2.7.2a_genome_index   --readFilesIn sampleA_L001_R1.fastq.gz   sampleA_L001_R2.fastq.gz      --readMapNumber 1
----------------------------------------

Finished loading and checking parameters
Reading genome generation parameters:
### STAR   --runMode genomeGenerate   --runThreadN 3   --genomeDir /cluster/projects/tmp_STAR/STAR_2.7.2a_genome_index   --genomeFastaFiles /cluster/projects/GTEx/genome/hg19_hs37d5/genome.fa      --sjdbGTFfile /cluster/projects/GTEx/annotation/gencode.v19.annotation.hs37d5_chr.coding.spladder.gtf
### GstrandBit=32
versionGenome                 2.7.1a     ~RE-DEFINED
genomeFastaFiles              /cluster/projects/GTEx/genome/hg19_hs37d5/genome.fa        ~RE-DEFINED
genomeSAindexNbases           14     ~RE-DEFINED
genomeChrBinNbits             18     ~RE-DEFINED
genomeSAsparseD               1     ~RE-DEFINED
sjdbOverhang                  100     ~RE-DEFINED
sjdbFileChrStartEnd           -        ~RE-DEFINED
sjdbGTFfile                   /cluster/projects/GTEx/annotation/gencode.v19.annotation.hs37d5_chr.coding.spladder.gtf     ~RE-DEFINED
sjdbGTFchrPrefix              -     ~RE-DEFINED
sjdbGTFfeatureExon            exon     ~RE-DEFINED
sjdbGTFtagExonParentTranscripttranscript_id     ~RE-DEFINED
sjdbGTFtagExonParentGene      gene_id     ~RE-DEFINED
sjdbInsertSave                Basic     ~RE-DEFINED
genomeFileSizes               3209761209   24407925765        ~RE-DEFINED
Genome version is compatible with current STAR
Number of real (reference) chromosomes= 86
1       1       249250621       0
2       2       243199373       249298944
3       3       198022430       492568576
4       4       191154276       690749440
5       5       180915260       882114560
6       6       171115067       1063256064
7       7       159138663       1234436096
8       8       146364022       1393819648
9       9       141213431       1540358144
10      10      135534747       1681653760
11      11      135006516       1817444352
12      12      133851895       1952710656
13      13      115169878       2086666240
14      14      107349540       2202009600
15      15      102531392       2309488640
16      16      90354753        2412249088
17      17      81195210        2502688768
18      18      78077248        2583953408
19      19      59128983        2662072320
20      20      63025520        2721316864
21      21      48129895        2784493568
22      22      51304566        2832728064
23      X       155270560       2884108288
24      Y       59373566        3039559680
25      MT      16569   3099066368
26      GL000207.1      4262    3099328512
27      GL000226.1      15008   3099590656
28      GL000229.1      19913   3099852800
29      GL000231.1      27386   3100114944
30      GL000210.1      27682   3100377088
31      GL000239.1      33824   3100639232
32      GL000235.1      34474   3100901376
33      GL000201.1      36148   3101163520
34      GL000247.1      36422   3101425664
35      GL000245.1      36651   3101687808
36      GL000197.1      37175   3101949952
37      GL000203.1      37498   3102212096
38      GL000246.1      38154   3102474240
39      GL000249.1      38502   3102736384
40      GL000196.1      38914   3102998528
41      GL000248.1      39786   3103260672
42      GL000244.1      39929   3103522816
43      GL000238.1      39939   3103784960
44      GL000202.1      40103   3104047104
45      GL000234.1      40531   3104309248
46      GL000232.1      40652   3104571392
47      GL000206.1      41001   3104833536
48      GL000240.1      41933   3105095680
49      GL000236.1      41934   3105357824
50      GL000241.1      42152   3105619968
51      GL000243.1      43341   3105882112
52      GL000242.1      43523   3106144256
53      GL000230.1      43691   3106406400
54      GL000237.1      45867   3106668544
55      GL000233.1      45941   3106930688
56      GL000204.1      81310   3107192832
57      GL000198.1      90085   3107454976
58      GL000208.1      92689   3107717120
59      GL000191.1      106433  3107979264
60      GL000227.1      128374  3108241408
61      GL000228.1      129120  3108503552
62      GL000214.1      137718  3108765696
63      GL000221.1      155397  3109027840
64      GL000209.1      159169  3109289984
65      GL000218.1      161147  3109552128
66      GL000220.1      161802  3109814272
67      GL000213.1      164239  3110076416
68      GL000211.1      166566  3110338560
69      GL000199.1      169874  3110600704
70      GL000217.1      172149  3110862848
71      GL000216.1      172294  3111124992
72      GL000215.1      172545  3111387136
73      GL000205.1      174588  3111649280
74      GL000219.1      179198  3111911424
75      GL000224.1      179693  3112173568
76      GL000223.1      180455  3112435712
77      GL000195.1      182896  3112697856
78      GL000212.1      186858  3112960000
79      GL000222.1      186861  3113222144
80      GL000200.1      187035  3113484288
81      GL000193.1      189789  3113746432
82      GL000194.1      191469  3114008576
83      GL000225.1      211173  3114270720
84      GL000192.1      547496  3114532864
86      hs37d5  35477943        3115581440
--sjdbOverhang = 100 taken from the generated genome
Started loading the genome: Mon Sep 28 11:14:18 2020

Genome: size given as a parameter = 3209761209
SA: size given as a parameter = 24407925765
SAindex: size given as a parameter = 1
Read from SAindex: pGe.gSAindexNbases=14  nSAi=357913940
nGenome=3209761209;  nSAbyte=24407925765
GstrandBit=32   SA number of indices=5917072912
Shared memory is not used for genomes. Allocated a private copy of the genome.
Genome file size: 3209761209 bytes; state: good=1 eof=0 fail=0 bad=0
Loading Genome ... done! state: good=1 eof=0 fail=0 bad=0; loaded 3209761209 bytes
SA file size: 24407925765 bytes; state: good=1 eof=0 fail=0 bad=0
Loading SA ... done! state: good=1 eof=0 fail=0 bad=0; loaded 24407925765 bytes
Loading SAindex ... done: 1565873619 bytes
Finished loading the genome: Mon Sep 28 11:15:08 2020

Processing splice junctions database sjdbN=291185,   pGe.sjdbOverhang=100 
alignIntronMax=alignMatesGapMax=0, the max intron size will be approximately determined by (2^winBinNbits)*winAnchorDistNbins=589824

ReadAlignChunk_processChunks.cpp:171:processChunks EXITING because of FATAL ERROR in input reads: unknown file format: the read ID should start with @ or > 

Sep 28 11:15:09 ...... FATAL ERROR, exiting

Thanks, Tinu
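
The error says the first character STAR read was neither @ nor >. The command line recorded in the Log.out above contains no --readFilesCommand option, so one plausible reading is that STAR opened the .gz files without decompressing them. A minimal sketch of how to compare what a program sees with and without decompression; the helper name first_byte_hex and the demo read are made up for illustration and are not part of STAR:

```shell
# Hypothetical helper (not part of STAR): print the first byte of stdin as hex.
first_byte_hex() {
    head -c 1 | od -An -tx1 | tr -d ' \n'
}

# Build a tiny gzipped FASTQ file to demonstrate on (made-up read).
demo_dir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > "$demo_dir/demo_R1.fastq.gz"

# What a program sees when it opens the .gz file directly.
raw_hex=$(first_byte_hex < "$demo_dir/demo_R1.fastq.gz")

# What it sees after decompression (gzip -dc behaves like zcat on Linux).
plain_hex=$(gzip -dc "$demo_dir/demo_R1.fastq.gz" | first_byte_hex)

echo "raw first byte:          0x$raw_hex"
echo "decompressed first byte: 0x$plain_hex"

rm -rf "$demo_dir"
```

Here 0x1f is the first byte of the gzip header and 0x40 is the @ character, so only the decompressed stream starts the way a FASTQ read ID should.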

alexdobin commented 4 years ago

Hi Tinu,

According to the Log.out file, your command line is:

STAR --genomeDir STAR_2.7.2a_genome_index --readFilesIn sampleA_L001_R1.fastq.gz sampleA_L001_R2.fastq.gz --readMapNumber 1

I think there is some sort of mix-up with the command line entry.

Cheers Alex
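
The comparison above can be sketched as a small script that pulls the recorded command line out of Log.out and checks it for options that were expected. It assumes the Log.out layout shown in this thread (a "##### Command Line:" header followed by the command on the next line); the function name star_logged_command is hypothetical, and the demo log is a three-line excerpt copied from the post above:

```shell
# Hypothetical helper: print the line that follows "##### Command Line:" in a Log.out file.
star_logged_command() {
    awk '/^##### Command Line:/ { getline; print; exit }' "$1"
}

# Demo input: a short excerpt of the Log.out posted in this thread.
demo_log=$(mktemp)
cat > "$demo_log" <<'EOF'
##### Command Line:
STAR --genomeDir STAR_2.7.2a_genome_index --readFilesIn sampleA_L001_R1.fastq.gz sampleA_L001_R2.fastq.gz --readMapNumber 1
##### Initial USER parameters from Command Line:
EOF

logged=$(star_logged_command "$demo_log")
echo "STAR received: $logged"

# Report which of the intended options actually reached STAR.
for opt in --readFilesCommand --chimSegmentMin --twopassMode; do
    case "$logged" in
        *"$opt"*) echo "present: $opt" ;;
        *)        echo "MISSING: $opt" ;;
    esac
done

rm -f "$demo_log"
```

On a real run, pass the path to the Log.out in the STAR output directory instead of the demo file.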

tinu-t commented 4 years ago

Hi Alex,

The backslash (\) at the end of the --readFilesIn line was omitted, so the command line was truncated at that point.

Thanks, Tinu
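
One way to avoid this class of mistake is to collect the options in a bash array, so that no line depends on a trailing backslash and inline comments are safe. This is only a sketch: it requires bash rather than plain sh, it lists a subset of the options from the original post for brevity, and the actual STAR call is left commented out.

```shell
#!/usr/bin/env bash
# Sketch: build the STAR argument list as a bash array.
# Newlines and comments are allowed inside the array literal, so a missing
# backslash cannot silently cut the command short.
star_args=(
    --genomeDir STAR_2.7.2a_genome_index
    # Lanes are comma-separated within a mate; mates are separated by a space.
    --readFilesIn sampleA_L001_R1.fastq.gz,sampleA_L002_R1.fastq.gz
                  sampleA_L001_R2.fastq.gz,sampleA_L002_R2.fastq.gz
    --readFilesCommand zcat
    --runThreadN 8
    --twopassMode Basic
    --chimSegmentMin 12          # needed for chimeric read detection
    --chimOutJunctionFormat 1
)

# Print the command for inspection before running it.
printf 'STAR %s\n' "${star_args[*]}"

# Uncomment to actually run STAR with these arguments:
# STAR "${star_args[@]}"
```

Printing the assembled command first makes it easy to confirm that both lanes and every option are present before the run starts.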