alexdobin / STAR

RNA-seq aligner
MIT License

STAR FATAL ERROR, exiting / Transcriptome.cpp:18:Transcriptome: exiting because of *INPUT FILE* error #1953

Open Mpvrd opened 12 months ago

Mpvrd commented 12 months ago

Hi,

I'm using STAR to do an RNA-seq analysis on Staphylococcus aureus sRNA. The alignment is working, but during the mapping I get this fatal error:

Transcriptome.cpp:18:Transcriptome: exiting because of INPUT FILE error: could not open input file /geneInfo.tab Solution: check that the file exists and you have read permission for this file SOLUTION: utilize --sjdbGTFfile /path/to/annotations.gtf option at the genome generation step or mapping step

Sep 20 08:04:56 ...... FATAL ERROR, exiting

This is my command:

STAR --runThreadN 8 --runMode alignReads --genomeDir genome_rnaseq --readFilesIn FASTQ/JE2_pH5-5_09-04-2019_S38_R1_001.fastq.gz --twopassMode Basic --outSAMtype None --quantMode GeneCounts --sjdbGTFfile ANNOTATION/sRNA_USA300_FPR3757V6.gtf --sjdbGTFfeatureExon CDS --outTmpDir temporal --outFileNamePrefix JE2_pH5-5_09-04-2019_S38_R1001 --readFilesCommand "gunzip -c"

STAR version: 2.7.11a compiled: :/Users/marianepivard/STAR-2.7.11a/source

I have checked that the geneInfo.tab file is in the right folder and that read permission is granted.

Can you help me with this? Many thanks.

Log.out.txt

Mpvrd commented 12 months ago

This is the Log.out file for the mapping:

JE2_pH5-5_09-04-2019_S38_R1_001_Log.out.txt

alexdobin commented 12 months ago

This is likely a problem with the GTF file. Please check that the CDS lines in the file have the gene_id tag.
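One way to act on this suggestion is to scan the GTF for records of the feature type passed to --sjdbGTFfeatureExon (CDS in the original command) that have no gene_id attribute, since gene_id is STAR's default for --sjdbGTFtagExonParentGene. The C++ sketch below is a hypothetical standalone helper, not part of STAR; it assumes a standard tab-separated, 9-column GTF and only does a simple substring search in the attribute column, so it is a quick sanity check rather than a full GTF validator.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of STAR): returns the 1-based line numbers of
// GTF records whose feature type (column 3) equals `feature` but whose
// attribute column (column 9) carries no gene_id tag.
// Assumes a tab-separated, 9-column GTF; lines starting with '#' are skipped.
std::vector<std::size_t> linesMissingGeneId(std::istream& gtf,
                                            const std::string& feature = "CDS") {
    std::vector<std::size_t> missing;
    std::string line;
    std::size_t lineNo = 0;
    while (std::getline(gtf, line)) {
        ++lineNo;
        if (line.empty() || line[0] == '#') continue;

        // Split the record into tab-separated columns.
        std::vector<std::string> cols;
        std::istringstream fields(line);
        std::string col;
        while (std::getline(fields, col, '\t')) cols.push_back(col);
        if (cols.size() < 9 || cols[2] != feature) continue;

        // Look for "gene_id" as an attribute key: either at the start of
        // column 9 or right after a ';' separator (with or without a space).
        const std::string& attrs = cols[8];
        bool hasGeneId = attrs.rfind("gene_id ", 0) == 0 ||
                         attrs.find(";gene_id ") != std::string::npos ||
                         attrs.find("; gene_id ") != std::string::npos;
        if (!hasGeneId) missing.push_back(lineNo);
    }
    return missing;
}
```

For example, opening the annotation with std::ifstream and passing it to linesMissingGeneId would list the offending line numbers; an empty result means every CDS line carries a gene_id attribute.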

Mpvrd commented 12 months ago

sRNA_USA300_FPR3757V6.gtf.zip

Thank you for your advice. It seems fine, though...

wanisajad commented 11 months ago

I am facing a similar problem. @Mpvrd, were you able to solve it?

Mpvrd commented 11 months ago

Hello, unfortunately not... We're still stuck... If you have any ideas, I'd be more than happy to hear them.

Mpvrd commented 11 months ago

Update: with an older version of STAR (2.7.5a) and the same files, it works... So maybe, because of an update, the GTF format needs to be different? @alexdobin, have you heard of this problem before?

alexdobin commented 11 months ago

Hi @Mpvrd

I have not seen an unsolvable issue like this. The genome index generated with 2.7.5a may work with the later versions of STAR.

wanisajad commented 11 months ago

Hi Alexander,

I have acquired the mouse genome reference sequence Mus_musculus.GRCm39.dna.primary_assembly.fa and the corresponding annotation file in GTF format Mus_musculus.GRCm39.104.gtf from the Ensembl database.

I've indexed the genome using the said reference and annotation files and everything went smoothly using STAR version 2.7.11a. I've been able to align bulk RNA-seq data with it.

However, when I attempt to process single-cell data with STARsolo (also STAR version 2.7.11a), I repeatedly encounter an error related to the /geneInfo.tab file, even though I've confirmed that the file exists and has the appropriate permissions.

Which .gtf and .fa files, and which STAR version, are compatible with each other?

(Sent by email in reply to Alexander Dobin's comment of Mon, Sep 25, 2023, 12:50 PM.)

alexdobin commented 11 months ago

Hi @wanisajad

please send me the Log.out file of the failed run, and the Log.out file of the genome generation run. Please rename them to Log.out.txt before attaching them.

wanisajad commented 11 months ago

Alexander, here are the attachments (sent by email in reply to your comment of Mon, Sep 25, 2023, 1:01 PM):

STAR version=2.7.11a STAR compilation time,server,dir= :/Users/distiller/project/STARcompile/source STAR git: On branch master ; commit d57f92d95169a1bb97de791df0321f42634f18c8 ; diff files:

Command Line:

STAR --runMode alignReads --runThreadN 8 --genomeDir ./GenomeIndex --readFilesIn ./control_concat_R2.fastq.gz ./control_concat_R1.fastq.gz --readFilesCommand gunzip -c --outFileNamePrefix ./STARsolo_Out/ --soloType Droplet --soloCBwhitelist ./3M-february-2018.txt

Initial USER parameters from Command Line:

outFileNamePrefix ./STARsolo_Out/

All USER parameters from Command Line:

runMode alignReads ~RE-DEFINED
runThreadN 8 ~RE-DEFINED
genomeDir ./GenomeIndex ~RE-DEFINED
readFilesIn ./control_concat_R2.fastq.gz ./control_concat_R1.fastq.gz ~RE-DEFINED
readFilesCommand gunzip -c ~RE-DEFINED
outFileNamePrefix ./STARsolo_Out/ ~RE-DEFINED
soloType Droplet ~RE-DEFINED
soloCBwhitelist ./3M-february-2018.txt ~RE-DEFINED

Finished reading parameters from all sources
Final user re-defined parameters-----------------:

runMode alignReads
runThreadN 8
genomeDir ./GenomeIndex
readFilesIn ./control_concat_R2.fastq.gz ./control_concat_R1.fastq.gz
readFilesCommand gunzip -c
outFileNamePrefix ./STARsolo_Out/
soloType Droplet
soloCBwhitelist ./3M-february-2018.txt


Final effective command line:

STAR --runMode alignReads --runThreadN 8 --genomeDir ./GenomeIndex --readFilesIn ./control_concat_R2.fastq.gz ./control_concat_R1.fastq.gz --readFilesCommand gunzip -c --outFileNamePrefix ./STARsolo_Out/ --soloType Droplet --soloCBwhitelist ./3M-february-2018.txt

Number of fastq files for each mate = 1

Input read files for mate 1 : -rw-r--r-- 1 wanisd staff 16269406132 Sep 21 15:02 ./control_concat_R2.fastq.gz

readsCommandsFile: exec > "./STARsolo_Out/_STARtmp/tmp.fifo.read1" echo FILE 0 gunzip -c "./control_concat_R2.fastq.gz"

Input read files for mate 2 : -rw-r--r-- 1 wanisd staff 10018610619 Sep 21 15:01 ./control_concat_R1.fastq.gz

readsCommandsFile: exec > "./STARsolo_Out/_STARtmp/tmp.fifo.read2" echo FILE 0 gunzip -c "./control_concat_R1.fastq.gz"

ParametersSolo: --soloCellFilterType CellRanger2.2 filtering parameters: 3000 0.99 10
Number of CBs in the whitelist = 6794880
Sep 22 23:13:55 ... Finished reading, sorting and deduplicating CB whitelist sequences.
Finished loading and checking parameters
Reading genome generation parameters:

STAR --runMode genomeGenerate --runThreadN 8 --genomeDir ./GenomeIndex --genomeFastaFiles Mus_musculus.GRCm39.dna.primary_assembly.fa --sjdbGTFfile Mus_musculus.GRCm39.104.gtf

GstrandBit=32

versionGenome 2.7.4a ~RE-DEFINED genomeType Full ~RE-DEFINED genomeFastaFiles Mus_musculus.GRCm39.dna.primary_assembly.fa ~RE-DEFINED genomeSAindexNbases 14 ~RE-DEFINED genomeChrBinNbits 18 ~RE-DEFINED genomeSAsparseD 1 ~RE-DEFINED genomeTransformType None ~RE-DEFINED genomeTransformVCF - ~RE-DEFINED sjdbOverhang 100 ~RE-DEFINED sjdbFileChrStartEnd - ~RE-DEFINED sjdbGTFfile Mus_musculus.GRCm39.104.gtf ~RE-DEFINED sjdbGTFchrPrefix - ~RE-DEFINED sjdbGTFfeatureExon exon ~RE-DEFINED sjdbGTFtagExonParentTranscripttranscript_id ~RE-DEFINED sjdbGTFtagExonParentGene gene_id ~RE-DEFINED sjdbInsertSave Basic ~RE-DEFINED genomeFileSizes 2795625144 22370747540 ~RE-DEFINED Genome version is compatible with current STAR Number of real (reference) chromosomes= 61 1 1 195154279 0 2 10 130530862 195297280 3 11 121973369 325844992 4 12 120092757 448004096 5 13 120883175 568328192 6 14 125139656 689438720 7 15 104073951 814743552 8 16 98008968 919076864 9 17 95294699 1017118720 10 18 90720763 1112539136 11 19 61420004 1203503104 12 2 181755017 1265106944 13 3 159745316 1447034880 14 4 156860686 1606942720 15 5 151758149 1763966976 16 6 149588044 1915748352 17 7 144995196 2065432576 18 8 130127694 2210660352 19 9 124359700 2340945920 20 MT 16299 2465464320 21 X 169476592 2465726464 22 Y 91455967 2635333632 23 JH584299.1 953012 2726821888 24 GL456233.2 559103 2727870464 25 JH584301.1 259875 2728656896 26 GL456211.1 241735 2728919040 27 GL456221.1 206961 2729181184 28 JH584297.1 205776 2729443328 29 JH584296.1 199368 2729705472 30 GL456354.1 195993 2729967616 31 JH584298.1 184189 2730229760 32 JH584300.1 182347 2730491904 33 GL456219.1 175968 2730754048 34 GL456210.1 169725 2731016192 35 JH584303.1 158099 2731278336 36 JH584302.1 155838 2731540480 37 GL456212.1 153618 2731802624 38 JH584304.1 114452 2732064768 39 GL456379.1 72385 2732326912 40 GL456366.1 47073 2732589056 41 GL456367.1 42057 2732851200 42 GL456239.1 40056 2733113344 43 GL456383.1 38659 2733375488 44 GL456385.1 35240 
2733637632 45 GL456360.1 31704 2733899776 46 GL456378.1 31602 2734161920 47 MU069435.1 31129 2734424064 48 GL456389.1 28772 2734686208 49 GL456372.1 28664 2734948352 50 GL456370.1 26764 2735210496 51 GL456381.1 25871 2735472640 52 GL456387.1 24685 2735734784 53 GL456390.1 24668 2735996928 54 GL456394.1 24323 2736259072 55 GL456392.1 23629 2736521216 56 GL456382.1 23158 2736783360 57 GL456359.1 22974 2737045504 58 GL456396.1 21240 2737307648 59 GL456368.1 20208 2737569792 60 MU069434.1 8412 2737831936 61 JH584295.1 1976 2738094080 --sjdbOverhang = 100 taken from the generated genome Started loading the genome: Fri Sep 22 23:13:55 2023

Genome: size given as a parameter = 2795625144
SA: size given as a parameter = 22370747540
SAindex: size given as a parameter = 1
Read from SAindex: pGe.gSAindexNbases=14 nSAi=357913940
nGenome=2795625144; nSAbyte=22370747540
GstrandBit=32 SA number of indices=5423211524
Shared memory is not used for genomes. Allocated a private copy of the genome.
Genome file size: 2795625144 bytes; state: good=1 eof=0 fail=0 bad=0
Loading Genome ... done! state: good=1 eof=0 fail=0 bad=0; loaded 2795625144 bytes
SA file size: 22370747540 bytes; state: good=1 eof=0 fail=0 bad=0
Loading SA ... done! state: good=1 eof=0 fail=0 bad=0; loaded 22370747540 bytes
Loading SAindex ... done: 1565873619 bytes
Finished loading the genome: Fri Sep 22 23:14:10 2023

Sum of all Genome bytes: 4415333642
Sum of all SA bytes: 2763091441399
Sum of all SAi bytes: 183692884869
Processing splice junctions database sjdbN=284920, pGe.sjdbOverhang=100
alignIntronMax=alignMatesGapMax=0, the max intron size will be approximately determined by (2^winBinNbits)*winAnchorDistNbins=589824

Transcriptome.cpp:18:Transcriptome: exiting because of INPUT FILE error: could not open input file /geneInfo.tab Solution: check that the file exists and you have read permission for this file SOLUTION: utilize --sjdbGTFfile /path/to/annotations.gtf option at the genome generation step or mapping step

Sep 22 23:15:20 ...... FATAL ERROR, exiting

STAR version=2.7.11a STAR compilation time,server,dir= :/Users/distiller/project/STARcompile/source STAR git: On branch master ; commit d57f92d95169a1bb97de791df0321f42634f18c8 ; diff files:

Command Line:

STAR --runThreadN 8 --runMode genomeGenerate --genomeDir ./GenomeIndex --genomeFastaFiles Mus_musculus.GRCm39.dna.primary_assembly.fa --sjdbGTFfile Mus_musculus.GRCm39.104.gtf

Initial USER parameters from Command Line:
All USER parameters from Command Line:

runThreadN 8 ~RE-DEFINED
runMode genomeGenerate ~RE-DEFINED
genomeDir ./GenomeIndex ~RE-DEFINED
genomeFastaFiles Mus_musculus.GRCm39.dna.primary_assembly.fa ~RE-DEFINED
sjdbGTFfile Mus_musculus.GRCm39.104.gtf ~RE-DEFINED

Finished reading parameters from all sources
Final user re-defined parameters-----------------:

runMode genomeGenerate
runThreadN 8
genomeDir ./GenomeIndex
genomeFastaFiles Mus_musculus.GRCm39.dna.primary_assembly.fa
sjdbGTFfile Mus_musculus.GRCm39.104.gtf


Final effective command line:

STAR --runMode genomeGenerate --runThreadN 8 --genomeDir ./GenomeIndex --genomeFastaFiles Mus_musculus.GRCm39.dna.primary_assembly.fa --sjdbGTFfile Mus_musculus.GRCm39.104.gtf

Number of fastq files for each mate = 1 ParametersSolo: --soloCellFilterType CellRanger2.2 filtering parameters: 3000 0.99 10 Finished loading and checking parameters --genomeDir directory exists and will be overwritten: ./GenomeIndex/ Sep 22 16:37:36 ... starting to generate Genome files Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 0 "1" chrStart: 0 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 1 "10" chrStart: 195297280 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 2 "11" chrStart: 325844992 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 3 "12" chrStart: 448004096 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 4 "13" chrStart: 568328192 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 5 "14" chrStart: 689438720 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 6 "15" chrStart: 814743552 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 7 "16" chrStart: 919076864 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 8 "17" chrStart: 1017118720 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 9 "18" chrStart: 1112539136 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 10 "19" chrStart: 1203503104 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 11 "2" chrStart: 1265106944 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 12 "3" chrStart: 1447034880 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 13 "4" chrStart: 1606942720 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 14 "5" chrStart: 1763966976 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 15 "6" chrStart: 1915748352 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 16 "7" chrStart: 2065432576 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 17 "8" chrStart: 2210660352 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 18 "9" chrStart: 2340945920 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 19 "MT" chrStart: 2465464320 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 20 "X" chrStart: 2465726464 Mus_musculus.GRCm39.dna.primary_assembly.fa 
: chr # 21 "Y" chrStart: 2635333632 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 22 "JH584299.1" chrStart: 2726821888 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 23 "GL456233.2" chrStart: 2727870464 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 24 "JH584301.1" chrStart: 2728656896 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 25 "GL456211.1" chrStart: 2728919040 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 26 "GL456221.1" chrStart: 2729181184 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 27 "JH584297.1" chrStart: 2729443328 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 28 "JH584296.1" chrStart: 2729705472 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 29 "GL456354.1" chrStart: 2729967616 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 30 "JH584298.1" chrStart: 2730229760 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 31 "JH584300.1" chrStart: 2730491904 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 32 "GL456219.1" chrStart: 2730754048 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 33 "GL456210.1" chrStart: 2731016192 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 34 "JH584303.1" chrStart: 2731278336 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 35 "JH584302.1" chrStart: 2731540480 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 36 "GL456212.1" chrStart: 2731802624 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 37 "JH584304.1" chrStart: 2732064768 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 38 "GL456379.1" chrStart: 2732326912 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 39 "GL456366.1" chrStart: 2732589056 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 40 "GL456367.1" chrStart: 2732851200 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 41 "GL456239.1" chrStart: 2733113344 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 42 "GL456383.1" chrStart: 2733375488 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 43 "GL456385.1" chrStart: 2733637632 
Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 44 "GL456360.1" chrStart: 2733899776 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 45 "GL456378.1" chrStart: 2734161920 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 46 "MU069435.1" chrStart: 2734424064 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 47 "GL456389.1" chrStart: 2734686208 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 48 "GL456372.1" chrStart: 2734948352 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 49 "GL456370.1" chrStart: 2735210496 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 50 "GL456381.1" chrStart: 2735472640 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 51 "GL456387.1" chrStart: 2735734784 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 52 "GL456390.1" chrStart: 2735996928 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 53 "GL456394.1" chrStart: 2736259072 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 54 "GL456392.1" chrStart: 2736521216 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 55 "GL456382.1" chrStart: 2736783360 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 56 "GL456359.1" chrStart: 2737045504 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 57 "GL456396.1" chrStart: 2737307648 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 58 "GL456368.1" chrStart: 2737569792 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 59 "MU069434.1" chrStart: 2737831936 Mus_musculus.GRCm39.dna.primary_assembly.fa : chr # 60 "JH584295.1" chrStart: 2738094080 Chromosome sequence lengths: 1 195154279 10 130530862 11 121973369 12 120092757 13 120883175 14 125139656 15 104073951 16 98008968 17 95294699 18 90720763 19 61420004 2 181755017 3 159745316 4 156860686 5 151758149 6 149588044 7 144995196 8 130127694 9 124359700 MT 16299 X 169476592 Y 91455967 JH584299.1 953012 GL456233.2 559103 JH584301.1 259875 GL456211.1 241735 GL456221.1 206961 JH584297.1 205776 JH584296.1 199368 GL456354.1 195993 JH584298.1 184189 JH584300.1 182347 GL456219.1 
175968 GL456210.1 169725 JH584303.1 158099 JH584302.1 155838 GL456212.1 153618 JH584304.1 114452 GL456379.1 72385 GL456366.1 47073 GL456367.1 42057 GL456239.1 40056 GL456383.1 38659 GL456385.1 35240 GL456360.1 31704 GL456378.1 31602 MU069435.1 31129 GL456389.1 28772 GL456372.1 28664 GL456370.1 26764 GL456381.1 25871 GL456387.1 24685 GL456390.1 24668 GL456394.1 24323 GL456392.1 23629 GL456382.1 23158 GL456359.1 22974 GL456396.1 21240 GL456368.1 20208 MU069434.1 8412 JH584295.1 1976 Genome sequence total length = 2728222451 Genome size with padding = 2738356224 Sep 22 16:38:25 ..... processing annotations GTF Processing pGe.sjdbGTFfile=Mus_musculus.GRCm39.104.gtf, found: 142434 transcripts 842117 exons (non-collapsed) 284993 collapsed junctions Total junctions: 284993 Sep 22 16:38:34 ..... finished GTF processing

Estimated genome size with padding and SJs: total=genome+SJ=2939356224 = 2738356224 + 201000000 GstrandBit=32 Number of SA indices: 5309243566 Sep 22 16:38:39 ... starting to sort Suffix Array. This may take a long time... Number of chunks: 24; chunks size limit: 1914246544 bytes Sep 22 16:38:47 ... sorting Suffix Array chunks and saving them to disk... Writing 1761388056 bytes into ./GenomeIndex//SA_1 ; empty space on disk = 179833150636032 bytes ... done Writing 1854034976 bytes into ./GenomeIndex//SA_0 ; empty space on disk = 179382234644480 bytes ... done Writing 1832467656 bytes into ./GenomeIndex//SA_2 ; empty space on disk = 178907600912384 bytes ... done Writing 1790148352 bytes into ./GenomeIndex//SA_6 ; empty space on disk = 178438488981504 bytes ... done Writing 1868710672 bytes into ./GenomeIndex//SA_4 ; empty space on disk = 177980210937856 bytes ... done Writing 1735971512 bytes into ./GenomeIndex//SA_3 ; empty space on disk = 177501820157952 bytes ... done Writing 1881135680 bytes into ./GenomeIndex//SA_7 ; empty space on disk = 177057410580480 bytes ... done Writing 1836071784 bytes into ./GenomeIndex//SA_5 ; empty space on disk = 176575847858176 bytes ... done Writing 1738304216 bytes into ./GenomeIndex//SA_10 ; empty space on disk = 176106849173504 bytes ... done Writing 1787061616 bytes into ./GenomeIndex//SA_9 ; empty space on disk = 175662132363264 bytes ... done Writing 1773261472 bytes into ./GenomeIndex//SA_8 ; empty space on disk = 175205608587264 bytes ... done Writing 1882586808 bytes into ./GenomeIndex//SA_14 ; empty space on disk = 174751654871040 bytes ... done Writing 1826940776 bytes into ./GenomeIndex//SA_11 ; empty space on disk = 174271010701312 bytes ... done Writing 1869480048 bytes into ./GenomeIndex//SA_12 ; empty space on disk = 173803313299456 bytes ... done Writing 1882992120 bytes into ./GenomeIndex//SA_13 ; empty space on disk = 173324423397376 bytes ... 
done Writing 1881806984 bytes into ./GenomeIndex//SA_15 ; empty space on disk = 172842374135808 bytes ... done Writing 801751320 bytes into ./GenomeIndex//SA_23 ; empty space on disk = 172360607989760 bytes ... done Writing 1843649328 bytes into ./GenomeIndex//SA_18 ; empty space on disk = 172155303100416 bytes ... done Writing 1775408048 bytes into ./GenomeIndex//SA_19 ; empty space on disk = 171683328557056 bytes ... done Writing 1581660584 bytes into ./GenomeIndex//SA_21 ; empty space on disk = 171228823289856 bytes ... done Writing 1707621360 bytes into ./GenomeIndex//SA_20 ; empty space on disk = 170823917764608 bytes ... done Writing 1763670480 bytes into ./GenomeIndex//SA_22 ; empty space on disk = 170386766430208 bytes ... done Writing 1910082552 bytes into ./GenomeIndex//SA_17 ; empty space on disk = 169935249604608 bytes ... done Writing 1887742128 bytes into ./GenomeIndex//SA_16 ; empty space on disk = 169446268207104 bytes ... done Sep 22 16:50:34 ... loading chunks from disk, packing SA... Sep 22 16:51:35 ... finished generating suffix array Sep 22 16:51:35 ... generating Suffix Array index Sep 22 16:56:15 ... completed Suffix Array index WARNING: long repeat for junction # 64485 : 13 36192501 36373186; left shift = 255; right shift = 29 WARNING: long repeat for junction # 72007 : 13 119631671 120070727; left shift = 36; right shift = 255 WARNING: long repeat for junction # 88263 : 15 75976552 75988902; left shift = 0; right shift = 255 WARNING: long repeat for junction # 196751 : 5 112922393 112922883; left shift = 255; right shift = 69 WARNING: long repeat for junction # 199109 : 5 123328298 123329023; left shift = 255; right shift = 19 WARNING: long repeat for junction # 208200 : 6 48731569 48732130; left shift = 81; right shift = 255 WARNING: long repeat for junction # 240123 : 7 141361933 141364041; left shift = 255; right shift = 255 Sep 22 16:56:15 Finished preparing junctions Sep 22 16:56:15 ..... 
inserting junctions into the genome indices Sep 22 16:59:53 Finished SA search: number of new junctions=284920, old junctions=0 Sep 22 17:00:27 Finished sorting SA indicesL nInd=113967958 Genome size with junctions=2795625144 2738356224 57268920 GstrandBit1=32 GstrandBit=32 Sep 22 17:02:19 Finished inserting junction indices Sep 22 17:02:43 Finished SAi Sep 22 17:02:43 ..... finished inserting junctions into genome Sep 22 17:02:43 ... writing Genome to disk ... Writing 2795625144 bytes into ./GenomeIndex//Genome ; empty space on disk = 177910347464704 bytes ... done SA size in bytes: 22370747540 Sep 22 17:02:58 ... writing Suffix Array to disk ... Writing 22370747540 bytes into ./GenomeIndex//SA ; empty space on disk = 177469525065728 bytes ... done Sep 22 17:04:38 ... writing SAindex to disk Writing 8 bytes into ./GenomeIndex//SAindex ; empty space on disk = 172016351051776 bytes ... done Writing 120 bytes into ./GenomeIndex//SAindex ; empty space on disk = 172016351051776 bytes ... done Writing 1565873491 bytes into ./GenomeIndex//SAindex ; empty space on disk = 172016351051776 bytes ... done Sep 22 17:04:42 ..... finished successfully DONE: Genome generation, EXITING

Mpvrd commented 11 months ago

Hi @alexdobin,

From my log.out file and my GTF file, do you have an idea where the problem comes from? It seems to be the junction, but on my side, I really have no idea how to solve this. Thanks for your help

wanisajad commented 11 months ago

Hi @alexdobin Could you please check my Log.out files. Thanks

cjschmidt79 commented 11 months ago

I am having the same problem and was wondering whether it was ever solved. The geneInfo.tab file is in the INDEX directory, and it is readable.

star --runThreadN 16 --runMode alignReads --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts --outFileNamePrefix TEST --genomeDir INDEX --sjdbGTFfile Gallus7bENS.gtf --readFilesIn SRA/SRR20047277.lite.1_1.fastq SRA/SRR20047277.lite.1_2.fastq
STAR version: 2.7.11a   compiled:  :/Users/distiller/project/STARcompile/source

Oct 07 22:17:42 ..... started STAR run
Oct 07 22:17:42 ..... loading genome

Oct 07 22:17:46 ..... processing annotations GTF
Oct 07 22:17:52 ..... inserting junctions into the genome indices

Transcriptome.cpp:18:Transcriptome: exiting because of INPUT FILE error: could not open input file /geneInfo.tab Solution: check that the file exists and you have read permission for this file SOLUTION: utilize --sjdbGTFfile /path/to/annotations.gtf option at the genome generation step or mapping step

Oct 07 22:18:14 ...... FATAL ERROR, exiting

Steven1817 commented 11 months ago

I am using macOS with the Terminal and have the same problem. My script:

ulimit -n 1048576
cat samples.txt | while read sample; do
    STAR \
        --runThreadN 10 \
        --genomeDir reference/STAR \
        --readFilesIn 02-clean-data/${sample}.raw.R1.fq.gz \
                      02-clean-data/${sample}.raw.R2.fq.gz \
        --outFileNamePrefix 03-read-align/${sample}_ \
        --outSAMtype BAM SortedByCoordinate \
        --quantMode GeneCounts \
        --outBAMsortingThreadN 10
done

(rna-seq) qijiaqian@MacBook-Pro 2023-10-08 % bash run.ssh
STAR --runThreadN 10 --genomeDir reference/STAR --readFilesIn 02-clean-data/F5-meg01-plko-1.raw.R1.fq.gz 02-clean-data/F5-meg01-plko-1.raw.R2.fq.gz --outFileNamePrefix 03-read-align/F5-meg01-plko-1_ --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts --outBAMsortingThreadN 10
STAR version: 2.7.11a compiled: :/Users/distiller/project/STARcompile/source
Oct 09 11:13:59 ..... started STAR run
Oct 09 11:13:59 ..... loading genome

Transcriptome.cpp:18:Transcriptome: exiting because of INPUT FILE error: could not open input file /geneInfo.tab Solution: check that the file exists and you have read permission for this file SOLUTION: utilize --sjdbGTFfile /path/to/annotations.gtf option at the genome generation step or mapping step

Mpvrd commented 11 months ago

@cjschmidt79 and @Steven1817, the problem has not been solved yet...

@alexdobin, have you had time to look at it?

alexdobin commented 11 months ago

This means that STAR could not find or open geneInfo.tab file in the --genomeDir directory.
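Before rerunning STAR, the two conditions named in the error message (the file exists, and it can be read) can be checked directly. The helper below is a hypothetical sketch using C++17 std::filesystem, not part of STAR. Note that the error in this thread reports the path /geneInfo.tab with an empty directory part, so this check succeeding on the real --genomeDir would be consistent with the problem lying in how STAR builds the path rather than in the file itself.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

// Hypothetical pre-flight check (not part of STAR): returns true if
// <genomeDir>/geneInfo.tab exists as a regular file and can be opened for reading.
bool geneInfoReadable(const std::string& genomeDir) {
    namespace fs = std::filesystem;
    const fs::path file = fs::path(genomeDir) / "geneInfo.tab";
    std::error_code ec;
    if (!fs::is_regular_file(file, ec)) return false;  // missing, or not a regular file
    std::ifstream in(file);
    return in.good();  // false if the file cannot be opened, e.g. no read permission
}
```

Calling geneInfoReadable("./GenomeIndex") with the directory passed to --genomeDir should return true for a genome generated with --sjdbGTFfile.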

developer-ss28 commented 11 months ago

Hello, @alexdobin.

I may have found the cause of this error, so I will report it.

I think the cause of this error is that transform.outQuant is not initialized in line 27 of ParametersGenome.cpp.

//ParametersGenome.cpp : line 27

transform.outYes = transform.outSAM = transform.outSJ = false;

Therefore, in lines 12 to 16 of Transcriptome.cpp, the else branch is taken even when the if branch should be. (In my environment, P.pGe.transform.outQuant had the value 146 or 209.)

// Transcriptome.cpp : line 12 -> 18

if (!P.pGe.transform.outQuant) {//standard
    trInfoDir = P.pGe.sjdbGTFfile=="-" ? P.pGe.gDir : P.sjdbInsert.outDir; //if GTF file is given at the mapping stage, it's always used for transcript info
} else {//transformed genome
    trInfoDir = P.pGeOut.gDir;
};

ifstream &geStream = ifstrOpen(trInfoDir+"/geneInfo.tab", ERROR_OUT, "SOLUTION: utilize --sjdbGTFfile /path/to/annotations.gtf option at the genome generation step or mapping step", P);

As a result, trInfoDir appears to be set to an empty string, so trInfoDir+"/geneInfo.tab" evaluates to "/geneInfo.tab", which does not exist and therefore cannot be opened at line 18 of Transcriptome.cpp.
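The reasoning above can be reproduced in isolation. The sketch below uses a hypothetical, simplified GenomeParams struct instead of STAR's real Parameters classes, and copies the branch logic from the quoted Transcriptome.cpp lines. It assumes pGeOut.gDir is empty when no transformed genome is used, which is consistent with the path in the error message. Reading a genuinely uninitialized bool is undefined behavior, so the usage sets outQuant = true explicitly to stand in for the garbage value.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for the STAR fields quoted above
// (the real fields live in STAR's Parameters/ParametersGenome classes).
struct GenomeParams {
    std::string gDir;
    std::string sjdbGTFfile = "-";
    bool outQuant = false;  // plays the role of transform.outQuant
};

// Mirrors the branch at Transcriptome.cpp lines 12-18 as quoted above.
std::string geneInfoPath(const GenomeParams& pGe, const GenomeParams& pGeOut,
                         const std::string& sjdbInsertOutDir) {
    std::string trInfoDir;
    if (!pGe.outQuant) {  // standard genome
        // A GTF given at the mapping stage takes precedence over the genome dir.
        trInfoDir = pGe.sjdbGTFfile == "-" ? pGe.gDir : sjdbInsertOutDir;
    } else {              // transformed genome
        trInfoDir = pGeOut.gDir;
    }
    return trInfoDir + "/geneInfo.tab";
}
```

With outQuant false, the path resolves inside --genomeDir (or inside the junction-insertion output directory when a GTF is given at mapping); once the flag reads as nonzero and pGeOut.gDir is empty, the result is exactly "/geneInfo.tab".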

Sorry if I am wrong.

alexdobin commented 10 months ago

Hi @developer-ss28

thanks for investigating it. You may be right, I will look into it shortly.

Cheers Alex

developer-ss28 commented 10 months ago

Thank you for your kind reply, @alexdobin. I will add some additional information.

About my environment.

OS: macOS Ventura 13.4
CPU: Apple M1
Memory: 16 GB
Rosetta 2: used
STAR version: 2.7.11a, downloaded from GitHub

STAR option

To use RSEM after STAR, I used the following options.

/Users/ss28/STAR/STAR-2.7.11a/bin/MacOSX_x86_64/STAR \
--runThreadN 4 \
--runMode alignReads \
--genomeDir /Users/ss28/Labo/practice/ngsbook2_RNAseq/ref_gencode/standard/star_dir/star_reference \
--readFilesIn SRR1550989_1.fastq.gz SRR1550989_2.fastq.gz \
--readFilesPrefix /Users/ss28/Labo/practice/ngsbook2_RNAseq/seq/ \
--readFilesCommand "gunzip -c " \
--outSAMtype BAM Unsorted \
--outFileNamePrefix /Users/ss28/Labo/practice/ngsbook2_RNAseq/ref_gencode/standard/star_dir/star_results/SRR1550989 \
--outFilterMismatchNoverReadLmax 0.04 \
--outFilterMultimapNmax 10 \
--quantMode TranscriptomeSAM

This produced the error. So I changed line 27 of ParametersGenome.cpp as follows and recompiled STAR according to the STAR manual.

// before
transform.outYes = transform.outSAM = transform.outSJ = false;

//after
transform.outYes = transform.outSAM = transform.outSJ = transform.outQuant = false;
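Besides extending the assignment chain as above, a common C++11 idiom that rules out this class of bug is to give each flag a default member initializer, so no constructor or code path can leave one indeterminate. The struct below is a hypothetical, stripped-down stand-in for the transform fields named above, not STAR's actual declaration.

```cpp
#include <cassert>

// Hypothetical sketch (not STAR's actual declaration): with default member
// initializers, every default-constructed object starts with all flags false,
// even if a later assignment chain forgets one of the fields.
struct TransformFlags {
    bool outYes = false;
    bool outSAM = false;
    bool outSJ = false;
    bool outQuant = false;
};
```

The trade-off is small: the defaults live next to the declarations, so adding a new flag later cannot silently skip initialization the way a separate assignment line can.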

After this change, the error no longer occurred. These are the options I used:

/Users/ss28/STAR/STAR-2.7.11a-neo/source/STAR \
--runThreadN 6 \
--runMode alignReads \
--genomeDir /Users/ss28/Labo/practice/ngsbook2_RNAseq/ref_gencode/standard2/star_dir/star_reference \
--readFilesIn SRR1550989_1.fastq.gz SRR1550989_2.fastq.gz \
--readFilesPrefix /Users/ss28/Labo/practice/ngsbook2_RNAseq/seq/ \
--readFilesCommand "gunzip -c " \
--outSAMtype BAM Unsorted \
--outFileNamePrefix /Users/ss28/Labo/practice/ngsbook2_RNAseq/ref_gencode/standard2/star_dir/star_results/SRR1550989_ \
--outFilterMismatchNoverReadLmax 0.04 \
--quantMode TranscriptomeSAM

I hope this helps.

iferapontova commented 8 months ago

Hello, unfortunately I am having the same problem, which seems to be related to the use of the --quantMode TranscriptomeSAM option. I did not have any issues generating alignments before adding this option. I have specified the gtf used for indexing with --sjdbGTFfile and checked the location and permissions but I am still getting this error: Transcriptome.cpp:18:Transcriptome: exiting because of INPUT FILE error: could not open input file /geneInfo.tab Solution: check that the file exists and you have read permission for this file SOLUTION: utilize --sjdbGTFfile /path/to/annotations.gtf option at the genome generation step or mapping step

Unfortunately the solution suggested by @developer-ss28 didn't work for me. Thank you so much for your help.

Steven1817 commented 8 months ago

mamba create -n rna-seq -c bioconda -c conda-forge \
    sra-tools star==2.7.5a samtools deeptools subread fastqc fastp pysradb

Try using STAR version 2.7.5a. That solved the problem for me.

(Sent by email in reply to the comment above, dated 01/11/2024 20:45.)

iferapontova commented 8 months ago

Thank you so much, that fixed it!

All the best, Irina


Steven1817 commented 8 months ago

You’re welcome!

戚嘉乾


sknaack commented 7 months ago

Hi @alexdobin (maybe also @developer-ss28?),

I have a quick question for (either of) you about the final decision on the proposed change to line 27 of ParametersGenome.cpp:

 // before
 transform.outYes = transform.outSAM = transform.outSJ = false;

 //after
 transform.outYes = transform.outSAM = transform.outSJ = transform.outQuant = false;

I ran into a similar issue when using an existing STAR reference (built with v2.7.10b by someone else on Linux) verbatim with STAR v2.7.11a on OS X. All other arguments to STAR v2.7.11a were the same as in earlier successful runs with a locally prepared reference (same genome, differently filtered annotations). The outside reference had a perfectly interpretable geneInfo.tab file, but STAR did not recognize it. With this one-line change everything proceeded normally, and I have no clear reason to think the results are problematic. Still, I see that the change hasn't been adopted in the current ParametersGenome.cpp in the repository. Was it ultimately not recommended? Would it be better to rebuild the entire reference locally from the same annotation GTF and genome .fa, without the code change? I wanted to use the outside reference verbatim for reproducibility if possible, and the respective STAR versions seemed to allow this. Perhaps that is tricky if the reference was created in a different computing environment (Linux vs OS X).

Thanks in advance for any definitive clarification! Have a good day,

Best

Sara
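Before reusing an index built elsewhere, it can help to see how it was generated. A minimal sketch, assuming the usual `genomeParameters.txt` layout in the index directory: `###` header lines holding the generating command, followed by tab-separated `key value` lines such as `versionGenome` and, if present, `sjdbGTFfile`. The function name is hypothetical. Note that `versionGenome` appears to record the index format version, which may not match the exact STAR release that built the index, so this does not settle the Linux vs OS X question by itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how an existing STAR index was generated.
# Assumes genomeParameters.txt has "###" header lines and
# tab-separated "key<TAB>value" lines (e.g. versionGenome).
show_index_provenance() {
    local params="$1/genomeParameters.txt"
    if [ ! -r "$params" ]; then
        echo "cannot read $params" >&2
        return 1
    fi
    # Keep only the command header, the index format version and the GTF used.
    grep -E '^(###|versionGenome|sjdbGTFfile)' "$params"
}

# Usage (placeholder path):
#   show_index_provenance /path/to/outside_reference
```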

Hi @developer-ss28

thanks for investigating it. You may be right, I will look into it shortly.

Cheers Alex

Thank you for your kind reply, @alexdobin. I will add some additional information.

My environment:

OS: macOS Ventura 13.4
CPU: Apple M1
Memory: 16 GB
Rosetta 2: used
STAR version: 2.7.11a, downloaded from GitHub

STAR option

To use RSEM after STAR, I used the following options.

/Users/ss28/STAR/STAR-2.7.11a/bin/MacOSX_x86_64/STAR \
--runThreadN 4 \
--runMode alignReads \
--genomeDir /Users/ss28/Labo/practice/ngsbook2_RNAseq/ref_gencode/standard/star_dir/star_reference \
--readFilesIn SRR1550989_1.fastq.gz SRR1550989_2.fastq.gz \
--readFilesPrefix /Users/ss28/Labo/practice/ngsbook2_RNAseq/seq/ \
--readFilesCommand "gunzip -c " \
--outSAMtype BAM Unsorted \
--outFileNamePrefix /Users/ss28/Labo/practice/ngsbook2_RNAseq/ref_gencode/standard/star_dir/star_results/SRR1550989 \
--outFilterMismatchNoverReadLmax 0.04 \
--outFilterMultimapNmax 10 \
--quantMode TranscriptomeSAM

This produced the error. So I changed line 27 of ParametersGenome.cpp as follows and recompiled STAR according to the STAR manual.

// before
transform.outYes = transform.outSAM = transform.outSJ = false;

//after
transform.outYes = transform.outSAM = transform.outSJ = transform.outQuant = false;

With the recompiled binary, the error no longer occurred. The options I used were:

/Users/ss28/STAR/STAR-2.7.11a-neo/source/STAR \
--runThreadN 6 \
--runMode alignReads \
--genomeDir /Users/ss28/Labo/practice/ngsbook2_RNAseq/ref_gencode/standard2/star_dir/star_reference \
--readFilesIn SRR1550989_1.fastq.gz SRR1550989_2.fastq.gz \
--readFilesPrefix /Users/ss28/Labo/practice/ngsbook2_RNAseq/seq/ \
--readFilesCommand "gunzip -c " \
--outSAMtype BAM Unsorted \
--outFileNamePrefix /Users/ss28/Labo/practice/ngsbook2_RNAseq/ref_gencode/standard2/star_dir/star_results/SRR1550989_ \
--outFilterMismatchNoverReadLmax 0.04 \
--quantMode TranscriptomeSAM

I hope this helps.
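The one-line edit to ParametersGenome.cpp can also be applied with a script before rebuilding. This is a minimal sketch, assuming the file still contains the original line verbatim. The function name `patch_outquant` is hypothetical. The script writes through a temporary file instead of using `sed -i`, because GNU sed and the BSD sed shipped with macOS disagree on the `-i` syntax. After patching, rebuild as described in the STAR manual (for example `make STAR` in `source/` on Linux; the manual gives separate instructions for macOS).

```shell
#!/usr/bin/env bash
# Hypothetical helper: add transform.outQuant to the chained reset
# transform.outYes = transform.outSAM = transform.outSJ = false;
# Assumes that original line is present verbatim in the file.
patch_outquant() {
    local src="$1"
    if grep -q 'transform\.outQuant = false;' "$src"; then
        echo "already patched: $src"
        return 0
    fi
    # Write via a temporary file: portable across GNU sed and macOS (BSD) sed,
    # whose in-place (-i) options are not compatible.
    sed 's/transform\.outSJ = false;/transform.outSJ = transform.outQuant = false;/' \
        "$src" > "$src.tmp" && mv "$src.tmp" "$src"
    # Non-zero exit if the pattern was not found (e.g. the line has changed).
    grep -q 'transform\.outQuant = false;' "$src"
}

# Usage (placeholder path):
#   patch_outquant STAR-2.7.11a/source/ParametersGenome.cpp
```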

alexdobin commented 7 months ago

Hi @sknaack

This issue has slipped my mind. The suggestion by @developer-ss28 is correct and may fix the issue in some cases. I made a pre-release with this fix, please try it out: https://github.com/dobinlab/STAR_pre_releases/releases/tag/2.7.11b_alpha_2024-02-09

sknaack commented 7 months ago

Thank you kindly for the confirmation @alexdobin. Good to know! Primarily I just wanted to make sure this was appropriate in your view. It indeed corrected an error case in my own experience as well. ~S

empotts commented 6 months ago

My ParametersGenome.cpp file has the above changes, but I am getting the same error as described above. Has there been any update on this?

LeonidBystrykh commented 3 months ago

Hi, I have freshly installed STAR 2.7.11b on an iMac and the problem is still the same. Fixing line 27 and running make again did not help. I installed the macOS version of STAR as recommended.
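When a patched build "does not help", it is worth ruling out that the run actually used an older binary: in the commands earlier in this thread, the failing run used the prebuilt `bin/MacOSX_x86_64/STAR`, while the working run used the recompiled `source/STAR`. Log.out records the STAR version and compilation directory near the top (as in the Log.out excerpts in this thread), so a sketch like the following shows which build produced a given run. The function name is hypothetical, and it assumes those two entries start their own lines in Log.out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the STAR version and compilation directory
# recorded in a Log.out file, to confirm which build actually ran.
star_build_info() {
    grep -E '^STAR (version|compilation time,server,dir)=' "$1"
}

# Usage (placeholder file name):
#   command -v STAR                  # which STAR is first on PATH
#   star_build_info sample_Log.out   # which build wrote this log
```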

btownshend commented 3 months ago

I have the same error/problem -- would be good to have a fix for this!

benostendorf commented 3 months ago

Would it be possible to have this fix released and submitted to Bioconda? Thank you!

dlaehnemann commented 3 months ago

@alexdobin I'd gladly help with ushering this through the bioconda release process, if you could create a new release from this over here. I fear that both alternative options of getting this switch onto bioconda (either switching over to a pre-release or including a patch in the bioconda recipe) would rather end up causing more potential for confusion. Or is anything holding back a new release, so that we should instead go for a patch?

Sogand65 commented 1 month ago

Hi,

I have a Mac M1 and tried different versions, including the pre-release version you mentioned, 2.7.11b_pre..., but I got a new error: BAMoutput.cpp:27:BAMoutput: exiting because of OUTPUT FILE error : could not create output file Testdata_out/STARMapping//5/human_STARtmp//BAMsort/4/43 SOLUTION: check that the path exists and you have write permission for this file. Also check ulimit -n and increase it to allow more open files. I checked the permissions and they are rw. Previous versions could not open geneInfo.tab, and the new one cannot create the BAMsort file!

I appreciate any help or suggestion.

Best, Sogi
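The error text itself points at `ulimit -n`. Judging from the path `BAMsort/4/43` (an assumption read off the path, not confirmed in this thread), coordinate sorting appears to write one temporary file per mapping thread per genome bin, so the number of simultaneously open files grows roughly with `--runThreadN` times `--outBAMsortingBinsN` (default 50). With 20 threads that estimate is 1000 files, while macOS shells often default to a much lower soft limit. The sketch below, with hypothetical function names, compares that estimate with the current shell's limit; the extra 256 in the suggested value is an arbitrary margin for other open files. Lowering `--runThreadN` or `--outBAMsortingBinsN` would reduce the estimate instead.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. Assumption: coordinate sorting keeps roughly one
# temporary file per mapping thread per genome bin open at the same time.
bamsort_files_needed() {
    local threads="${1:?usage: bamsort_files_needed THREADS [BINS]}"
    local bins="${2:-50}"   # 50 = default --outBAMsortingBinsN
    echo $(( threads * bins ))
}

check_open_file_limit() {
    local need limit
    need=$(bamsort_files_needed "$@") || return 1
    limit=$(ulimit -n)       # soft limit of the current shell
    if [ "$limit" = "unlimited" ] || [ "$limit" -gt "$need" ]; then
        echo "ok: limit $limit, estimated need $need"
    else
        echo "too low: limit $limit, estimated need $need" >&2
        # 256 is an arbitrary margin; the soft limit cannot exceed the hard limit.
        echo "try: ulimit -n $(( need + 256 )) (hard limit: $(ulimit -Hn))" >&2
        return 1
    fi
}

# Usage:
#   check_open_file_limit 20        # 20 threads, default 50 bins
```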

This is my Log.out content:

STAR version=2.7.11b_alpha_2024-02-09 STAR compilation time,server,dir= :/Users/sssajedi/Downloads/STAR_pre_releases-2.7.11b_alpha_2024-02-09/source STAR git:

Command Line:

STAR --runMode genomeGenerate --genomeDir hg19STARIndex/ --runThreadN 20 --genomeFastaFiles data/hg19.fa --sjdbGTFfile data/gencode.v46lift37.annotation.gtf

Initial USER parameters from Command Line:
All USER parameters from Command Line:

runMode genomeGenerate ~RE-DEFINED genomeDir hg19STARIndex/ ~RE-DEFINED runThreadN 20 ~RE-DEFINED genomeFastaFiles data/hg19.fa ~RE-DEFINED sjdbGTFfile data/gencode.v46lift37.annotation.gtf ~RE-DEFINED

Finished reading parameters from all sources
Final user re-defined parameters-----------------:

runMode genomeGenerate
runThreadN 20 genomeDir hg19STARIndex/ genomeFastaFiles data/hg19.fa
sjdbGTFfile data/gencode.v46lift37.annotation.gtf


Final effective command line:

STAR --runMode genomeGenerate --runThreadN 20 --genomeDir hg19STARIndex/ --genomeFastaFiles data/hg19.fa --sjdbGTFfile data/gencode.v46lift37.annotation.gtf

Number of fastq files for each mate = 1 ParametersSolo: --soloCellFilterType CellRanger2.2 filtering parameters: 3000 0.99 10 Finished loading and checking parameters --genomeDir directory created: hg19STARIndex/ Aug 13 20:42:43 ... starting to generate Genome files data/hg19.fa : chr # 0 "chr1" chrStart: 0 data/hg19.fa : chr # 1 "chr2" chrStart: 249298944 data/hg19.fa : chr # 2 "chr3" chrStart: 492568576 data/hg19.fa : chr # 3 "chr4" chrStart: 690749440 data/hg19.fa : chr # 4 "chr5" chrStart: 882114560 data/hg19.fa : chr # 5 "chr6" chrStart: 1063256064 data/hg19.fa : chr # 6 "chr7" chrStart: 1234436096 data/hg19.fa : chr # 7 "chrX" chrStart: 1393819648 data/hg19.fa : chr # 8 "chr8" chrStart: 1549271040 data/hg19.fa : chr # 9 "chr9" chrStart: 1695809536 data/hg19.fa : chr # 10 "chr10" chrStart: 1837105152 data/hg19.fa : chr # 11 "chr11" chrStart: 1972895744 data/hg19.fa : chr # 12 "chr12" chrStart: 2108162048 data/hg19.fa : chr # 13 "chr13" chrStart: 2242117632 data/hg19.fa : chr # 14 "chr14" chrStart: 2357460992 data/hg19.fa : chr # 15 "chr15" chrStart: 2464940032 data/hg19.fa : chr # 16 "chr16" chrStart: 2567700480 data/hg19.fa : chr # 17 "chr17" chrStart: 2658140160 data/hg19.fa : chr # 18 "chr18" chrStart: 2739404800 data/hg19.fa : chr # 19 "chr20" chrStart: 2817523712 data/hg19.fa : chr # 20 "chrY" chrStart: 2880700416 data/hg19.fa : chr # 21 "chr19" chrStart: 2940207104 data/hg19.fa : chr # 22 "chr22" chrStart: 2999451648 data/hg19.fa : chr # 23 "chr21" chrStart: 3050831872 data/hg19.fa : chr # 24 "chr6_ssto_hap7" chrStart: 3099066368 data/hg19.fa : chr # 25 "chr6_mcf_hap5" chrStart: 3104047104 data/hg19.fa : chr # 26 "chr6_cox_hap2" chrStart: 3109027840 data/hg19.fa : chr # 27 "chr6_mann_hap4" chrStart: 3114008576 data/hg19.fa : chr # 28 "chr6_apd_hap1" chrStart: 3118727168 data/hg19.fa : chr # 29 "chr6_qbl_hap6" chrStart: 3123445760 data/hg19.fa : chr # 30 "chr6_dbb_hap3" chrStart: 3128164352 data/hg19.fa : chr # 31 "chr17_ctg5_hap1" chrStart: 3132882944 
data/hg19.fa : chr # 32 "chr4_ctg9_hap1" chrStart: 3134717952 data/hg19.fa : chr # 33 "chr1_gl000192_random" chrStart: 3135504384 data/hg19.fa : chr # 34 "chrUn_gl000225" chrStart: 3136290816 data/hg19.fa : chr # 35 "chr4_gl000194_random" chrStart: 3136552960 data/hg19.fa : chr # 36 "chr4_gl000193_random" chrStart: 3136815104 data/hg19.fa : chr # 37 "chr9_gl000200_random" chrStart: 3137077248 data/hg19.fa : chr # 38 "chrUn_gl000222" chrStart: 3137339392 data/hg19.fa : chr # 39 "chrUn_gl000212" chrStart: 3137601536 data/hg19.fa : chr # 40 "chr7_gl000195_random" chrStart: 3137863680 data/hg19.fa : chr # 41 "chrUn_gl000223" chrStart: 3138125824 data/hg19.fa : chr # 42 "chrUn_gl000224" chrStart: 3138387968 data/hg19.fa : chr # 43 "chrUn_gl000219" chrStart: 3138650112 data/hg19.fa : chr # 44 "chr17_gl000205_random" chrStart: 3138912256 data/hg19.fa : chr # 45 "chrUn_gl000215" chrStart: 3139174400 data/hg19.fa : chr # 46 "chrUn_gl000216" chrStart: 3139436544 data/hg19.fa : chr # 47 "chrUn_gl000217" chrStart: 3139698688 data/hg19.fa : chr # 48 "chr9_gl000199_random" chrStart: 3139960832 data/hg19.fa : chr # 49 "chrUn_gl000211" chrStart: 3140222976 data/hg19.fa : chr # 50 "chrUn_gl000213" chrStart: 3140485120 data/hg19.fa : chr # 51 "chrUn_gl000220" chrStart: 3140747264 data/hg19.fa : chr # 52 "chrUn_gl000218" chrStart: 3141009408 data/hg19.fa : chr # 53 "chr19_gl000209_random" chrStart: 3141271552 data/hg19.fa : chr # 54 "chrUn_gl000221" chrStart: 3141533696 data/hg19.fa : chr # 55 "chrUn_gl000214" chrStart: 3141795840 data/hg19.fa : chr # 56 "chrUn_gl000228" chrStart: 3142057984 data/hg19.fa : chr # 57 "chrUn_gl000227" chrStart: 3142320128 data/hg19.fa : chr # 58 "chr1_gl000191_random" chrStart: 3142582272 data/hg19.fa : chr # 59 "chr19_gl000208_random" chrStart: 3142844416 data/hg19.fa : chr # 60 "chr9_gl000198_random" chrStart: 3143106560 data/hg19.fa : chr # 61 "chr17_gl000204_random" chrStart: 3143368704 data/hg19.fa : chr # 62 "chrUn_gl000233" chrStart: 3143630848 
data/hg19.fa : chr # 63 "chrUn_gl000237" chrStart: 3143892992 data/hg19.fa : chr # 64 "chrUn_gl000230" chrStart: 3144155136 data/hg19.fa : chr # 65 "chrUn_gl000242" chrStart: 3144417280 data/hg19.fa : chr # 66 "chrUn_gl000243" chrStart: 3144679424 data/hg19.fa : chr # 67 "chrUn_gl000241" chrStart: 3144941568 data/hg19.fa : chr # 68 "chrUn_gl000236" chrStart: 3145203712 data/hg19.fa : chr # 69 "chrUn_gl000240" chrStart: 3145465856 data/hg19.fa : chr # 70 "chr17_gl000206_random" chrStart: 3145728000 data/hg19.fa : chr # 71 "chrUn_gl000232" chrStart: 3145990144 data/hg19.fa : chr # 72 "chrUn_gl000234" chrStart: 3146252288 data/hg19.fa : chr # 73 "chr11_gl000202_random" chrStart: 3146514432 data/hg19.fa : chr # 74 "chrUn_gl000238" chrStart: 3146776576 data/hg19.fa : chr # 75 "chrUn_gl000244" chrStart: 3147038720 data/hg19.fa : chr # 76 "chrUn_gl000248" chrStart: 3147300864 data/hg19.fa : chr # 77 "chr8_gl000196_random" chrStart: 3147563008 data/hg19.fa : chr # 78 "chrUn_gl000249" chrStart: 3147825152 data/hg19.fa : chr # 79 "chrUn_gl000246" chrStart: 3148087296 data/hg19.fa : chr # 80 "chr17_gl000203_random" chrStart: 3148349440 data/hg19.fa : chr # 81 "chr8_gl000197_random" chrStart: 3148611584 data/hg19.fa : chr # 82 "chrUn_gl000245" chrStart: 3148873728 data/hg19.fa : chr # 83 "chrUn_gl000247" chrStart: 3149135872 data/hg19.fa : chr # 84 "chr9_gl000201_random" chrStart: 3149398016 data/hg19.fa : chr # 85 "chrUn_gl000235" chrStart: 3149660160 data/hg19.fa : chr # 86 "chrUn_gl000239" chrStart: 3149922304 data/hg19.fa : chr # 87 "chr21_gl000210_random" chrStart: 3150184448 data/hg19.fa : chr # 88 "chrUn_gl000231" chrStart: 3150446592 data/hg19.fa : chr # 89 "chrUn_gl000229" chrStart: 3150708736 data/hg19.fa : chr # 90 "chrM" chrStart: 3150970880 data/hg19.fa : chr # 91 "chrUn_gl000226" chrStart: 3151233024 data/hg19.fa : chr # 92 "chr18_gl000207_random" chrStart: 3151495168 Chromosome sequence lengths: chr1 249250621 chr2 243199373 chr3 198022430 chr4 191154276 chr5 
180915260 chr6 171115067 chr7 159138663 chrX 155270560 chr8 146364022 chr9 141213431 chr10 135534747 chr11 135006516 chr12 133851895 chr13 115169878 chr14 107349540 chr15 102531392 chr16 90354753 chr17 81195210 chr18 78077248 chr20 63025520 chrY 59373566 chr19 59128983 chr22 51304566 chr21 48129895 chr6_ssto_hap7 4928567 chr6_mcf_hap5 4833398 chr6_cox_hap2 4795371 chr6_mann_hap4 4683263 chr6_apd_hap1 4622290 chr6_qbl_hap6 4611984 chr6_dbb_hap3 4610396 chr17_ctg5_hap1 1680828 chr4_ctg9_hap1 590426 chr1_gl000192_random 547496 chrUn_gl000225 211173 chr4_gl000194_random 191469 chr4_gl000193_random 189789 chr9_gl000200_random 187035 chrUn_gl000222 186861 chrUn_gl000212 186858 chr7_gl000195_random 182896 chrUn_gl000223 180455 chrUn_gl000224 179693 chrUn_gl000219 179198 chr17_gl000205_random 174588 chrUn_gl000215 172545 chrUn_gl000216 172294 chrUn_gl000217 172149 chr9_gl000199_random 169874 chrUn_gl000211 166566 chrUn_gl000213 164239 chrUn_gl000220 161802 chrUn_gl000218 161147 chr19_gl000209_random 159169 chrUn_gl000221 155397 chrUn_gl000214 137718 chrUn_gl000228 129120 chrUn_gl000227 128374 chr1_gl000191_random 106433 chr19_gl000208_random 92689 chr9_gl000198_random 90085 chr17_gl000204_random 81310 chrUn_gl000233 45941 chrUn_gl000237 45867 chrUn_gl000230 43691 chrUn_gl000242 43523 chrUn_gl000243 43341 chrUn_gl000241 42152 chrUn_gl000236 41934 chrUn_gl000240 41933 chr17_gl000206_random 41001 chrUn_gl000232 40652 chrUn_gl000234 40531 chr11_gl000202_random 40103 chrUn_gl000238 39939 chrUn_gl000244 39929 chrUn_gl000248 39786 chr8_gl000196_random 38914 chrUn_gl000249 38502 chrUn_gl000246 38154 chr17_gl000203_random 37498 chr8_gl000197_random 37175 chrUn_gl000245 36651 chrUn_gl000247 36422 chr9_gl000201_random 36148 chrUn_gl000235 34474 chrUn_gl000239 33824 chr21_gl000210_random 27682 chrUn_gl000231 27386 chrUn_gl000229 19913 chrM 16571 chrUn_gl000226 15008 chr18_gl000207_random 4262 Genome sequence total length = 3137161264 Genome size with padding = 3151757312 Aug 13 
20:43:15 ..... processing annotations GTF Processing pGe.sjdbGTFfile=data/gencode.v46lift37.annotation.gtf, found: 255637 transcripts 1670004 exons (non-collapsed) 404138 collapsed junctions Total junctions: 404138 Aug 13 20:43:28 ..... finished GTF processing

Estimated genome size with padding and SJs: total=genome+SJ=3352757312 = 3151757312 + 201000000 GstrandBit=32 Number of SA indices: 5794620924 Aug 13 20:43:33 ... starting to sort Suffix Array. This may take a long time... Number of chunks: 73; chunks size limit: 740894552 bytes Aug 13 20:43:40 ... sorting Suffix Array chunks and saving them to disk... Writing 466746232 bytes into hg19STARIndex//SA_2 ; empty space on disk = 50858079289344 bytes ... done Writing 461648432 bytes into hg19STARIndex//SA_6 ; empty space on disk = 50738639142912 bytes ... done Writing 595183200 bytes into hg19STARIndex//SA_1 ; empty space on disk = 50620193046528 bytes ... done Writing 572756328 bytes into hg19STARIndex//SA_5 ; empty space on disk = 50467790913536 bytes ... done Writing 517496680 bytes into hg19STARIndex//SA_10 ; empty space on disk = 50321415995392 bytes ... done Writing 605076888 bytes into hg19STARIndex//SA_7 ; empty space on disk = 50188861308928 bytes ... done Writing 675278680 bytes into hg19STARIndex//SA_3 ; empty space on disk = 50034052694016 bytes ... done Writing 590904168 bytes into hg19STARIndex//SA_11 ; empty space on disk = 49861162434560 bytes ... done Writing 716911568 bytes into hg19STARIndex//SA_0 ; empty space on disk = 49709931560960 bytes ... done Writing 696470384 bytes into hg19STARIndex//SA_4 ; empty space on disk = 49526347923456 bytes ... done Writing 676186056 bytes into hg19STARIndex//SA_8 ; empty space on disk = 49348032331776 bytes ... done Writing 648482256 bytes into hg19STARIndex//SA_13 ; empty space on disk = 49174943891456 bytes ... done Writing 609658496 bytes into hg19STARIndex//SA_19 ; empty space on disk = 49008931241984 bytes ...Writing 656586688 bytes into hg19STARIndex//SA_9 ; empty space on disk = 48864835928064 bytes ... done done Writing 681549632 bytes into hg19STARIndex//SA_16 ; empty space on disk = 48684828983296 bytes ... 
done Writing 662260176 bytes into hg19STARIndex//SA_12 ; empty space on disk = 48510341742592 bytes ...Writing 641446856 bytes into hg19STARIndex//SA_14 ; empty space on disk = 48449895530496 bytes ... done done Writing 644733904 bytes into hg19STARIndex//SA_18 ; empty space on disk = 48177583489024 bytes ... done Writing 702842480 bytes into hg19STARIndex//SA_17 ; empty space on disk = 48012485197824 bytes ... done Writing 712560320 bytes into hg19STARIndex//SA_15 ; empty space on disk = 47832277975040 bytes ... done Writing 466747104 bytes into hg19STARIndex//SA_21 ; empty space on disk = 47651311583232 bytes ... done Writing 686621552 bytes into hg19STARIndex//SA_20 ; empty space on disk = 47531434180608 bytes ... done Writing 681472552 bytes into hg19STARIndex//SA_22 ; empty space on disk = 47355389804544 bytes ... done Writing 416707024 bytes into hg19STARIndex//SA_35 ; empty space on disk = 47180829163520 bytes ... done Writing 650726896 bytes into hg19STARIndex//SA_25 ; empty space on disk = 47073788428288 bytes ... done Writing 678142360 bytes into hg19STARIndex//SA_23 ; empty space on disk = 46908306358272 bytes ... done Writing 662138904 bytes into hg19STARIndex//SA_26 ; empty space on disk = 46734697824256 bytes ... done Writing 651339672 bytes into hg19STARIndex//SA_27 ; empty space on disk = 46564776083456 bytes ... done Writing 693551800 bytes into hg19STARIndex//SA_24 ; empty space on disk = 46398051450880 bytes ... done Writing 636081592 bytes into hg19STARIndex//SA_28 ; empty space on disk = 46219865882624 bytes ... done Writing 568527232 bytes into hg19STARIndex//SA_37 ; empty space on disk = 46057391128576 bytes ... done Writing 647954456 bytes into hg19STARIndex//SA_31 ; empty space on disk = 45911859265536 bytes ... done Writing 662393432 bytes into hg19STARIndex//SA_29 ; empty space on disk = 45745579229184 bytes ... done Writing 673458264 bytes into hg19STARIndex//SA_34 ; empty space on disk = 45576028684288 bytes ... 
done Writing 699635512 bytes into hg19STARIndex//SA_33 ; empty space on disk = 45403591409664 bytes ...Writing 680932232 bytes into hg19STARIndex//SA_36 ; empty space on disk = 45375816728576 bytes ... done done Writing 712539264 bytes into hg19STARIndex//SA_30 ; empty space on disk = 45050523287552 bytes ...Writing 688374008 bytes into hg19STARIndex//SA_32 ; empty space on disk = 44968669347840 bytes ... done done Writing 668635816 bytes into hg19STARIndex//SA_39 ; empty space on disk = 44691459407872 bytes ... done Writing 698202016 bytes into hg19STARIndex//SA_38 ; empty space on disk = 44520708243456 bytes ... done Writing 648239160 bytes into hg19STARIndex//SA_40 ; empty space on disk = 44340969734144 bytes ... done Writing 404365504 bytes into hg19STARIndex//SA_50 ; empty space on disk = 44174953938944 bytes ... done Writing 678843832 bytes into hg19STARIndex//SA_41 ; empty space on disk = 44071370358784 bytes ... done Writing 533296288 bytes into hg19STARIndex//SA_55 ; empty space on disk = 43897660112896 bytes ... done Writing 671001904 bytes into hg19STARIndex//SA_47 ; empty space on disk = 43761090428928 bytes ... done Writing 521098088 bytes into hg19STARIndex//SA_59 ; empty space on disk = 43589293834240 bytes ... done Writing 572293192 bytes into hg19STARIndex//SA_54 ; empty space on disk = 43455896092672 bytes ... done Writing 551170504 bytes into hg19STARIndex//SA_58 ; empty space on disk = 43309428899840 bytes ... done Writing 675171776 bytes into hg19STARIndex//SA_46 ; empty space on disk = 43168319930368 bytes ... done Writing 722177432 bytes into hg19STARIndex//SA_43 ; empty space on disk = 42995191644160 bytes ...Writing 617472896 bytes into hg19STARIndex//SA_48 ; empty space on disk = 42903394058240 bytes ... done done Writing 738777072 bytes into hg19STARIndex//SA_42 ; empty space on disk = 42652135325696 bytes ...Writing 619738224 bytes into hg19STARIndex//SA_53 ; empty space on disk = 42597628248064 bytes ... 
done done Writing 625234120 bytes into hg19STARIndex//SA_52 ; empty space on disk = 42304413892608 bytes ... done Writing 643543720 bytes into hg19STARIndex//SA_49 ; empty space on disk = 42144359251968 bytes ... done Writing 666755184 bytes into hg19STARIndex//SA_56 ; empty space on disk = 41979611185152 bytes ... done Writing 702227888 bytes into hg19STARIndex//SA_44 ; empty space on disk = 41808840097792 bytes ...Writing 687110576 bytes into hg19STARIndex//SA_51 ; empty space on disk = 41720306728960 bytes ... done done Writing 708037016 bytes into hg19STARIndex//SA_45 ; empty space on disk = 41453467205632 bytes ... done Writing 700123352 bytes into hg19STARIndex//SA_57 ; empty space on disk = 41273256837120 bytes ... done Writing 605645216 bytes into hg19STARIndex//SA_60 ; empty space on disk = 41094243942400 bytes ... done Writing 615466944 bytes into hg19STARIndex//SA_61 ; empty space on disk = 40939167940608 bytes ... done Writing 542314312 bytes into hg19STARIndex//SA_65 ; empty space on disk = 40781622542336 bytes ... done Writing 602064768 bytes into hg19STARIndex//SA_67 ; empty space on disk = 40642792128512 bytes ...Writing 638512816 bytes into hg19STARIndex//SA_70 ; empty space on disk = 40590887616512 bytes ... done done Writing 619273176 bytes into hg19STARIndex//SA_69 ; empty space on disk = 40325198381056 bytes ... done Writing 621442984 bytes into hg19STARIndex//SA_68 ; empty space on disk = 40165247549440 bytes ... done Writing 681641888 bytes into hg19STARIndex//SA_71 ; empty space on disk = 40005356486656 bytes ...Writing 651998568 bytes into hg19STARIndex//SA_66 ; empty space on disk = 39870698356736 bytes ... done done Writing 589880376 bytes into hg19STARIndex//SA_64 ; empty space on disk = 39663924412416 bytes ... done Writing 716913224 bytes into hg19STARIndex//SA_72 ; empty space on disk = 39512923176960 bytes ... done Writing 667827104 bytes into hg19STARIndex//SA_62 ; empty space on disk = 39328415744000 bytes ... 
done Writing 658347176 bytes into hg19STARIndex//SA_63 ; empty space on disk = 39157404532736 bytes ... done Aug 13 20:54:51 ... loading chunks from disk, packing SA... Aug 13 20:55:17 ... finished generating suffix array Aug 13 20:55:17 ... generating Suffix Array index Aug 13 20:57:57 ... completed Suffix Array index Aug 13 20:57:57 Finished preparing junctions Aug 13 20:57:57 ..... inserting junctions into the genome indices Aug 13 20:58:59 Finished SA search: number of new junctions=404095, old junctions=0 Aug 13 20:59:29 Finished sorting SA indicesL nInd=161637984 Genome size with junctions=3232980407 3151757312 81223095 GstrandBit1=32 GstrandBit=32 Aug 13 21:00:15 Finished inserting junction indices Aug 13 21:00:29 Finished SAi Aug 13 21:00:29 ..... finished inserting junctions into genome Aug 13 21:00:29 ... writing Genome to disk ... Writing 3232980407 bytes into hg19STARIndex//Genome ; empty space on disk = 50843591114752 bytes ... done SA size in bytes: 24569567999 Aug 13 21:00:30 ... writing Suffix Array to disk ... Writing 24569567999 bytes into hg19STARIndex//SA ; empty space on disk = 50015873531904 bytes ... done Aug 13 21:00:34 ... writing SAindex to disk Writing 8 bytes into hg19STARIndex//SAindex ; empty space on disk = 43721541287936 bytes ... done Writing 120 bytes into hg19STARIndex//SAindex ; empty space on disk = 43721541287936 bytes ... done Writing 1565873491 bytes into hg19STARIndex//SAindex ; empty space on disk = 43721541287936 bytes ... done Aug 13 21:00:35 ..... finished successfully DONE: Genome generation, EXITING