alexdobin / STAR

RNA-seq aligner
MIT License

EXITING because of FATAL GENOME INDEX FILE error: transcriptInfo.tab is corrupt, or is incompatible with the current STAR version SOLUTION: re-generate genome index #978

Open Kusimeena opened 4 years ago

Kusimeena commented 4 years ago

Hi, I am trying to align RNA-seq data with STAR version 2.7.5a using the following command:

    STAR --genomeDir /Users/Home/Desktop/STAR_RNAseq/NCBI_GRCh39_index \
         --readFilesIn 01.fastq.gz \
         --runThreadN 2 \
         --readFilesCommand gunzip -c \
         --outFileNamePrefix 01A \
         --quantMode TranscriptomeSAM GeneCounts

and it failed with:

    EXITING because of FATAL GENOME INDEX FILE error: transcriptInfo.tab is corrupt, or is incompatible with the current STAR version
    SOLUTION: re-generate genome index

I checked Log.out and it ends with "finished successfully DONE: Genome generation, EXITING", so I believe the index was generated without errors. Could you help me fix this issue? Thanks.

alexdobin commented 4 years ago

Hi @Kusimeena

please regenerate the genome and check that your drive did not run out of space during genome generation, which could corrupt the output files. If this does not help, please send me the Log.out files for both the genome generation and mapping runs.

Cheers Alex
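A minimal sketch of the checks suggested above, assuming a bash shell. The function name `check_star_index` is hypothetical, and the file list is an assumption based on what a GTF-annotated STAR index directory typically contains (`transcriptInfo.tab` is only expected when `--sjdbGTFfile` was used). It only catches missing or zero-length files; it cannot detect subtler corruption.

```shell
#!/usr/bin/env bash

# check_star_index GENOME_DIR
# Returns non-zero if any of the listed index files is missing or empty,
# and prints the free space on the filesystem that holds the index.
check_star_index() {
    local dir="$1"
    local status=0
    local f
    # Assumed subset of files written by `STAR --runMode genomeGenerate`
    for f in Genome SA SAindex chrName.txt transcriptInfo.tab; do
        # -s is true only if the file exists and is larger than zero bytes
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            status=1
        fi
    done
    # POSIX df output: column 4 of the second line is the available space
    df -Pk "$dir" | awk 'NR == 2 { print "free KiB:", $4 }'
    return "$status"
}
```

For example, `check_star_index tempstargenomedir` prints the free space and exits with status 1 if any listed file is absent or empty.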

mtekman commented 4 years ago

Hi, I'm seeing this same error message.

Steps to reproduce

Load STAR environment

     conda create -n starnew star==2.7.5b
     conda activate starnew

(Edit: I can confirm this also occurs with version 2.7.5a.)

Generate STAR index

    STAR --runMode genomeGenerate \
         --genomeDir 'tempstargenomedir' \
         --readFilesCommand zcat \
         --genomeFastaFiles Homo_sapiens.GRCh38.dna.chromosome.21.fa \
         --sjdbOverhang 100 \
         --sjdbGTFfile Homo_sapiens.GRCh38.100.gtf \
         --genomeSAindexNbases 4 \
         --runThreadN 5

where I retrieved the hg38 chromosome 21 FASTA and the hg38 gtf file from:

This ran successfully:

  Aug 03 14:55:17 ..... started STAR run
  Aug 03 14:55:17 ... starting to generate Genome files
  Aug 03 14:55:18 ..... processing annotations GTF
  Aug 03 14:55:25 ... starting to sort Suffix Array. This may take a long time...
  Aug 03 14:55:25 ... sorting Suffix Array chunks and saving them to disk...
  Aug 03 14:55:46 ... loading chunks from disk, packing SA...
  Aug 03 14:55:47 ... finished generating suffix array
  Aug 03 14:55:47 ... generating Suffix Array index
  Aug 03 14:55:47 ... completed Suffix Array index
  Aug 03 14:55:47 ..... inserting junctions into the genome indices
  Aug 03 14:55:49 ... writing Genome to disk ...
  Aug 03 14:55:49 ... writing Suffix Array to disk ...
  Aug 03 14:55:51 ... writing SAindex to disk
  Aug 03 14:55:51 ..... finished successfully

Run STARsolo on the 1K PBMC v2 test data

I retrieved the datasets from here and the barcodes file from here, and attempted to map only the first lane (L001):

    STAR  --runThreadN 4 \
          --genomeLoad NoSharedMemory \
          --genomeDir tempstargenomedir \
          --readFilesCommand zcat \
          --readFilesIn pbmc_1k_v2_fastqs/pbmc_1k_v2_S1_L001_R2_001.fastq.gz pbmc_1k_v2_fastqs/pbmc_1k_v2_S1_L001_R1_001.fastq.gz \
          --soloType Droplet \
          --soloCBwhitelist 737K-august-2016.txt \
          --soloBarcodeReadLength 1  \
          --soloCBstart 1 \
          --soloCBlen 16 \
          --soloUMIstart 17 \
          --soloUMIlen 10 \
          --soloStrand 'Forward' \
          --soloFeatures 'Gene' \
          --soloUMIdedup '1MM_All'

and the error message I receive is:

    Aug 03 14:56:52 ..... started STAR run
    Aug 03 14:56:52 ..... loading genome

    EXITING because of FATAL GENOME INDEX FILE error: transcriptInfo.tab is corrupt, or is incompatible with the current STAR version
    SOLUTION: re-generate genome index
    Aug 03 14:56:53 ...... FATAL ERROR, exiting

Attached are the transcriptInfo.tab file (renamed to .txt for uploading) in my tempstargenomedir and the Log.out file from the STARsolo run.

transcriptInfo.tab.txt Log.out.txt

mtekman commented 4 years ago

It looks like STAR fails when the GTF file contains chromosomes that are not present in the FASTA file.

After restricting the GTF file to chromosome 21 only, STAR seems to progress to the mapping stage:

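    # keep the Ensembl "#!" header lines
    grep '^#!' Homo_sapiens.GRCh38.100.gtf > Homo_sapiens.GRCh38.100.chr21.gtf
    # keep only records whose first (seqname) column is exactly "21"
    awk -F'\t' '$1 == "21"' Homo_sapiens.GRCh38.100.gtf >> Homo_sapiens.GRCh38.100.chr21.gtf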
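The same idea can be generalized so that the GTF is filtered against whatever sequences the FASTA actually contains, rather than a hard-coded chromosome. This is a sketch assuming bash with `grep` and a POSIX `awk`; `filter_gtf_to_fasta` is a hypothetical helper name. It keeps `#` header lines and every record whose first column exactly matches a FASTA sequence name (taken to be the first word after `>`).

```shell
#!/usr/bin/env bash

# filter_gtf_to_fasta FASTA GTF
# Writes to stdout the GTF comment lines plus every GTF record whose
# seqname (column 1) is a sequence name in FASTA.
filter_gtf_to_fasta() {
    local fasta="$1" gtf="$2"
    # Only the '>' header lines are needed, so extract them with grep
    # instead of letting awk scan the whole sequence.
    grep '^>' "$fasta" |
        awk -F'\t' '
            # First input (stdin, "-"): remember each sequence name,
            # i.e. the first word after ">".
            FNR == NR { sub(/^>/, ""); split($0, w, /[ \t]/); keep[w[1]] = 1; next }
            # Second input (the GTF): print comments and matching records.
            /^#/ || ($1 in keep)
        ' - "$gtf"
}
```

Usage, with the files from the reproduction above:

    filter_gtf_to_fasta Homo_sapiens.GRCh38.dna.chromosome.21.fa Homo_sapiens.GRCh38.100.gtf > Homo_sapiens.GRCh38.100.chr21.gtf

Comparing the whole first column, rather than a line prefix, means a seqname such as `210` is not mistaken for `21`.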
Kusimeena commented 4 years ago

Hi Alex,

I am sending you the Log.out files for both genome generation and mapping. I believe the index generation completed without any errors. For the mapping, if I use "--quantMode GeneCounts" instead of "--quantMode TranscriptomeSAM GeneCounts", it runs successfully.

Thank you for your kind support.

Regards

Meena Kusi

PhD candidate Integrated Biomedical Sciences (IBMS) Graduate Program

University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Drive, San Antonio, TX 78229-3900. Email: kusim@livemail.uthscsa.edu


From: Alexander Dobin notifications@github.com Sent: Monday, July 27, 2020 9:48 AM To: alexdobin/STAR STAR@noreply.github.com Cc: Kusimeena kusim@uthscsa.edu; Mention mention@noreply.github.com Subject: Re: [alexdobin/STAR] EXITING because of FATAL GENOME INDEX FILE error: transcriptInfo.tab is corrupt, or is incompatible with the current STAR version SOLUTION: re-generate genome index (#978)


alexdobin commented 4 years ago

Hi Meena,

the files did not get attached: attachments cannot be added through an email reply, so you would need to upload them via the GitHub website.

Cheers Alex

Kusimeena commented 4 years ago

Hi Alex,

Here are the files: 01_TestMappingLog.out.zip

IndexLog.out.zip

alexdobin commented 4 years ago

Hi Meena, Mehmet,

Mehmet is right: this issue occurs when the GTF file contains extra chromosomes not present in the FASTA file. I will fix the issue shortly and release 2.7.5c. The bug was introduced in 2.7.5a, so for now you can fall back to 2.7.4a for genome generation. However, it is always better to keep your FASTA and GTF files in sync.

Cheers Alex
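Before generating an index, one quick way to check whether the FASTA and GTF are in sync is to list the annotated chromosomes that have no FASTA sequence. This sketch assumes bash (for process substitution) and the standard `grep`, `sed`, `cut`, `sort` and `comm` tools; `gtf_chroms_missing_from_fasta` is a hypothetical name.

```shell
#!/usr/bin/env bash

# gtf_chroms_missing_from_fasta FASTA GTF
# Prints GTF seqnames (column 1) that do not appear as FASTA sequence names.
# Empty output means every annotated chromosome has a sequence.
gtf_chroms_missing_from_fasta() {
    local fasta="$1" gtf="$2"
    # comm needs both inputs sorted the same way, so force the C locale.
    # -13 suppresses FASTA-only and shared names, leaving GTF-only ones.
    LC_ALL=C comm -13 \
        <(grep '^>' "$fasta" | sed 's/^>//; s/[[:space:]].*//' | LC_ALL=C sort -u) \
        <(grep -v '^#' "$gtf" | cut -f1 | LC_ALL=C sort -u)
}
```

A non-empty result describes the situation that triggered this bug in 2.7.5a and 2.7.5b; the output can also be used to decide which records to drop from the GTF.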

alexdobin commented 4 years ago

Hi Meena, Mehmet,

This bug is fixed in 2.7.5c, please try it out. Thanks for reporting it!

Cheers Alex

davidrequena commented 3 years ago

Hello Alex, I am seeing this error again in version 2.7.7a. Could you please take a look? Thanks, David

alexdobin commented 3 years ago

Hi David,

I do not see this problem in my tests in 2.7.7a. Have you regenerated the genome with 2.7.7a? Please send me the first 2 lines from the transcriptInfo.tab file in the genome directory to check for this issue.

Cheers Alex
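A small helper for collecting what is requested above, assuming bash; `show_transcript_info_head` is a hypothetical name. It prints the first two lines of `transcriptInfo.tab` from a genome directory, or reports that the file is missing, which would presumably be the case for an index built without a GTF.

```shell
#!/usr/bin/env bash

# show_transcript_info_head GENOME_DIR
# Prints the first two lines of GENOME_DIR/transcriptInfo.tab,
# or returns 1 with a message if the file does not exist.
show_transcript_info_head() {
    local f="$1/transcriptInfo.tab"
    if [ ! -f "$f" ]; then
        echo "no transcriptInfo.tab in $1" >&2
        return 1
    fi
    head -n 2 "$f"
}
```

Running `STAR --version` alongside it shows which STAR release is doing the mapping, which helps confirm that the index was regenerated with the same version.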

jcolinge commented 3 years ago

Dear Alex,

I am experiencing a similar problem. I was running a two-pass alignment, generating an intermediate genome for pass 2, with the mouse genome from Ensembl (Mus_musculus.GRCm39.104.gtf and Mus_musculus.GRCm39.dna.primary_assembly.fa). After the first pass with STAR 2.7.1a, I got this output while generating the intermediate reference genome:

    EXITING because of FATAL error, the sjdb chromosome 20 is not found among the genomic chromosomes
    SOLUTION: fix your file(s) --sjdbFileChrStartEnd or --sjdbGTFfile, offending junction:20 234377 235269

Based on the discussion above, I compiled and used STAR 2.7.9a, but I still get:

    EXITING because of FATAL error, the sjdb chromosome 20 is not found among the genomic chromosomes
    SOLUTION: fix your file(s) --sjdbFileChrStartEnd or --sjdbGTFfile, offending junction:20 234377 235269

I do not know what to do at this stage. Thanks for your help.

Best, Jacques

alexdobin commented 3 years ago

Hi Jacques,

it looks like the file with splice junctions contains chromosomes that are not present in the genome. What are the STAR commands that you are using?

Cheers Alex
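One way to confirm this diagnosis is to list the chromosome names used in the pass-1 `SJ.out.tab` files (column 1) that have no sequence in the FASTA given to `--genomeFastaFiles`. This is a sketch assuming bash and standard text tools; `sj_chroms_missing_from_fasta` is a hypothetical name.

```shell
#!/usr/bin/env bash

# sj_chroms_missing_from_fasta FASTA SJ.out.tab [SJ.out.tab ...]
# Prints chromosome names from the junction files that are absent from FASTA.
sj_chroms_missing_from_fasta() {
    local fasta="$1"
    shift
    # Column 1 of SJ.out.tab is the chromosome; compare sorted unique name sets
    # and keep only the names that occur in the junction files alone.
    LC_ALL=C comm -13 \
        <(grep '^>' "$fasta" | sed 's/^>//; s/[[:space:]].*//' | LC_ALL=C sort -u) \
        <(cut -f1 "$@" | LC_ALL=C sort -u)
}
```

In the case reported above, a name such as `20` in the output would suggest that the pass-1 index was built from a different assembly, since the mouse genome has no chromosome 20.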

jcolinge commented 3 years ago

Dear Alexander,

Thank you for replying. First, a set of pass-1 alignments is run, each with:

    2021-09-05 23:37:46 executing /share/apps/STAR/bin/Linux_x86_64/STAR --genomeDir /share/apps/STAR/indexes/Mus_musculus --readFilesIn /data/jcolinge/LUAD/ICM-03-2021/Fastq/341-1CAF_R1.fastq /data/jcolinge/LUAD/ICM-03-2021/Fastq/341-1CAF_R2.fastq --runThreadN 7 --outFileNamePrefix /data/jcolinge/LUAD/ICM-03-2021/alignment/STAR/341-1CAF_align/pass1/

Then a project-specific genome is generated with:

    2021-09-06 13:06:47 failed /share/apps/STAR/bin/Linux_x86_64/STAR --runThreadN 10 --runMode genomeGenerate --limitSjdbInsertNsj 2500000 --genomeDir /data/jcolinge/LUAD/ICM-03-2021/alignment/STAR/project-genome/ --genomeFastaFiles /share/apps/STAR/indexes/Mus_musculus/Mus_musculus.GRCm39.dna.primary_assembly.fa --sjdbFileChrStartEnd /data/jcolinge/LUAD/ICM-03-2021/alignment/STAR/341-1CAF_align/pass1/SJ.out.tab /data/jcolinge/LUAD/ICM-03-2021/alignment/STAR/341-1CD31_align/pass1/SJ.out.tab /data/jcolinge/LUAD/ICM-03-2021/alignment/STAR/341-1CD45_align/pass1/SJ.out.tab --sjdbGTFfile /share/apps/STAR/indexes/Mus_musculus/Mus_musculus.GRCm39.104.gtf

Then the pass-2 alignment starts (although I am stuck at the genome generation step for this project). I have used this pipeline on hundreds of files with different versions of STAR and different genomes, with no problems so far (and thank you, we like STAR a lot!).

Best regards,

Jacques

From: Alexander Dobin @. Sent: Thursday, 7 October 2021 18:34 To: alexdobin/STAR @.> Cc: jcolinge @.>; Comment @.> Subject: Re: [alexdobin/STAR] EXITING because of FATAL GENOME INDEX FILE error: transcriptInfo.tab is corrupt, or is incompatible with the current STAR version SOLUTION: re-generate genome index (#978)


alexdobin commented 3 years ago

Hi Jacques,

was the /share/apps/STAR/indexes/Mus_musculus genome index generated with the same --genomeFastaFiles /share/apps/STAR/indexes/Mus_musculus/Mus_musculus.GRCm39.dna.primary_assembly.fa and --sjdbGTFfile /share/apps/STAR/indexes/Mus_musculus/Mus_musculus.GRCm39.104.gtf files?

Please send me the output of the failed run.

Thanks! Alex
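The question above can be partly answered from the index itself: a STAR genome directory stores its sequence names in `chrName.txt`, one per line, which can be compared with the FASTA headers. This sketch assumes bash; `compare_index_to_fasta` is a hypothetical name, and it checks only sequence names, not the sequences themselves or the GTF used.

```shell
#!/usr/bin/env bash

# compare_index_to_fasta INDEX_DIR FASTA
# Exit status 0 if the index and FASTA list the same sequence names;
# otherwise diff prints '<' lines (index only) and '>' lines (FASTA only).
compare_index_to_fasta() {
    local index_dir="$1" fasta="$2"
    # Sort both name lists so that ordering differences are ignored.
    diff \
        <(LC_ALL=C sort "$index_dir/chrName.txt") \
        <(grep '^>' "$fasta" | sed 's/^>//; s/[[:space:]].*//' | LC_ALL=C sort)
}
```

For example:

    compare_index_to_fasta /share/apps/STAR/indexes/Mus_musculus /share/apps/STAR/indexes/Mus_musculus/Mus_musculus.GRCm39.dna.primary_assembly.fa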