Open Kusimeena opened 4 years ago
Hi @Kusimeena
please regenerate the genome and check that your drive has not run out of space after genome generation, which could lead to the corruption of the output files. If this does not help, please send me the Log.out files for both the genome generation and mapping.
Cheers Alex
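The two checks suggested above (free disk space and a clean genome-generation log) can be sketched as a small shell function. This is only an illustration: the name `check_genome_generation` is made up here and is not part of STAR, and the path to Log.out is passed explicitly because its location depends on how STAR was run.

```shell
# Hypothetical helper (not part of STAR): show free space on the filesystem
# holding a genome directory and report whether the given genome-generation
# log contains the "finished successfully" message.
# Usage: check_genome_generation genomeDir path/to/Log.out
check_genome_generation() {
    # -P gives the portable one-line-per-filesystem format, -k sizes in KiB.
    df -P -k "$1" || return 1
    if grep -q 'finished successfully' "$2"; then
        echo "log: finished successfully"
    else
        echo "log: no 'finished successfully' message found"
        return 1
    fi
}
```

A clean log and free disk space only rule out truncated output files; they do not prove that the index itself is usable.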
Hi, I'm seeing this same error message.
conda create -n starnew star==2.7.5b
conda activate starnew
(Edit: I can confirm this also occurs on version 2.7.5a.)
STAR --runMode genomeGenerate \
--genomeDir 'tempstargenomedir' \
--readFilesCommand zcat \
--genomeFastaFiles Homo_sapiens.GRCh38.dna.chromosome.21.fa \
--sjdbOverhang 100 \
--sjdbGTFfile Homo_sapiens.GRCh38.100.gtf \
--genomeSAindexNbases 4 \
--runThreadN 5
where I retrieved the hg38 chromosome 21 FASTA and the hg38 gtf file from:
This ran successfully:
Aug 03 14:55:17 ..... started STAR run
Aug 03 14:55:17 ... starting to generate Genome files
Aug 03 14:55:18 ..... processing annotations GTF
Aug 03 14:55:25 ... starting to sort Suffix Array. This may take a long time...
Aug 03 14:55:25 ... sorting Suffix Array chunks and saving them to disk...
Aug 03 14:55:46 ... loading chunks from disk, packing SA...
Aug 03 14:55:47 ... finished generating suffix array
Aug 03 14:55:47 ... generating Suffix Array index
Aug 03 14:55:47 ... completed Suffix Array index
Aug 03 14:55:47 ..... inserting junctions into the genome indices
Aug 03 14:55:49 ... writing Genome to disk ...
Aug 03 14:55:49 ... writing Suffix Array to disk ...
Aug 03 14:55:51 ... writing SAindex to disk
Aug 03 14:55:51 ..... finished successfully
I retrieved the datasets from here and the barcodes file from here, and attempted to map only the first lane (L001):
STAR --runThreadN 4 \
--genomeLoad NoSharedMemory \
--genomeDir tempstargenomedir \
--readFilesCommand zcat \
--readFilesIn pbmc_1k_v2_fastqs/pbmc_1k_v2_S1_L001_R2_001.fastq.gz pbmc_1k_v2_fastqs/pbmc_1k_v2_S1_L001_R1_001.fastq.gz \
--soloType Droplet \
--soloCBwhitelist 737K-august-2016.txt \
--soloBarcodeReadLength 1 \
--soloCBstart 1 \
--soloCBlen 16 \
--soloUMIstart 17 \
--soloUMIlen 10 \
--soloStrand 'Forward' \
--soloFeatures 'Gene' \
--soloUMIdedup '1MM_All'
and the error message I receive is:
Aug 03 14:56:52 ..... started STAR run
Aug 03 14:56:52 ..... loading genome
EXITING because of FATAL GENOME INDEX FILE error: transcriptInfo.tab is corrupt, or is incompatible with the current STAR version
SOLUTION: re-generate genome index
Aug 03 14:56:53 ...... FATAL ERROR, exiting
Attached are the transcriptInfo.tab file (renamed to .txt for uploading) from my tempstargenomedir and the Log.out file from the STARsolo run.
It looks like STAR does not like the fact that the GTF file specifies more chromosomes than the FASTA.
After constraining the GTF file to just chromosome 21, it seems to progress to the mapping stage:
cat Homo_sapiens.GRCh38.100.gtf | grep "^#!" > Homo_sapiens.GRCh38.100.chr21.gtf
cat Homo_sapiens.GRCh38.100.gtf | grep "^21" >> Homo_sapiens.GRCh38.100.chr21.gtf
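The two grep commands above can be generalized so that the GTF is restricted to whatever sequences the FASTA actually contains. A minimal sketch, assuming the sequence name is the first whitespace-delimited token of each FASTA header and the first tab-separated column of the GTF; the function name `filter_gtf_to_fasta` is made up for illustration and is not part of STAR:

```shell
# Hypothetical helper (not part of STAR): keep only GTF records whose
# chromosome (column 1) appears as a sequence name in the FASTA.
# Usage: filter_gtf_to_fasta genome.fa annotation.gtf > filtered.gtf
filter_gtf_to_fasta() {
    awk -F '\t' '
        # First file (FASTA): remember the first token of each header line.
        NR == FNR {
            if (substr($0, 1, 1) == ">") {
                name = substr($0, 2)
                sub(/[ \t].*/, "", name)
                keep[name] = 1
            }
            next
        }
        # Second file (GTF): pass comment lines through, filter the rest.
        /^#/ || ($1 in keep)
    ' "$1" "$2"
}
```

Because the whole first column is compared, a sequence whose name merely starts with `21` would not be kept by accident, which a plain `grep "^21"` cannot guarantee.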
Hi Alex,
I am sending you the Log.out files for both genome generation and mapping. I believe the index generation went smoothly, without any error. For the mapping, if I use "--quantMode GeneCounts" instead of "--quantMode TranscriptomeSAM GeneCounts", it runs successfully.
Thanking you for your kind support.
Regards
Meena Kusi
PhD candidate Integrated Biomedical Sciences (IBMS) Graduate Program
University of Texas Health Science Center at San Antonio 7703 Floyd Curl Drive San Antonio, TX 78229-3900 Email Address: kusim@livemail.uthscsa.edu
Hi Meena,
the files did not get attached. They cannot be attached to an email reply, so you would need to upload them via the GitHub site.
Cheers Alex
Hi Meena, Mehmet,
Mehmet is right: this issue occurs when the GTF file contains extra chromosomes not present in the FASTA file. I will fix the issue shortly and release 2.7.5c. The bug was introduced in 2.7.5a, so for now you can fall back to 2.7.4a for genome generation. However, it's always better to sync your FASTA and GTF files.
Cheers Alex
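One way to check whether a FASTA and a GTF are in sync is to list the chromosome names that the GTF uses but the FASTA does not define. A rough sketch, assuming sequence names are the first whitespace-delimited token of each FASTA header; `gtf_chroms_missing_from_fasta` is a name invented for this example, not a STAR tool:

```shell
# Hypothetical helper (not part of STAR): print every chromosome name used in
# the GTF (column 1) that has no matching sequence header in the FASTA.
# Usage: gtf_chroms_missing_from_fasta genome.fa annotation.gtf
gtf_chroms_missing_from_fasta() {
    fa_names=$(mktemp) || return 1
    gtf_names=$(mktemp) || return 1
    # FASTA sequence names: header lines, minus ">" and any description.
    grep '^>' "$1" | sed -e 's/^>//' -e 's/[[:space:]].*$//' | sort -u > "$fa_names"
    # GTF chromosome names: column 1 of every non-comment record.
    awk -F '\t' '!/^#/ && NF > 1 { print $1 }' "$2" | sort -u > "$gtf_names"
    # comm -13 keeps lines that appear only in the second (GTF) list.
    comm -13 "$fa_names" "$gtf_names"
    rm -f "$fa_names" "$gtf_names"
}
```

Empty output means every chromosome referenced by the annotation is present in the FASTA.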
Hi Meena, Mehmet,
This bug is fixed in 2.7.5c, please try it out. Thanks for reporting it!
Cheers Alex
Hello Alex, I found the error again in version 2.7.7a, could you please take a look? Thanks, David
Hi David,
I do not see this problem in my tests in 2.7.7a. Have you regenerated the genome with 2.7.7a? Please send me the first 2 lines from the transcriptInfo.tab file in the genome directory to check for this issue.
Cheers Alex
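The two lines requested above can be pulled out with `head`. A small sketch; the wrapper name `transcript_info_head` is hypothetical and only adds a check that the file exists in the given genome directory:

```shell
# Hypothetical helper (not part of STAR): print the first two lines of
# transcriptInfo.tab from a STAR genome directory.
# Usage: transcript_info_head /path/to/genomeDir
transcript_info_head() {
    info="$1/transcriptInfo.tab"
    if [ ! -f "$info" ]; then
        echo "no transcriptInfo.tab in $1" >&2
        return 1
    fi
    head -n 2 "$info"
}
```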
Dear Alex,
I am experiencing similar problems. I was doing a 2-pass alignment, generating intermediate genome files for pass 2, using the mouse genome from Ensembl (Mus_musculus.GRCm39.104.gtf and Mus_musculus.GRCm39.dna.primary_assembly.fa). After the first pass with STAR 2.7.1a, I got this output while generating the intermediate reference genome:
EXITING because of FATAL error, the sjdb chromosome 20 is not found among the genomic chromosomes SOLUTION: fix your file(s) --sjdbFileChrStartEnd or --sjdbGTFfile, offending junction:20 234377 235269
Based on the discussion above, I compiled and used STAR 2.7.9a but I still get:
EXITING because of FATAL error, the sjdb chromosome 20 is not found among the genomic chromosomes SOLUTION: fix your file(s) --sjdbFileChrStartEnd or --sjdbGTFfile, offending junction:20 234377 235269
I do not know what to do at this stage. Thanks for your help.
Best, Jacques
Hi Jacques,
it looks like the file with splice junctions contains chromosomes that are not present in the genome. What are the STAR commands that you are using?
Cheers Alex
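To see which chromosome names a set of splice-junction files actually contains, the first column can be tallied across all of them. A minimal sketch, assuming tab-separated junction files with the chromosome in column 1 (as in SJ.out.tab); the function name `sj_chrom_counts` is made up for illustration:

```shell
# Hypothetical helper (not part of STAR): count junctions per chromosome
# (column 1) across one or more tab-separated junction files.
# Output: one "chromosome<TAB>count" line per chromosome name.
# Usage: sj_chrom_counts sample1/SJ.out.tab sample2/SJ.out.tab
sj_chrom_counts() {
    cut -f 1 "$@" | sort | uniq -c | awk '{ print $2 "\t" $1 }'
}
```

Any name in this list that is missing from the genome FASTA headers, such as the `20` in the error above, points at the junction file as the source of the mismatch.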
Dear Alexander,
Thank you for replying. There is a first bunch of “pass1” alignments each obtained with:
2021-09-05 23:37:46 executing /share/apps/STAR/bin/Linux_x86_64/STAR --genomeDir /share/apps/STAR/indexes/Mus_musculus --readFilesIn /data/jcolinge/LUAD/ICM-03-2021/Fastq/341-1CAF_R1.fastq /data/jcolinge/LUAD/ICM-03-2021/Fastq/341-1CAF_R2.fastq --runThreadN 7 --outFileNamePrefix /data/jcolinge/LUAD/ICM-03-2021/alignment/STAR/341-1CAF_align/pass1/
Then a project specific genome is generated with:
2021-09-06 13:06:47 failed /share/apps/STAR/bin/Linux_x86_64/STAR --runThreadN 10 --runMode genomeGenerate --limitSjdbInsertNsj 2500000 --genomeDir /data/jcolinge/LUAD/ICM-03-2021/alignment/STAR/project-genome/ --genomeFastaFiles /share/apps/STAR/indexes/Mus_musculus/Mus_musculus.GRCm39.dna.primary_assembly.fa --sjdbFileChrStartEnd /data/jcolinge/LUAD/ICM-03-2021/alignment/STAR/341-1CAF_align/pass1/SJ.out.tab /data/jcolinge/LUAD/ICM-03-2021/alignment/STAR/341-1CD31_align/pass1/SJ.out.tab /data/jcolinge/LUAD/ICM-03-2021/alignment/STAR/341-1CD45_align/pass1/SJ.out.tab
Then pass2 alignment starts (though I am stuck at the genome generation for this project). I have used this pipeline on hundreds of files with different versions of STAR and different genomes, with no problems so far (and thank you, we like STAR a lot!).
Best regards,
Jacques
Hi Jacques,
was the /share/apps/STAR/indexes/Mus_musculus genome index generated with the same
--genomeFastaFiles /share/apps/STAR/indexes/Mus_musculus/Mus_musculus.GRCm39.dna.primary_assembly.fa
and
--sjdbGTFfile /share/apps/STAR/indexes/Mus_musculus/Mus_musculus.GRCm39.104.gtf
files?
Please send me the output of the failed run.
Thanks! Alex
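A quick way to test whether an existing index was built from a particular FASTA is to compare the FASTA sequence names with the chrName.txt file in the genome directory. This is only a sketch: it assumes that STAR records each name as the header text up to the first whitespace and writes chrName.txt in FASTA order, and `index_matches_fasta` is a name invented for the example.

```shell
# Hypothetical helper (not part of STAR): succeed only if the sequence names
# in the FASTA match <genomeDir>/chrName.txt line for line, in the same order.
# Any differences are printed in diff format.
# Usage: index_matches_fasta genomeDir genome.fa
index_matches_fasta() {
    grep '^>' "$2" | sed -e 's/^>//' -e 's/[[:space:]].*$//' | diff - "$1/chrName.txt"
}
```

A non-empty diff suggests the index and the FASTA come from different assemblies or species. Matching names alone do not prove that the sequences themselves are identical.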
Hi, I am trying to align RNA-seq data with STAR version 2.7.5a using the following command:
STAR --genomeDir /Users/Home/Desktop/STAR_RNAseq/NCBI_GRCh39_index --readFilesIn 01.fastq.gz --runThreadN 2 --readFilesCommand gunzip -c --outFileNamePrefix 01A --quantMode TranscriptomeSAM GeneCounts
and it ended with:
EXITING because of FATAL GENOME INDEX FILE error: transcriptInfo.tab is corrupt, or is incompatible with the current STAR version SOLUTION: re-generate genome index
I checked the Log.out and it ends with "finished successfully DONE: Genome generation, EXITING", so I believe there was no error during index generation. Could you help me fix this issue? Thanks.