Open SueFletcher opened 4 months ago
You need to use the PRI FASTA file (genome sequences, not the transcriptome): https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M34/GRCm39.primary_assembly.genome.fa.gz
You can also use the PRI GTF file, which has more comprehensive annotations than the basic one.
To go with GRCm39.primary_assembly.genome.fa.gz, is the GTF this one? https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M34/gencode.vM34.primary_assembly.annotation.gtf.gz
Yes, correct!
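One practical note on the two downloads: both are `.gz` archives, and as far as I know STAR's `genomeGenerate` mode expects the genome FASTA and the GTF as plain, uncompressed text. A minimal sketch for unpacking them with the Python standard library follows. The helper name `gunzip` and the paths in the usage comment are hypothetical, chosen only for illustration.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src, dest_dir):
    """Decompress a .gz file into dest_dir and return the path of the new file."""
    src = Path(src)
    if src.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src.name}")
    # Drop the trailing ".gz" to get the output file name.
    dest = Path(dest_dir) / src.name[: -len(".gz")]
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        # Stream the data so a multi-gigabyte genome is never held in memory.
        shutil.copyfileobj(fin, fout)
    return dest


# Hypothetical usage, assuming the files were saved under /desktop/mouse_input_data/:
# gunzip("/desktop/mouse_input_data/GRCm39.primary_assembly.genome.fa.gz",
#        "/desktop/mouse_input_data/")
# gunzip("/desktop/mouse_input_data/gencode.vM34.primary_assembly.annotation.gtf.gz",
#        "/desktop/mouse_input_data/")
```

The resulting `.fa` and `.gtf` paths are what would go into `--genomeFastaFiles` and `--sjdbGTFfile`.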
Hello, I'm a first-year master's student, and I'm attempting to use STAR to index the mouse genome. I'm using the following command:

    import os
    import subprocess

    class STAR:
        def __init__(self, genome_dir, genome_fasta_files, sjdb_gtf_file, runThreadN):
            self.exec_path = "/opt/conda/envs/STAR/bin/STAR"
            self.genome_dir = genome_dir
            self.genome_fasta_files = genome_fasta_files
            self.sjdb_gtf_file = sjdb_gtf_file
            self.runThreadN = runThreadN

    genome_dir = "/desktop/output/mouse_genome_index/"
    genome_fasta_files = "/desktop/mouse_input_data/mouse_gencode_transcripts.fa"
    sjdb_gtf_file = "/desktop/mouse_input_data/mouse_gencode_annotation.gtf"
    runThreadN = 8

    star = STAR(genome_dir, genome_fasta_files, sjdb_gtf_file, runThreadN)
    star.build_genome_index()

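The body of `build_genome_index()` is not included in the script as posted. Judging only from the command list that appears in the `CalledProcessError` in the traceback, it presumably assembles something like the following. This is a reconstruction for readers, not the original code: the function names `build_star_command` and `run_star` are hypothetical, and the fixed values (`12`, `60000000000`, `3`) are copied from that traceback.

```python
import subprocess


def build_star_command(exec_path, genome_dir, genome_fasta_files,
                       sjdb_gtf_file, run_thread_n):
    """Return the STAR genomeGenerate argument list seen in the traceback."""
    return [
        exec_path,
        "--runMode", "genomeGenerate",
        "--runThreadN", str(run_thread_n),
        "--genomeChrBinNbits", "12",
        "--limitGenomeGenerateRAM", "60000000000",
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", genome_fasta_files,
        "--sjdbGTFfile", sjdb_gtf_file,
        "--genomeSAsparseD", "3",
    ]


def run_star(cmd):
    """Run STAR; check_call raises CalledProcessError on a non-zero exit status."""
    subprocess.check_call(cmd)


# Building the list has no side effects; only run_star(cmd) would launch STAR.
cmd = build_star_command(
    "/opt/conda/envs/STAR/bin/STAR",
    "/desktop/output/mouse_genome_index/",
    "/desktop/mouse_input_data/mouse_gencode_transcripts.fa",
    "/desktop/mouse_input_data/mouse_gencode_annotation.gtf",
    8,
)
```

Passing the arguments as a list, rather than one shell string, avoids quoting problems with paths.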
I downloaded the mouse genome FASTA and GTF files from the GENCODE website: https://www.gencodegenes.org/mouse/
I used the following GTF file
and this FASTA file: ![image](https://github.com/alexdobin/STAR/assets/154513461/6d3a8635-8378-4220-87be-0d557c29d0cf)
However, I encountered an error that I'm having trouble understanding:

    /opt/conda/envs/STAR/bin/STAR-avx2 --runMode genomeGenerate --runThreadN 8 --genomeChrBinNbits 12 --limitGenomeGenerateRAM 60000000000 --genomeDir /desktop/output/mouse_genome_index/ --genomeFastaFiles /desktop/mouse_input_data/mouse_gencode_transcripts.fa --sjdbGTFfile /desktop/mouse_input_data/mouse_gencode_annotation.gtf --genomeSAsparseD 3
    STAR version: 2.7.11b compiled: 2024-01-29T15:15:38+0000 :/opt/conda/conda-bld/star_1706541070242/work/source
    Feb 29 15:51:23 ..... started STAR run
    Feb 29 15:51:23 ... starting to generate Genome files
    Feb 29 15:51:29 ..... processing annotations GTF

    Fatal INPUT FILE error, no valid exon lines in the GTF file: /desktop/mouse_input_data/mouse_gencode_annotation.gtf
    Solution: check the formatting of the GTF file. One likely cause is the difference in chromosome naming between GTF and FASTA file.

    Feb 29 15:51:32 ...... FATAL ERROR, exiting
    Traceback (most recent call last):
      File "mouse_star_index.py", line 39, in <module>
        star.build_genome_index()
      File "mouse_star_index.py", line 31, in build_genome_index
        subprocess.check_call(cmd)
      File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command '['/opt/conda/envs/STAR/bin/STAR', '--runMode', 'genomeGenerate', '--runThreadN', '8', '--genomeChrBinNbits', '12', '--limitGenomeGenerateRAM', '60000000000', '--genomeDir', '/desktop/mouse_genome_index/', '--genomeFastaFiles', '/desktop/mouse_input_data/mouse_gencode_transcripts.fa', '--sjdbGTFfile', '/desktop/mouse_input_data/mouse_gencode_annotation.gtf', '--genomeSAsparseD', '3']' returned non-zero exit status 104.

It is related to the GTF file, but I don't know which GTF file I need to download from GENCODE for `--sjdbGTFfile` in this case.
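The error message points at a chromosome naming mismatch between the GTF and the FASTA, and that can be checked directly before rerunning STAR. Below is a rough diagnostic sketch, assuming both files are uncompressed and that a sequence's name is the FASTA header text up to the first whitespace. The function names are hypothetical. With a transcript FASTA the headers are transcript IDs rather than `chr1`, `chr2`, and so on, so the overlap with the GTF would be expected to come out empty.

```python
def fasta_sequence_names(fasta_path):
    """Collect sequence names: header text after '>' up to the first whitespace."""
    names = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def gtf_sequence_names(gtf_path):
    """Collect the first (seqname) column of every non-comment GTF line."""
    names = set()
    with open(gtf_path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split("\t", 1)[0])
    return names


def shared_sequence_names(fasta_path, gtf_path):
    """Names present in both files; an empty set means the files do not match."""
    return fasta_sequence_names(fasta_path) & gtf_sequence_names(gtf_path)


# Usage with the paths from the post:
# shared = shared_sequence_names(
#     "/desktop/mouse_input_data/mouse_gencode_transcripts.fa",
#     "/desktop/mouse_input_data/mouse_gencode_annotation.gtf",
# )
# print(len(shared), "sequence names in common")
```

If the set is empty, the FASTA and GTF do not describe the same sequences, which is consistent with STAR finding no valid exon lines.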