STAR only uses GTF lines whose third column is exon, and those lines must have transcript_id and gene_id attributes.
So if you replace CDS with exon in your added lines, they should work.
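The replacement suggested above can be sketched with awk. This is a minimal sketch, not a tested pipeline: it assumes the GTF is tab-separated and that the added records can be recognized by `rtracklayer` in column 2, as in the lines posted in the question. The function names `cds_to_exon` and `feature_counts` and the output file name are hypothetical.

```shell
# Hypothetical helper: rewrite column 3 from CDS to exon, but only for the
# added records (source column == "rtracklayer"). All other lines pass through.
cds_to_exon() {
    awk 'BEGIN { FS = OFS = "\t" }
         $2 == "rtracklayer" && $3 == "CDS" { $3 = "exon" }
         { print }' "$1"
}

# Hypothetical helper: count records per (source, feature) pair, skipping
# comment lines, to confirm that the added records are now exon lines.
feature_counts() {
    awk -F'\t' '!/^#/ { n[$2 "\t" $3]++ }
                END { for (k in n) print n[k], k }' "$1"
}

# Usage (the output file name is a placeholder):
# cds_to_exon gtf_UCSC_ensGenes_combined_ncRNAs_new.gtf > combined_exon.gtf
# feature_counts combined_exon.gtf
```

Restricting the change to the `rtracklayer` records leaves the CDS lines of the official annotation untouched.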
Dear Alexdobin,
I have a question about counting reads per gene with STAR. I downloaded the S. cerevisiae genome FASTA and GTF files from UCSC. Because some ncRNAs I am interested in are not in the official GTF file, I added their coordinates to the GTF file. I generated the genome index with the official GTF file, which worked, and then ran the alignment with my modified GTF file to get the read counts for each gene. However, I only got read counts for the genes in the official GTF file, not for the ncRNAs I added. My commands for genome indexing and mapping are below.
For generating genome index:
module load ngs/STAR/2.7.1a
STAR --runMode genomeGenerate \
     --genomeDir UCSC_SacCer3_index_STAR \
     --runThreadN 17 \
     --genomeFastaFiles UCSC_SacCer3_fasta/*.fa \
     --sjdbGTFfile sacCer3.ensGene.gtf \
     --sjdbOverhang 99
For mapping using my modified gtf files:
module load ngs/STAR/2.7.1a
cd Fastq_raw/
ls BY4741_A*.gz | cut -d "." -f 1 | while read id; do
    STAR --genomeDir ../UCSC_SacCer3_index_STAR/ \
         --runThreadN 5 \
         --readFilesIn ${id}.gz --readFilesCommand zcat \
         --quantMode TranscriptomeSAM GeneCounts \
         --outSAMtype BAM SortedByCoordinate \
         --outFileNamePrefix ../bam_star/${id}_sorted.bam \
         --sjdbOverhang 99 \
         --sjdbGTFfile ../gtf_UCSC_ensGenes_combined_ncRNAs_new.gtf
done
The official GTF file I downloaded from UCSC has this format:
chrIV ensGene.v101 transcript 1802 2953 . + . gene_id "YDL248W"; transcript_id "YDL248W_mRNA"; gene_name "YDL248W";
chrIV ensGene.v101 exon 1802 2953 . + . gene_id "YDL248W"; transcript_id "YDL248W_mRNA"; exon_number "1"; exon_id "YDL248W_mRNA.1"; gene_name "YDL248W";
chrIV ensGene.v101 CDS 1802 2950 . + 0 gene_id "YDL248W"; transcript_id "YDL248W_mRNA"; exon_number "1"; exon_id "YDL248W_mRNA.1"; gene_name "YDL248W";
chrIV ensGene.v101 start_codon 1802 1804 . + 0 gene_id "YDL248W"; transcript_id "YDL248W_mRNA"; exon_number "1"; exon_id "YDL248W_mRNA.1"; gene_name "YDL248W";
chrIV ensGene.v101 stop_codon 2951 2953 . + 0 gene_id "YDL248W"; transcript_id "YDL248W_mRNA"; exon_number "1"; exon_id "YDL248W_mRNA.1"; gene_name "YDL248W";
chrIV ensGene.v101 transcript 3762 3836 . + . gene_id "YDL247W-A"; transcript_id "YDL247W-A_mRNA"; gene_name "YDL247W-A";
chrIV ensGene.v101 exon 3762 3836 . + . gene_id "YDL247W-A"; transcript_id "YDL247W-A_mRNA"; exon_number "1"; exon_id "YDL247W-A_mRNA.1"; gene_name "YDL247W-A";
chrIV ensGene.v101 CDS 3762 3833 . + 0 gene_id "YDL247W-A"; transcript_id "YDL247W-A_mRNA"; exon_number "1"; exon_id "YDL247W-A_mRNA.1"; gene_name "YDL247W-A";
chrIV ensGene.v101 start_codon 3762 3764 . + 0 gene_id "YDL247W-A"; transcript_id "YDL247W-A_mRNA"; exon_number "1"; exon_id "YDL247W-A_mRNA.1"; gene_name "YDL247W-A";
chrIV ensGene.v101 stop_codon 3834 3836 . + 0 gene_id "YDL247W-A"; transcript_id "YDL247W-A_mRNA"; exon_number "1"; exon_id "YDL247W-A_mRNA.1"; gene_name "YDL247W-A";
These are the lines I added to the GTF file for the new ncRNAs:
chrI rtracklayer CDS 5074 6237 . - . gene_id "SUT432"; transcript_id "SUT432_mRNA"; gene_name "SUT432"; exon_number "1"; exon_id "SUT432_mRNA.1"
chrI rtracklayer CDS 9367 9600 . + . gene_id "SUT001"; transcript_id "SUT001_mRNA"; gene_name "SUT001"; exon_number "1"; exon_id "SUT001_mRNA.1"
chrI rtracklayer CDS 10731 11140 . - . gene_id "CUT436"; transcript_id "CUT436_mRNA"; gene_name "CUT436"; exon_number "1"; exon_id "CUT436_mRNA.1"
chrI rtracklayer CDS 28082 29772 . - . gene_id "SUT433"; transcript_id "SUT433_mRNA"; gene_name "SUT433"; exon_number "1"; exon_id "SUT433_mRNA.1"
chrI rtracklayer CDS 30071 30904 . + . gene_id "CUT001"; transcript_id "CUT001_mRNA"; gene_name "CUT001"; exon_number "1"; exon_id "CUT001_mRNA.1"
chrI rtracklayer CDS 30531 30892 . - . gene_id "CUT437"; transcript_id "CUT437_mRNA"; gene_name "CUT437"; exon_number "1"; exon_id "CUT437_mRNA.1"
chrI rtracklayer CDS 31483 32748 . - . gene_id "SUT434"; transcript_id "SUT434_mRNA"; gene_name "SUT434"; exon_number "1"; exon_id "SUT434_mRNA.1"
chrI rtracklayer CDS 33075 34380 . - . gene_id "SUT435"; transcript_id "SUT435_mRNA"; gene_name "SUT435"; exon_number "1"; exon_id "SUT435_mRNA.1"
chrI rtracklayer CDS 34379 34748 . - . gene_id "CUT438"; transcript_id "CUT438_mRNA"; gene_name "CUT438"; exon_number "1"; exon_id "CUT438_mRNA.1"
I would appreciate it if you could give me some suggestions on that! Looking forward to your reply!
Many thanks and best regards, Lingling