alexdobin / STAR

RNA-seq aligner
MIT License

No exon features causing error in index generation step #634

Closed Buuntu closed 5 years ago

Buuntu commented 5 years ago

I'm trying to use STAR to align RNA-seq data to a bacterial genome. Because the annotations were generated with Prokka (https://github.com/tseemann/prokka), there are no exons per se. The closest thing to an exon in bacteria is probably CDS, but I don't want to leave out the features annotated as rRNA or tRNA in my GTF. Maybe using STAR for a bacterial alignment is not even necessary, since bacteria don't have introns, and I could just use something like bowtie2?

I know I can change the --sjdbGTFfeatureExon option, but will I be missing transcripts if I only use the CDS features?

This is the exact error I'm getting:

Fatal INPUT FILE error, no exon lines in the GTF file: output/genome.gtf
Solution: check the formatting of the GTF file, it must contain some lines with exon in the 3rd column.
          Make sure the GTF file is unzipped.
          If exons are marked with a different word, use --sjdbGTFfeatureExon .

May 02 17:21:38 ...... FATAL ERROR, exiting

I also confirmed that my GTF file only has CDS, rRNA, and tRNA features.

alexdobin commented 5 years ago

Hi Buuntu,

if you have no splices (i.e. all transcripts are single-exon), you do not need to use the annotations (GTF file) at all. Or, indeed, you can use a non-splice-aware aligner like bwa or bowtie. If you want to count reads per gene, you can rename all features in column 3 of your GTF to "exon".

Cheers Alex
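
The renaming suggested above could be sketched in Python roughly as follows. This is a minimal illustration, not part of STAR: the function name `rename_features_to_exon`, the contig name, and the gene IDs are hypothetical, and it assumes a tab-separated GTF with the feature type in column 3 and `gene_id`/`transcript_id` attributes already present in column 9.

```python
import io


def rename_features_to_exon(lines, keep=("CDS", "rRNA", "tRNA")):
    """Yield GTF lines with column 3 set to 'exon' for the selected feature types.

    Comment lines, blank lines, and features not listed in `keep` pass through unchanged.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9 and fields[2] in keep:
            fields[2] = "exon"
            yield "\t".join(fields) + "\n"
        else:
            yield line


# Hypothetical two-line GTF, held in memory so the sketch runs without any files.
example_gtf = (
    'contig_1\tProkka\tCDS\t100\t400\t.\t+\t0\t'
    'gene_id "GENE_00001"; transcript_id "GENE_00001";\n'
    'contig_1\tProkka\trRNA\t900\t2400\t.\t-\t.\t'
    'gene_id "GENE_00002"; transcript_id "GENE_00002";\n'
)

renamed = list(rename_features_to_exon(io.StringIO(example_gtf)))
for gtf_line in renamed:
    print(gtf_line, end="")
```

For a real annotation, the same generator can be fed an open file handle and its output written to a new GTF, leaving the original file untouched.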

Buuntu commented 5 years ago

How would I add annotations and get meaningful gene names in the differential expression analysis without a GTF file? Without one, the transcripts would only have arbitrary names. My FASTA file doesn't contain gene names, since it is a genome sequence rather than a set of transcripts.

I think bacteria do have some splices (group II introns), just not very many, so maybe STAR would still give a better alignment?

alexdobin commented 5 years ago

Hi Gabriel,

how are you calculating gene expression? If you are using STAR's --quantMode GeneCounts option, you would need to provide the GTF file. In your GTF file, you need to replace the features in column 3 with "exon" and re-generate the genome.

Cheers Alex
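
As a rough sketch of the two steps described above (re-generating the genome with the edited GTF, then aligning with `--quantMode GeneCounts`), the commands could be assembled like this. The helper names and all paths are hypothetical placeholders, and other settings are left at STAR's defaults, which may need tuning for a small bacterial genome. The sketch only prints the commands and does not run STAR.

```python
import shlex


def genome_generate_cmd(genome_dir, fasta, gtf, threads=4):
    """Argument list for building a STAR genome index with annotations."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--runMode", "genomeGenerate",
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,
    ]


def align_cmd(genome_dir, reads, out_prefix, threads=4):
    """Argument list for aligning reads and counting reads per gene."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--quantMode", "GeneCounts",
        "--outFileNamePrefix", out_prefix,
    ]


# Hypothetical paths, for illustration only.
index_cmd = genome_generate_cmd("star_index", "genome.fasta", "genome.exon.gtf")
map_cmd = align_cmd("star_index", ["sample_R1.fastq", "sample_R2.fastq"], "sample_")

print(shlex.join(index_cmd))
print(shlex.join(map_cmd))
```

With `--quantMode GeneCounts`, STAR writes the per-gene counts to a file named `ReadsPerGene.out.tab` under the chosen output prefix.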

Buuntu commented 5 years ago

I was going to use something like DESeq2 to get the actual gene counts from the alignments. I haven't gotten to that step yet.

alexdobin commented 5 years ago

For DESeq2 you will need to generate a table of read counts per gene.
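
One way such a table could be assembled, assuming the counts come from STAR's `ReadsPerGene.out.tab` files (produced by `--quantMode GeneCounts`), is sketched below. The function names, sample names, gene IDs, and counts are made up for illustration. The sketch takes the second column (unstranded counts) by default, which is an assumption that depends on the library protocol, and it skips the four summary rows whose names start with `N_`.

```python
import csv
import io


def read_gene_counts(handle, column=1):
    """Parse one ReadsPerGene.out.tab file into {gene_id: count}.

    `column` is 0-based: 1 = unstranded, 2 and 3 = the two stranded options.
    """
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("N_"):
            continue  # skip blank lines and the N_unmapped, N_multimapping, ... rows
        counts[row[0]] = int(row[column])
    return counts


def build_count_table(samples):
    """Combine {sample_name: {gene_id: count}} into a header and sorted rows."""
    genes = sorted(set().union(*(c.keys() for c in samples.values())))
    header = ["gene_id", *samples]
    rows = [[g, *(samples[s].get(g, 0) for s in samples)] for g in genes]
    return header, rows


# Made-up file contents, held in memory so the sketch runs without STAR output.
sample_a = (
    "N_unmapped\t10\t10\t10\nN_multimapping\t5\t5\t5\n"
    "N_noFeature\t3\t20\t4\nN_ambiguous\t1\t0\t1\n"
    "GENE_00001\t42\t2\t40\nGENE_00002\t7\t0\t7\n"
)
sample_b = (
    "N_unmapped\t8\t8\t8\nN_multimapping\t2\t2\t2\n"
    "N_noFeature\t4\t15\t5\nN_ambiguous\t0\t0\t0\n"
    "GENE_00001\t30\t1\t29\nGENE_00002\t9\t1\t8\n"
)

samples = {
    "sample_a": read_gene_counts(io.StringIO(sample_a)),
    "sample_b": read_gene_counts(io.StringIO(sample_b)),
}
header, rows = build_count_table(samples)
print("\t".join(header))
for row in rows:
    print("\t".join(str(value) for value in row))
```

The resulting genes-by-samples matrix of raw integer counts is the kind of input DESeq2 expects. Other counting tools can produce an equivalent table from the alignments.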

chensena commented 1 year ago

Solution: check the formatting of the GTF file, it must contain some lines with exon in the 3rd column. Make sure the GTF file is unzipped. If exons are marked with a different word, use --sjdbGTFfeatureExon .