barricklab / breseq

breseq is a computational pipeline for finding mutations relative to a reference sequence in short-read DNA resequencing data. It is intended for haploid microbial genomes (<20 Mb). breseq is a command line tool implemented in C++ and R.
http://barricklab.org/breseq
GNU General Public License v2.0

annotation was blank #352

Closed by kissboyxiao 1 year ago

kissboyxiao commented 1 year ago

Dear developer, I ran breseq on HiSeq data from both E. coli and Serratia marcescens. In both cases the mutations were reported properly, but every "annotation" entry was "intergenic (-/-)" and every "description" entry was blank. What should I do to make annotation work?

Thanks!

jeffreybarrick commented 1 year ago

Does your reference file have annotations of where genes are in it?

If you are using a FASTA file, then it doesn't. If the reference genome sequence is available from GenBank, download the GenBank version that includes both the annotations and the sequence, and use that with breseq. This page has some guidance for downloading a reference file.

If these are reference genomes that you assembled yourself, and you only have a FASTA file for that reason, you can use a tool like Prokka to create an annotated GenBank file.
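
A quick way to answer the question above (does the reference file contain gene annotations?) is to look inside the file before running breseq. The following is a minimal sketch, not part of breseq: the function name `summarize_reference` is hypothetical, and it assumes a standard GenBank flat file in which feature keys such as `gene` and `CDS` start in column 6 of the FEATURES table. It only counts feature keys and does not validate the file.

```python
def summarize_reference(path):
    """Return (format, feature_counts) for a FASTA or GenBank flat file.

    Hypothetical helper for illustration; assumes the standard GenBank
    layout where feature keys begin in column 6 of the FEATURES table.
    """
    fmt = "unknown"
    counts = {}
    in_features = False
    with open(path) as handle:
        for line in handle:
            # A file whose first recognizable line is a '>' header is FASTA,
            # which carries sequence only and no feature table.
            if fmt == "unknown" and line.startswith(">"):
                return "fasta", {}
            if line.startswith("LOCUS"):
                fmt = "genbank"
                in_features = False
            elif line.startswith("FEATURES"):
                in_features = True
            elif line.startswith(("ORIGIN", "CONTIG", "//")):
                in_features = False
            elif in_features and line[:5] == "     " and line[5:6].strip():
                # Feature key lines have exactly five leading spaces;
                # qualifier lines (e.g. /gene="...") are indented further.
                key = line.split()[0]
                counts[key] = counts.get(key, 0) + 1
    return fmt, counts
```

For example, `summarize_reference("reference.gbk")` (a placeholder file name) returns the detected format and a dictionary of feature counts. A result of `"fasta"`, or a GenBank result with no `gene` or `CDS` entries, would be consistent with the all-intergenic output described in this issue.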