bacpop / ggCaller

Bifrost graph gene caller.
MIT License

Documentation for custom annotation databases #5

Open dutchscientist opened 1 year ago

dutchscientist commented 1 year ago

I'm trying this out, and a first attempt with a personal training set looks good. I personally like the way I can control annotation with Prokka, but this is more convenient. One question straight away:

Is there any documentation of how an annotation database should look? I tried one I have used for Prokka (which uses ~~~ between text bits in the header), and that didn't work.

Prokka has an easy way to make databases from annotated genome sequences. Any tips on how to do this for ggCaller? https://github.com/tseemann/prokka#adding-a-genus-databases https://github.com/tseemann/prokka#fasta-database-format

Thanks :)

samhorsfield96 commented 1 year ago

Hi, there's no strict requirement for annotation headers. The description in each header of the annotation FASTA file will be appended to the cluster its sequence aligns to. However, I'd suggest avoiding special characters like "~~~", as these are used to separate multiple annotations assigned to merged clusters in ggCaller. Thanks for bringing this up though - it's an important point and something we'll look into.

dutchscientist commented 1 year ago

Thanks. So it's probably best to search and replace the ~~~; the way Prokka makes the databases is nice and easy, so I wouldn't like to lose it.
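
A minimal sketch of that search-and-replace, for anyone landing here with a Prokka-style database. It rewrites only the FASTA header lines (those starting with `>`) and leaves the sequences untouched. The function names and the `|` replacement separator are assumptions made for illustration; the thread only establishes that "~~~" should be avoided, not which separator ggCaller prefers.

```python
def sanitize_header(line, old="~~~", new="|"):
    """Replace the Prokka field separator, but only on FASTA header lines."""
    if line.startswith(">"):
        return line.replace(old, new)
    return line


def sanitize_fasta(lines, old="~~~", new="|"):
    """Yield the lines of a FASTA file with the separator replaced in headers."""
    for line in lines:
        yield sanitize_header(line, old, new)


def sanitize_fasta_file(in_path, out_path, old="~~~", new="|"):
    """Write a copy of in_path to out_path with sanitized headers."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(sanitize_fasta(src, old, new))


# Example with a Prokka-style header (EC_number~~~gene~~~product~~~COG):
example = [">P1 1.1.1.1~~~adh~~~Alcohol dehydrogenase~~~COG1062\n", "MKTAY\n"]
cleaned = list(sanitize_fasta(example))
```

To convert a file on disk, call `sanitize_fasta_file` with an input and an output path (both names are up to you). The original Prokka database stays as it is, so the same source file can still be used with Prokka.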