I annotated a batch of viral genomes with pharokka, and it looks like the sequence IDs in the feature table and in the FASTA headers of the GFF file are not the same. For instance:
In this case, "AP017925.1" (the first column of the GFF table) is not equal to "AP017925.1 Ralstonia phage RP31 DNA, complete genome" (the header in the FASTA section of the same GFF file), which may prevent third-party software from reading the file correctly. For comparison, the same genome annotated with prokka outputs:
##gff-version 3
##sequence-region AP017925.1 1 276958
AP017925.1 Prodigal:002006 CDS 30 452 . - 0 ID=AP017925_00001;inference=ab initio prediction:Prodigal:002006;locus_tag=AP017925_00001;product=hypothetical protein
AP017925.1 Prodigal:002006 CDS 501 2687 . - 0 ID=AP017925_00002;inference=ab initio prediction:Prodigal:002006;locus_tag=AP017925_00002;product=hypothetical protein
...
##FASTA
>AP017925.1
ACGAGAGAGGAGGCGAATGCCTCCTCTCTCTATGCCGCTATGGTAATGCGGCTGGGTACA
AAACCCTTTTCCACCAGAGATTTCAACGGCGGAAAGAGATTCTCAGGCAACTTATCCCAT
...
Here the identifiers match, so the file is easy to parse.
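A possible workaround until the headers are consistent is to post-process the GFF. The sketch below is a hypothetical helper, not part of pharokka or prokka, and all of its function names are made up for illustration. It assumes a standard tab-delimited GFF3 file with the sequences embedded after a `##FASTA` line. It reports column-1 seqids that have no exactly matching FASTA header, then rewrites each FASTA header to its first whitespace-delimited token, which is the form the feature table uses in the example above.

```python
"""Hypothetical helper: find and fix seqid/FASTA-header mismatches in a GFF3
file whose sequences are embedded after a '##FASTA' line."""

from typing import Iterable, List, Set, Tuple


def split_gff(lines: List[str]) -> Tuple[List[str], List[str]]:
    """Split into (header + feature table up to and including '##FASTA', FASTA lines)."""
    for i, line in enumerate(lines):
        if line.strip() == "##FASTA":
            return lines[: i + 1], lines[i + 1 :]
    return lines, []  # no embedded sequences


def table_seqids(table: Iterable[str]) -> Set[str]:
    """Collect the seqid (column 1) of every tab-delimited feature line."""
    return {
        line.split("\t", 1)[0]
        for line in table
        if line.strip() and not line.startswith("#")  # skip directives/comments
    }


def fasta_headers(fasta: Iterable[str]) -> Set[str]:
    """Return the full header text (without '>') of each FASTA record."""
    return {line[1:].rstrip("\r\n") for line in fasta if line.startswith(">")}


def mismatched_seqids(lines: List[str]) -> Set[str]:
    """Seqids used in the table that have no exactly matching FASTA header."""
    table, fasta = split_gff(lines)
    return table_seqids(table) - fasta_headers(fasta)


def fix_gff(lines: List[str]) -> List[str]:
    """Truncate every FASTA header to its first whitespace-delimited token."""
    table, fasta = split_gff(lines)
    fixed = []
    for line in fasta:
        if line.startswith(">"):
            words = line[1:].split()
            fixed.append(">" + (words[0] if words else "") + "\n")
        else:
            fixed.append(line)  # sequence lines are left untouched
    return table + fixed


def fix_file(src: str, dst: str) -> None:
    """Read a GFF3 file, report mismatches, and write a corrected copy."""
    with open(src) as handle:
        lines = handle.readlines()
    for seqid in sorted(mismatched_seqids(lines)):
        print(f"no FASTA header exactly matching seqid {seqid!r}")
    with open(dst, "w") as handle:
        handle.writelines(fix_gff(lines))
```

For example, `fix_file("genome.gff", "genome.fixed.gff")` (placeholder file names) would turn the header "AP017925.1 Ralstonia phage RP31 DNA, complete genome" into "AP017925.1". Truncating at the first whitespace follows the common FASTA convention that the ID is the first word of the header; if the IDs differ in some other way, this will not reconcile them, so the mismatch report is worth checking.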
PS. Thanks for this cool software!