bacpop / ggCaller

Bifrost graph gene caller.
MIT License

First gene not annotated if the first base of the gene is the first base of the contig #19

Closed (rgladstone closed this issue 8 months ago)

rgladstone commented 9 months ago

I ran ggCaller on 66 capsular loci extracted from references. In each locus sequence, the first base is the first base of the first gene, and the last base is the last base of the last gene.

ggcaller --refs db_fasta.txt --aligner ref --alignment pan --clean-mode sensitive --annotation sensitive --save --threads 32 --merge-paralogs --search-radius 30000 --max-orf-orf-distance 30000

For all 66 references, ggCaller does not annotate the first gene, even though it begins with the start codon 'atg' (methionine). Prokka does annotate these first genes, and ggCaller has no problem when the last base of the stop codon is the last base of the contig. As a result, the ggCaller output has one gene fewer than Prokka + Panaroo (using the same settings).
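To make the edge case concrete, here is a minimal Python sketch that checks whether a contig carries an open reading frame beginning at its very first base. It is not how ggCaller or Prokka call genes. The function name `orf_at_contig_start` and the set of start codons are assumptions chosen for illustration.

```python
# Illustrative only: a naive frame-0 scan, not ggCaller's graph-based method.
START_CODONS = {"ATG", "GTG", "TTG"}  # common bacterial starts (assumption)
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_at_contig_start(seq):
    """Return 1-based (start, end) of an ORF beginning at base 1, or None."""
    seq = seq.upper()
    if seq[:3] not in START_CODONS:
        return None
    # Walk codon by codon in frame 0 until the first in-frame stop codon.
    for i in range(3, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return (1, i + 3)  # end coordinate includes the stop codon
    return None


# Toy contig: ATG AAA CCC TAA followed by three extra bases.
print(orf_at_contig_start("atgaaaccctaaggg"))  # (1, 12)
```

A gene caller that handles this boundary correctly should report a feature with start coordinate 1 for such a contig.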

samhorsfield96 commented 8 months ago

Hi Rebecca, could you provide an example of one of these genes that ggCaller is missing?

rgladstone commented 8 months ago

Thanks, here are the two annotations: one from Prokka, which also contains the sequence, and one from ggCaller. I've added a .txt extension because GitHub wouldn't accept .gff. The first gene (1-777) is the one that ggCaller isn't capturing.

K1_CP000243_ggcaller.gff.txt K1_CP000243_prokka.gff.txt
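One quick way to compare two annotation files like these is to list the CDS features that start at base 1 in each. The sketch below assumes standard nine-column, tab-separated GFF3 with `CDS` in the feature-type column, which may not match every tool's output. The helper name `cds_starting_at_first_base` and the toy records are hypothetical, not taken from the attached files.

```python
def cds_starting_at_first_base(gff_lines):
    """Return (seqid, start, end, strand) for CDS features starting at base 1."""
    hits = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # Prokka-style GFF appends the sequence after this marker
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives, and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue
        start, end = int(fields[3]), int(fields[4])
        if start == 1:
            hits.append((fields[0], start, end, fields[6]))
    return hits


# Toy records (made up for illustration); real use would read a file, e.g.
# with open("annotation.gff") as handle: cds_starting_at_first_base(handle)
toy_gff = [
    "##gff-version 3",
    "contig1\ttool\tCDS\t1\t777\t.\t+\t0\tID=gene1",
    "contig1\ttool\tCDS\t800\t1500\t.\t+\t0\tID=gene2",
    "##FASTA",
    ">contig1",
    "ATG",
]
print(cds_starting_at_first_base(toy_gff))  # [('contig1', 1, 777, '+')]
```

Running this on both files would show whether the gene at 1-777 appears in one annotation and not the other.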

samhorsfield96 commented 8 months ago

The missing gene sequence may have been truncated or elongated incorrectly by ggCaller. Would you be able to send across the full ggCaller output folder, please? I can take a look in more detail.

rgladstone commented 8 months ago

Sure, I've zipped it up here

samhorsfield96 commented 8 months ago

Hi Rebecca, I've now implemented a change that should sort the issue in commit bf39e1c.

samhorsfield96 commented 8 months ago

Closed as inactive.