Open babayagaofficial opened 4 weeks ago
Hi, could you provide the output files generated by ggCaller if possible, please? Also, does the gene that is missing map to the genome? The issue might be that ggCaller has called a gene across a contig break; if that's the case then it will be missing from the GFF as it cannot map to the genome and therefore won't have coordinates.
Yup, looks like you're right about it being a gene that was called over a contig break. Mapping it back with minimap2 gives me two matches: the beginning of the gene maps to the end of the genome, and the end of the gene maps to the beginning of the genome. (That clears up my concerns, so do you still want the output files?)
Does a gene that is called across a contig break get recorded anywhere in the ggCaller output?
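For anyone wanting to run the same check in bulk, here is a rough sketch of how split mappings like this could be flagged from minimap2's default PAF output. It assumes the gene sequences were used as queries against the genome, and the function name `find_split_queries` and the 0.9 coverage threshold are arbitrary choices for illustration, not anything ggCaller provides.

```python
from collections import defaultdict


def find_split_queries(paf_lines, min_cover=0.9):
    """Return query names that only align as two or more partial hits.

    A query is reported when no single hit covers min_cover of its length,
    but the hits taken together do. Both the threshold and this definition
    of "split" are assumptions made for this sketch.
    """
    hits = defaultdict(list)
    query_lengths = {}
    for line in paf_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:  # PAF has 12 mandatory columns
            continue
        name = cols[0]
        query_lengths[name] = int(cols[1])
        hits[name].append((int(cols[2]), int(cols[3])))  # query start, query end

    split = []
    for name, intervals in hits.items():
        qlen = query_lengths[name]
        best_single = max(end - start for start, end in intervals)
        # Length of the union of the query intervals (overlaps counted once).
        covered, last_end = 0, 0
        for start, end in sorted(intervals):
            start = max(start, last_end)
            if end > start:
                covered += end - start
                last_end = end
        if best_single < min_cover * qlen and covered >= min_cover * qlen:
            split.append(name)
    return split
```

A gene reported by this function aligns in pieces rather than as one contiguous hit, which is consistent with (though not proof of) a call across a contig break.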
Hey!
I've been parsing ggCaller output to get the locations of genes in each genome. To do this, I look in the GFF file for each genome and find the entry with the right ID. If I understand correctly, I can get the internal ggCaller ID for a gene within a specific genome by looking at the corresponding genome column and gene row in gene_presence_absence_roary.csv. From that I construct the GFF ID by taking the final number in the ggCaller ID and appending it to the genome ID. If an entry is empty in gene_presence_absence_roary.csv, then that gene is not present in the given genome.
In an example I've been looking at, there is a ggCaller ID given in gene_presence_absence_roary.csv, but when I construct the GFF ID as described above, there is no corresponding entry in the GFF file for that genome. The gene is also marked as present in the gene_presence_absence.Rtab file (in fact, it appears to be a core gene).
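To make the lookup described above concrete, here is a minimal sketch in Python. Several things in it are assumptions rather than documented ggCaller behaviour: the `_` separator between genome ID and gene number, one ggCaller ID per cell, a `Gene` column holding the cluster name, and the function names themselves. It distinguishes the three outcomes discussed here: gene absent, gene located, and gene listed in the CSV but missing from the GFF.

```python
import csv
import re


def load_roary(path):
    """Read gene_presence_absence_roary.csv into a list of row dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def gff_id_from_ggcaller_id(genome_id, ggcaller_id, sep="_"):
    """Genome ID plus the final number of the ggCaller ID (separator is assumed)."""
    match = re.search(r"(\d+)$", ggcaller_id.strip())
    if match is None:
        raise ValueError(f"no trailing number in {ggcaller_id!r}")
    return f"{genome_id}{sep}{match.group(1)}"


def index_gff(lines):
    """Map each feature ID in a GFF3 file to (seqid, start, end, strand)."""
    index = {}
    for line in lines:
        if line.startswith("##FASTA"):  # embedded sequence section, stop here
            break
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if "ID" in attrs:
            index[attrs["ID"]] = (cols[0], int(cols[3]), int(cols[4]), cols[6])
    return index


def locate_gene(roary_rows, gene, genome_id, gff_index):
    """Return ("absent", None), ("located", coords) or ("no_gff_entry", gff_id)."""
    for row in roary_rows:
        if row["Gene"] == gene:
            ggcaller_id = (row.get(genome_id) or "").strip()
            if not ggcaller_id:
                return "absent", None
            gff_id = gff_id_from_ggcaller_id(genome_id, ggcaller_id)
            coords = gff_index.get(gff_id)
            if coords is None:
                return "no_gff_entry", gff_id
            return "located", coords
    raise KeyError(gene)
```

With a GFF opened as `index_gff(open(gff_path))`, a `"no_gff_entry"` result corresponds to the case described in this post: the gene has an ID in the CSV but no coordinates in the GFF.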