bacpop / ggCaller

Bifrost graph gene caller.
MIT License

Regarding group_size_nuc in gene_presence_absence_roary.csv #39

Open FarahSaeed opened 2 months ago

FarahSaeed commented 2 months ago

Hi,

This is about group_size_nuc. For a specific gene, the gene_presence_absence_roary.csv file shows min_group_size_nuc = 489 and max_group_size_nuc = 762. In gene_calls.ffn, the minimum number of characters for that gene is 498 and the maximum is 775. To understand these sizes, I wanted to know whether group_size_nuc represents the number of characters in the sequence of that gene. Thanks a lot.

samhorsfield96 commented 2 months ago

Hi, would you be able to provide examples of the files and genes you identified as being inconsistent, please? Many thanks.

FarahSaeed commented 2 months ago

Thanks for the reply. It was run on a small dataset of 13 genomes. The dataset, along with the roary file, is here. It was run using the following command:

ggcaller --refs input.txt

The specific gene is "sp-P33368-YOHF_ECOLI".

samhorsfield96 commented 2 months ago

Hi, thanks for providing these files. Would you also be able to provide the gene_calls.ffn if possible, please?

FarahSaeed commented 2 months ago

Thanks for the reply. The ffn file is now in the same folder.

samhorsfield96 commented 2 months ago

Hi, I've looked into the issue. It appears that the sp-P33368-YOHF_ECOLI orthologues you mentioned are within the size range specified in the gene_presence_absence_roary.csv file. I've attached a file containing the gene sequences of interest for reference. These genes have lengths of 762, 489, 762, 489, 762, 762, 762, 762, 762, 489, 762, 762, 762 respectively (min=489, max=762). Is this the gene you are interested in, or was there another that could be causing an issue?

gene_lengths.txt

FarahSaeed commented 2 months ago

Thanks for the clarification. It is the correct gene. I was counting the newline character. The results are consistent now. Thanks.
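
For anyone hitting the same mismatch: a raw character count of a FASTA record includes one newline per sequence line, so it exceeds the nucleotide length. The differences reported above (498 - 489 = 9 and 775 - 762 = 13) would be consistent with sequences wrapped at 60 characters per line, although the wrap width is an assumption and is not stated in this thread. Below is a minimal sketch, not part of ggCaller, that counts nucleotides per record while ignoring line breaks. The function name fasta_lengths and the toy record are hypothetical.

```python
from io import StringIO


def fasta_lengths(handle):
    """Return {record_id: nucleotide_count} for a FASTA file handle.

    Line breaks and surrounding whitespace are stripped before counting,
    so wrapped sequences are not over-counted.
    """
    lengths = {}
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            fields = line[1:].split()
            current = fields[0] if fields else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


# Toy record (hypothetical ID and sequence, not taken from the dataset).
text = ">gene_1 example\nATGAAA\nCCCTAA\n"
sequence_block = text.split("\n", 1)[1]          # everything after the header
print(len(sequence_block))                       # raw count, newlines included
print(fasta_lengths(StringIO(text))["gene_1"])   # nucleotides only
```

To run this on a real file, open it in text mode, for example `with open("gene_calls.ffn") as fh: lengths = fasta_lengths(fh)`, and compare the minimum and maximum for the orthologues of interest with the values in gene_presence_absence_roary.csv.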