Open FarahSaeed opened 2 months ago
Hi, would you be able to provide examples of the files and genes you identified as being inconsistent, please? Many thanks.
Thanks for the reply. It was run on a small dataset of 13 genomes. The dataset, along with the roary file, is here. It was run using the following command:
ggcaller --refs input.txt
This is the specific gene "sp-P33368-YOHF_ECOLI".
Hi, thanks for providing these files. Would you also be able to provide the gene_calls.ffn if possible, please?
Thanks for the reply. The ffn file is now in the same folder.
Hi, I've looked into the issue. It appears that the sp-P33368-YOHF_ECOLI orthologues you mentioned are within the size range specified in the gene_presence_absence_roary.csv file. I've attached a file containing the gene sequences of interest for reference. These genes have lengths of 762, 489, 762, 489, 762, 762, 762, 762, 762, 489, 762, 762, 762 respectively (min=489, max=762). Is this the correct gene you are interested in, or was there another that could be causing an issue?
gene_lengths.txt
Thanks for the clarification. It is the correct gene. I was counting the newline character. The results are consistent now. Thanks.
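For anyone hitting the same discrepancy, here is a minimal sketch of how counting line breaks inflates the length. It assumes the sequences in the ffn file are wrapped at 60 characters per line, which is a guess based on the numbers in this thread and not something confirmed from the ggcaller source. The record names `gene_short` and `gene_long` and both helper functions are made up for illustration.

```python
def fasta_lengths(lines):
    """Return {record_id: nucleotide_count}, ignoring line breaks."""
    lengths = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: start a new record.
            current = line[1:].strip()
            lengths[current] = 0
        elif current is not None:
            # Sequence line: count only the bases, not the newline.
            lengths[current] += len(line)
    return lengths


def wrap_sequence(seq, width=60):
    """Wrap a sequence at `width` characters (assumed width, see above)."""
    return "".join(seq[i:i + width] + "\n" for i in range(0, len(seq), width))


# Synthetic sequences with the two lengths reported in this thread.
short_body = wrap_sequence("A" * 489)
long_body = wrap_sequence("A" * 762)
fasta_text = ">gene_short\n" + short_body + ">gene_long\n" + long_body

# Raw character counts include one newline per wrapped line.
print(len(short_body), len(long_body))  # 498 775

# Counting bases only gives the values from the roary file.
print(fasta_lengths(fasta_text.splitlines()))
```

Under that 60-character assumption, a 489 bp gene spans 9 lines and a 762 bp gene spans 13 lines, so the raw counts come out as 498 and 775, the same values reported in the original question.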
Hi,
This is about group_size_nuc. For a specific gene, the gene_presence_absence_roary.csv file shows min_group_size_nuc = 489 and max_group_size_nuc = 762. In gene_calls.ffn, the minimum number of characters for that gene is 498 and the maximum is 775. To understand these sizes, I would like to know whether group_size_nuc represents the number of characters in the specific gene. Thanks a lot.