bacpop / ggCaller

Bifrost graph gene caller.

Smaller core genome with less stringent identity cut-off #30

Closed conmeehan closed 5 months ago

conmeehan commented 5 months ago

Hi,

I am building a pangenome of 7 genomes and trying to see the impact of identity thresholds on the pangenome estimations. When I run ggCaller with the following command:

ggcaller --refs input.txt --annotation ultrasensitive --aligner def --alignment core --save --out ggc_defaults --threads 10 --save --clean-mode strict --merge-paralogs

I get the following summary statistics:

Core genes (99% <= strains <= 100%) 934
Soft core genes (95% <= strains < 99%) 0
Shell genes (15% <= strains < 95%) 1297
Cloud genes (0% <= strains < 15%) 56
Total genes (0% <= strains <= 100%) 2287

When I then run with a lower identity cut-off like so:

ggcaller --refs input.txt --annotation ultrasensitive --aligner def --alignment core --save --out ggc_95 --threads 10 --save --clean-mode strict --merge-paralogs --identity-cutoff 95

I get this:

Core genes (99% <= strains <= 100%) 0
Soft core genes (95% <= strains < 99%) 0
Shell genes (15% <= strains < 95%) 0
Cloud genes (0% <= strains < 15%) 3098
Total genes (0% <= strains <= 100%) 3098

I would expect a lower identity cut-off to give a larger core genome and a smaller accessory genome, but the opposite appears to be happening. The pangenome size also increases, which is not what I would expect from more clustering at a lower threshold. This also occurs at other cut-offs (I also tried 80).

Any idea why this is occurring?
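
For readers comparing the two summaries, the category boundaries can be reproduced with a few lines of Python. This is only a sketch that copies the percentage ranges from the summary output quoted above; the function name `pangenome_category` is made up for illustration and is not part of ggCaller or Panaroo.

```python
def pangenome_category(n_present: int, n_genomes: int) -> str:
    """Classify a gene cluster by the fraction of genomes that carry it.

    The thresholds are copied from the summary table quoted in this thread;
    the function itself is a hypothetical helper, not ggCaller code.
    """
    if not 0 < n_present <= n_genomes:
        raise ValueError("n_present must be between 1 and n_genomes")
    frac = n_present / n_genomes
    if frac >= 0.99:
        return "core"
    if frac >= 0.95:
        return "soft core"
    if frac >= 0.15:
        return "shell"
    return "cloud"


if __name__ == "__main__":
    n_genomes = 7  # number of genomes in the question
    for n_present in range(1, n_genomes + 1):
        share = n_present / n_genomes
        print(n_present, f"{share:.1%}", pangenome_category(n_present, n_genomes))
```

With 7 genomes, a cluster present in 6 of them sits at about 85.7% and counts as shell, so the soft core category cannot be populated, and only clusters found in a single genome (about 14.3%) fall into cloud. A run where all 3098 genes are cloud therefore means no cluster spans more than one genome.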

samhorsfield96 commented 5 months ago

Hi Conor, I'll have to take some time to look into this one. In the meantime, I would suggest playing with the --family-threshold flag instead. This is the threshold used by Panaroo to cluster gene families to generate the pangenome, and is set at 70% by default. --identity-cutoff is used by ggCaller for the initial pre-clustering, so it isn't the final identity value used to cluster gene families.

conmeehan commented 5 months ago

Hi Sam,

I found what I did wrong: I passed 95 instead of .95, so the cut-off was not working as intended. Don't mind me!

Cheers, Conor
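
A mistake like this can be caught before launching a long run by checking the value in a small wrapper script. The sketch below assumes, based only on the resolution above, that the cut-off is meant to be a fraction between 0 and 1. The names `identity_fraction`, `build_parser` and `check_cutoff` are hypothetical, and this is not how ggCaller itself parses its arguments.

```python
import argparse


def identity_fraction(text: str) -> float:
    """argparse type that accepts an identity threshold only as a fraction in (0, 1].

    Assumption: the threshold is a proportion, so 95% must be written as 0.95.
    """
    try:
        value = float(text)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{text!r} is not a number")
    # The chained comparison is also False for NaN, so NaN is rejected here.
    if not 0.0 < value <= 1.0:
        raise argparse.ArgumentTypeError(
            f"{text!r} is outside (0, 1]; write 95% as 0.95"
        )
    return value


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical wrapper parser, not ggCaller's own argument handling.
    parser = argparse.ArgumentParser(prog="check_cutoff")
    parser.add_argument("--identity-cutoff", type=identity_fraction, default=None)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.identity_cutoff)
```

Run as a script with --identity-cutoff 95, this exits with an argparse error instead of silently accepting the value, while .95 is accepted.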

samhorsfield96 commented 5 months ago

Closed as completed.