Closed: conmeehan closed this issue 5 months ago
Hi Conor, I'll have to take some time to look into this one. In the meantime, I would suggest playing with the --family-threshold flag instead. This is the threshold used by Panaroo to cluster gene families to generate the pangenome, and is set at 70% by default. --identity-cutoff is used by ggCaller for the initial pre-clustering, so isn't the final identity value used to cluster gene families.
Hi Sam,
I found what I did wrong: I put in 95 instead of 0.95, so it was not working correctly. Don't mind me!
Cheers, Conor
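The 95 versus 0.95 mix-up described above can be caught before a run starts. The following is a minimal Python sketch and not part of ggCaller: check_fraction and ggcaller_args are hypothetical helper names, and the only assumption taken from this thread is that --identity-cutoff expects a fraction rather than a percentage.

```python
def check_fraction(name: str, value: float) -> float:
    """Return value unchanged if it is a fraction in (0, 1], else raise ValueError."""
    if 0.0 < value <= 1.0:
        return value
    # Values such as 95 were most likely meant as percentages, so suggest value / 100.
    hint = f" (did you mean {value / 100:g}?)" if 1.0 < value <= 100.0 else ""
    raise ValueError(f"{name} must be a fraction in (0, 1], got {value:g}{hint}")


def ggcaller_args(identity_cutoff: float) -> list[str]:
    """Build an argument list for ggcaller (hypothetical wrapper; it runs nothing)."""
    cutoff = check_fraction("--identity-cutoff", identity_cutoff)
    # Only flags that appear in this thread are used here.
    return ["ggcaller", "--refs", "input.txt", "--identity-cutoff", str(cutoff)]
```

Calling ggcaller_args(95) raises a ValueError that suggests 0.95, while ggcaller_args(0.95) returns a list that could be handed to subprocess.run.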
Closed as completed.
Hi,
I am building a pangenome of 7 genomes and trying to see the impact of identity thresholds on the pangenome estimations. When I run ggCaller with the following command:
ggcaller --refs input.txt --annotation ultrasensitive --aligner def --alignment core --save --out ggc_defaults --threads 10 --save --clean-mode strict --merge-paralogs
I get the following summary statistics:
Core genes (99% <= strains <= 100%) 934
Soft core genes (95% <= strains < 99%) 0
Shell genes (15% <= strains < 95%) 1297
Cloud genes (0% <= strains < 15%) 56
Total genes (0% <= strains <= 100%) 2287
When I then run with a lower identity cut-off like so:
ggcaller --refs input.txt --annotation ultrasensitive --aligner def --alignment core --save --out ggc_95 --threads 10 --save --clean-mode strict --merge-paralogs --identity-cutoff 95
I get this:
Core genes (99% <= strains <= 100%) 0
Soft core genes (95% <= strains < 99%) 0
Shell genes (15% <= strains < 95%) 0
Cloud genes (0% <= strains < 15%) 3098
Total genes (0% <= strains <= 100%) 3098
I would expect a lower identity cut-off to give a larger core genome and a smaller accessory genome, but the opposite appears to be happening. The pangenome size also increases, which is not what I would expect from more clustering at a lower threshold. This also occurs at other cut-offs (I also tried 80).
Any idea why this is occurring?
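One way to flag this symptom automatically is to parse the summary statistics and check whether every gene family ended up in the cloud category. The following is a rough Python sketch: parse_summary and all_cloud are hypothetical helpers, the line layout is assumed to match the output quoted above, and the check is only a heuristic hint that a threshold may be mis-scaled.

```python
import re

# Matches lines such as "Core genes (99% <= strains <= 100%) 934".
LINE = re.compile(r"^(?P<name>[^(]+?)\s*\((?P<range>[^)]*)\)\s*(?P<count>\d+)\s*$")

DEFAULT_RUN = """
Core genes (99% <= strains <= 100%) 934
Soft core genes (95% <= strains < 99%) 0
Shell genes (15% <= strains < 95%) 1297
Cloud genes (0% <= strains < 15%) 56
Total genes (0% <= strains <= 100%) 2287
"""

CUTOFF_95_RUN = """
Core genes (99% <= strains <= 100%) 0
Soft core genes (95% <= strains < 99%) 0
Shell genes (15% <= strains < 95%) 0
Cloud genes (0% <= strains < 15%) 3098
Total genes (0% <= strains <= 100%) 3098
"""


def parse_summary(text: str) -> dict[str, int]:
    """Map each category name to its gene count; non-matching lines are skipped."""
    stats = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            stats[match.group("name")] = int(match.group("count"))
    return stats


def all_cloud(stats: dict[str, int]) -> bool:
    """True when every gene family is a cloud gene, i.e. nothing clustered across genomes."""
    total = stats.get("Total genes", 0)
    return total > 0 and stats.get("Cloud genes", 0) == total
```

With the two runs above, all_cloud(parse_summary(CUTOFF_95_RUN)) is true and the same check on DEFAULT_RUN is false, which would point at the --identity-cutoff value rather than at the genomes themselves.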