Closed by sapuizait 7 months ago
From https://www.bacpop.org/poppunk/: 'For more detailed analyses, you may wish to download the all genomes database. If you wish to run either poppunk-visualise or any subclustering within strains this will require the full database.'
So for your purposes the reference-only version will be fine.
You only need the 'full' database if you wish to run visualisations where you get an NJ tree of the whole database, or if you want further levels of subclustering.
Some more information is here: https://poppunk.readthedocs.io/en/latest/model_distribution.html
Hi there
Thank you for your wonderful software. It will be very useful in my line of work to identify the same strains from a given species using kmers! Also, the speed of the software is amazing! For most of the analyses, I can simply run it on my desktop PC!
However, could you please explain the difference between using the full dataset of all genomes and using only the references? You mention that for more detailed analyses the full dataset is better, but what type of analyses? Is it in order to see if my genomes fall within clusters of previously identified genomes? Or is the separation better? For example, for my purposes, I may simply want to say whether an isolate or a MAG is likely the same strain/serotype. Wouldn't the reference db be enough for that?
Thanks! p