wangpeng407 closed 4 years ago
Currently, the OPERA-MS genome database is based on an old version of the NCBI complete genomes collection that contained only 2,800 genomes. We will increase the number of species in the OPERA-MS database in our next release. The updated database will contain genomes from more than 20,000 species, and we will provide a script that lets users easily generate a custom reference genome database.
Best regards,
--- Denis
Step 5 of OPERA-MS is "computation of Mash genomic distance against a database of 2,800 complete genomes".
In NCBI, there are 16,835 complete genomes. Even after removing viruses, animals, human, and other non-bacterial genomes, there are still more than 2,800 complete bacterial genomes.
So my question is: why did the authors choose those 2,800 genomes as the reference, rather than all NCBI complete genomes?
Thanks ~
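
For readers unfamiliar with the Mash distance mentioned in step 5, here is a minimal Python sketch of the idea: hash the k-mers of each genome, keep the smallest hashes as a sketch, estimate the Jaccard index j from the merged sketches, and convert it to a distance with D = -(1/k) ln(2j / (1 + j)). This is only an illustration, not how OPERA-MS or Mash is implemented. The function names are hypothetical, blake2b stands in for the hash that Mash actually uses, and reverse-complement (canonical) k-mers are ignored.

```python
import hashlib
import math


def sketch(seq, k=21, s=1000):
    """Return the s smallest k-mer hashes of seq (a bottom-s MinHash sketch).

    Assumption: blake2b is used only for illustration, and k-mers are not
    canonicalised against their reverse complement.
    """
    hashes = {
        int.from_bytes(
            hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest(), "big"
        )
        for i in range(len(seq) - k + 1)
    }
    return sorted(hashes)[:s]


def mash_distance(a, b, k=21, s=1000):
    """Estimate the Mash distance between two sequences."""
    sa, sb = set(sketch(a, k, s)), set(sketch(b, k, s))
    # Take the s smallest hashes of the merged sketches and count how many
    # of them occur in both sketches.
    merged = sorted(sa | sb)[:s]
    if not merged:
        return 1.0
    shared = sum(1 for h in merged if h in sa and h in sb)
    j = shared / len(merged)
    if j == 0:
        return 1.0  # no shared k-mers: maximum distance
    return max(0.0, -math.log(2 * j / (1 + j)) / k)


def closest_reference(query, references, k=21, s=1000):
    """Return the name of the reference with the smallest estimated distance.

    `references` is a hypothetical dict mapping a genome name to its sequence.
    """
    return min(references, key=lambda name: mash_distance(query, references[name], k, s))


if __name__ == "__main__":
    query = "ACGTTGCAAGGCTTACGATCGGATCCA"
    refs = {"same": query, "other": "CCCCCCCCCCCCCCCCCCCCCCCCCCC"}
    print(closest_reference(query, refs, k=4, s=100))
```

With a sketch like this, a larger reference database only means more sketches to compare against, so the choice of which genomes go into the database is a matter of coverage rather than a limit of the method.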