Hi @AnnaSyme, are you referring to the marker database or to a FASTA file with sequences that you can use to test the tool? If the latter, isn't the genome used in the quickstart guide enough for a quick test?
Hi @apcamargo, yes, I'm referring to the marker database. Thanks!
Ahh, ok! In this case you can use the --use-minimal-db parameter of the genomad annotate command. It will annotate the proteins with a subset of 42,098 markers. Just keep in mind that the classification performance will be below what you would expect if you ran geNomad with the full set of markers.
Because --use-minimal-db is not exposed to the end-to-end command, you'll have to run all the modules separately:
genomad annotate --use-minimal-db metagenome.fna genomad_output genomad_db
genomad find-proviruses metagenome.fna genomad_output genomad_db
genomad marker-classification metagenome.fna genomad_output genomad_db
genomad nn-classification metagenome.fna genomad_output
genomad aggregated-classification metagenome.fna genomad_output
# score-calibration is optional and not turned on by default in the end-to-end command
genomad score-calibration metagenome.fna genomad_output
genomad summary metagenome.fna genomad_output
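If it helps, the module sequence above can be wrapped in a small shell function so that a failure in one module stops the rest. This is only a sketch: the function name run_genomad_minimal and the GENOMAD variable (used to print the commands instead of running them) are made up for illustration and are not part of geNomad. The optional score-calibration step is left out.

```shell
#!/usr/bin/env bash
# Sketch: run the geNomad modules one by one with the minimal marker set.
# GENOMAD is a hypothetical override (not a geNomad feature). Setting
# GENOMAD="echo genomad" prints each command instead of executing it.

run_genomad_minimal() {
    local input="$1" output="$2" db="$3"
    local genomad=${GENOMAD:-genomad}

    # The && chain stops at the first module that fails.
    # $genomad is intentionally unquoted so "echo genomad" splits into two words.
    $genomad annotate --use-minimal-db "$input" "$output" "$db" &&
    $genomad find-proviruses "$input" "$output" "$db" &&
    $genomad marker-classification "$input" "$output" "$db" &&
    $genomad nn-classification "$input" "$output" &&
    $genomad aggregated-classification "$input" "$output" &&
    $genomad summary "$input" "$output"
}

# Example call (commented out so that sourcing this file runs nothing):
# run_genomad_minimal metagenome.fna genomad_output genomad_db
```

Running it once with GENOMAD="echo genomad" is a cheap way to check the paths before starting the real run.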
Alternatively, you can just reduce the search sensitivity (for instance, by setting --sensitivity 1.4) and then use the end-to-end command to run the whole pipeline with the full set of markers. Again, you can expect the classification performance to take a hit.
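For reference, the reduced-sensitivity route could look like the sketch below. It reuses the file and directory names from the commands earlier in this thread; the function name run_genomad_fast and the GENOMAD dry-run variable are hypothetical helpers, not part of geNomad, and 1.4 is just the example value mentioned above.

```shell
#!/usr/bin/env bash
# Sketch: run the whole pipeline with the full marker set but a lower
# search sensitivity. GENOMAD is a hypothetical override for dry runs,
# e.g. GENOMAD="echo genomad" prints the command instead of running it.

run_genomad_fast() {
    local input="$1" output="$2" db="$3"
    local sensitivity="${4:-1.4}"   # example value from this thread
    local genomad=${GENOMAD:-genomad}

    # $genomad is intentionally unquoted so "echo genomad" splits into two words.
    $genomad end-to-end --sensitivity "$sensitivity" "$input" "$output" "$db"
}

# Example call (commented out so that sourcing this file runs nothing):
# run_genomad_fast metagenome.fna genomad_output genomad_db
```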
If you just want to reduce the size of the database, you can do the following:
cd genomad_db
mmseqs createsubdb mini_set_ids genomad_db genomad_mini_db --subdb-mode 0
rm genomad_db
This will remove the full database file (1.4G) and replace it with a reduced version (348M). If you do that, though, you will only be able to run geNomad with the --use-minimal-db parameter.
Thanks so much @apcamargo, this will be really useful.
Sure thing! :)
I'll close this issue for now. Let me know if you have any problems.
I was wondering if you know of a smaller database, on the order of megabytes, that could be used to test this tool?
Thanks if possible!