apcamargo / genomad

geNomad: Identification of mobile genetic elements
https://portal.nersc.gov/genomad/

How to organize the file structure of geNomad database #23

Closed shenwei356 closed 1 year ago

shenwei356 commented 1 year ago

Hi Antônio,

I really love this tool. It has really nice docs with beautiful charts and is effortless to use!

I downloaded the database files from Zenodo and extracted them manually.

./: 32.66 GB
  22.89 GB      genomad_hmm_v1.3
   3.70 GB      genomad_msa_v1.3
   3.27 GB      genomad_hmm_v1.3.tar.gz
   1.37 GB      genomad_db
 810.10 MB      genomad_db_v1.3.tar.gz
 653.91 MB      genomad_msa_v1.3.tar.gz
   6.49 MB      genomad_metadata_v1.3.tsv.g

Then I tested with a genome from GTDB (GCA_000010645.1), which seemed to work as expected, successfully identifying the four plasmids in the file (only one when not using --relaxed).

genomad end-to-end --relaxed --cleanup --threads 40 GCA_000010645.1.fna.gz genomad ~/ws/db/genomad/genomad_db

I have one little question.

  1. Are any files other than genomad_db needed? genomad_hmm_v1.3 and genomad_msa_v1.3 sit outside the genomad_db directory.

-- EDIT --

Hmm, I think the answer is no. It still works after moving the other files to a different path.

apcamargo commented 1 year ago

Hi @shenwei356. Thanks for the comments!

Yes, you're right: genomad_db is all you need to run geNomad. It contains the marker and integrase profile databases, the metadata, and the taxdumps. genomad_hmm and genomad_msa are just additional databases that I make available so that people can use the markers outside of geNomad if they want.

I'll try to make that clearer in the next Zenodo release.
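
For anyone who wants to repeat the check described above, here is a minimal shell sketch that moves the optional HMM and MSA downloads out of the download directory and leaves genomad_db where it is. The function name `separate_optional_files` and the destination directory are hypothetical; the file name patterns are taken from the directory listing earlier in this thread and may differ in other Zenodo releases.

```shell
# Hypothetical helper: move the optional geNomad downloads (HMM and MSA
# directories and archives) from <download_dir> into <extras_dir>.
# genomad_db and its archive are not touched.
separate_optional_files() {
    src_dir="$1"
    extras_dir="$2"
    mkdir -p "$extras_dir"
    for entry in "$src_dir"/genomad_hmm_v* "$src_dir"/genomad_msa_v*; do
        # Skip a pattern that matched nothing.
        [ -e "$entry" ] || continue
        mv "$entry" "$extras_dir"/
    done
}

# Example call (the second path is an assumption, pick any location):
# separate_optional_files ~/ws/db/genomad ~/ws/db/genomad_extras
```

After the move, the same `genomad end-to-end` command from the first comment can be rerun against `~/ws/db/genomad/genomad_db` to confirm that nothing else is required.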