franciscozorrilla / metaGEM

:gem: An easy-to-use workflow for generating context specific genome-scale metabolic models and predicting metabolic interactions within microbial communities directly from metagenomic data
https://franciscozorrilla.github.io/metaGEM/
MIT License

ERROR when using GTDBTK #158

Closed pwzation closed 8 months ago

pwzation commented 8 months ago

Hello! I'm trying to use metaGEM, but I have been stuck on the GTDB-Tk step for a long while. For GTDB-Tk, I downloaded the latest version of the reference data (release 214). However, when I run the check command 'gtdbtk check_install', it gives me a report like this:

[2024-03-06 10:32:11] INFO: Running install verification
[2024-03-06 10:32:11] INFO: Checking that all third-party software are on the system path:
[2024-03-06 10:32:11] INFO: |-- FastTree OK
[2024-03-06 10:32:11] INFO: |-- FastTreeMP OK
[2024-03-06 10:32:11] INFO: |-- fastANI OK
[2024-03-06 10:32:11] INFO: |-- guppy OK
[2024-03-06 10:32:11] INFO: |-- hmmalign OK
[2024-03-06 10:32:11] INFO: |-- hmmsearch OK
[2024-03-06 10:32:11] INFO: |-- mash OK
[2024-03-06 10:32:11] INFO: |-- pplacer OK
[2024-03-06 10:32:11] INFO: |-- prodigal OK
[2024-03-06 10:32:11] INFO: Checking integrity of reference package: /public/home/zhongyang/metaGEM/workflow/envs/metagem/share/gtdbtk-1.7.0/db/release214
[2024-03-06 10:32:13] INFO: |-- pplacer HASH MISMATCH 6786e9fc16b31db7d6eaaa9f8cfa87a8a49744340.0%)
[2024-03-06 10:32:13] INFO: |-- masks HASH MISMATCH 8d5a2139feabbb70789c62155f3761d2aeed1601%)
[2024-03-06 10:32:22] INFO: |-- markers HASH MISMATCH 163f542c3f0a40f59df45d453aa235b39aa96e27 (99.45%)
[2024-03-06 10:32:22] INFO: |-- radii HASH MISMATCH 4753acc920001a1400788ee89cb4632900449055)
[2024-03-06 10:32:32] INFO: |-- msa HASH MISMATCH 75df495678a121497e14346b453caf42f4b03922
[2024-03-06 10:32:32] INFO: |-- metadata HASH MISMATCH a089cc36bf79a40c7506019accc5f93e940d9fed.0%)
[2024-03-06 10:32:32] INFO: |-- taxonomy HASH MISMATCH 89b12cf8106f326887599dcb30ef94ebba1420356.67%)

Several HASH MISMATCH errors occurred.

I tried to ignore these mismatches and run GTDB-Tk within metaGEM:

gtdbtk classify_wf --genome_dir my/file/path --out_dir GTDBTk -x fa --cpus 24

But it does not work, because a reference data file is missing:

[Errno 2] No such file or directory: '/public/gtdbtk-1.7.0/db/release214/markers/pfam/individual_hmms/PF01868.17.hmm'

Is this error caused by a version mismatch between gtdbtk 1.7 and the reference data?
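
For anyone reading a similar report, here is a minimal sketch of how the failing components could be pulled out of saved `gtdbtk check_install` output. It is not part of metaGEM or GTDB-Tk; the function name `mismatched_components` is hypothetical, and the regular expression assumes the `|-- <name> OK` / `|-- <name> HASH MISMATCH` line layout shown in the report above.

```python
import re

# Assumed line layout (taken from the report in this thread):
#   "... INFO: |-- pplacer HASH MISMATCH <hash>"  or  "... INFO: |-- mash OK"
STATUS_ENTRY = re.compile(r"\|--\s+(\S+)\s+(OK|HASH MISMATCH)\b")


def mismatched_components(log_text):
    """Return the names of entries flagged HASH MISMATCH, in order of appearance."""
    return [
        name
        for name, status in STATUS_ENTRY.findall(log_text)
        if status == "HASH MISMATCH"
    ]
```

Applied to the report above, this would list the reference package components (pplacer, masks, markers, and so on) while skipping the third-party tools that were reported as OK. Note that `pplacer` appears twice in the report, once as a tool on the system path and once as a reference package component; only the second entry is flagged.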

pwzation commented 8 months ago

I found that the database must match the version of gtdbtk: gtdbtk 1.7 only works with the release 202 database. Switching to that release resolved my problem. However, I also tried to upgrade gtdbtk to the latest version, and during that process too many dependencies were missing. I hope metaGEM can add a way to update the version of each tool.
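
The pairing described above can be checked before running `classify_wf`. The following is a rough sketch only: the `REQUIRED_RELEASE` table and both function names are hypothetical, the table contains just the one pairing confirmed in this thread (gtdbtk 1.7.0 with release 202), and it assumes the reference data directory name contains `release<number>`, as in the path from the report.

```python
import re

# Hypothetical lookup table. Only the 1.7.0 -> 202 pairing comes from this
# thread; consult the GTDB-Tk documentation before adding other versions.
REQUIRED_RELEASE = {"1.7.0": 202}


def release_from_path(db_path):
    """Extract the release number from a path such as '.../db/release214'."""
    match = re.search(r"release(\d+)", db_path)
    return int(match.group(1)) if match else None


def check_pair(gtdbtk_version, db_path):
    """Return 'ok', 'mismatch', or 'unknown' for a tool version and data path."""
    expected = REQUIRED_RELEASE.get(gtdbtk_version)
    found = release_from_path(db_path)
    if expected is None or found is None:
        return "unknown"
    return "ok" if expected == found else "mismatch"
```

With the path from the original report (ending in `gtdbtk-1.7.0/db/release214`), `check_pair("1.7.0", path)` would flag a mismatch, which is consistent with the hash errors and the missing HMM file seen earlier.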

franciscozorrilla commented 8 months ago

Hi @pwzation, thank you for reporting this issue, and I am glad that you found a resolution. I am currently on vacation, but I will have a closer look at this when I am back.

Best, Francisco