Hello,
It seems both genomes have been classified by the ANI screen, so there are no genomes left for the rest of the pipeline to process.
[2023-07-13 18:36:45] INFO: Identifying markers in 0 genomes with 1 threads.
which causes GTDB-Tk to stop.
We will implement a better and more explicit exit of the program in the next release.
oh, I completely missed that. Thank you!
I also see that this will cause me problems: I was planning to run this as a step in a Nextflow pipeline, but if it exits with an error the pipeline will stop as well. Is there any way around this while we wait for the next release?
I can imagine that --skip_ani_screen should force it to go through all the steps, though it would be unnecessarily slower.
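One possible stopgap, sketched here as an assumption rather than anything the GTDB-Tk developers recommend: wrap the call in a small shell function that captures the output and treats a non-zero exit as success only when the log contains the "Identifying markers in 0 genomes" message quoted above. The function name run_tolerating_empty_identify and the log file name are hypothetical, and the sketch assumes the message ends up in the captured stdout/stderr.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, capture its output in a log file, and
# ignore a non-zero exit status only when the log shows that no genomes were
# left for the identify step. Any other failure is passed through unchanged.
run_tolerating_empty_identify() {
    local log_file="$1"
    shift
    local status=0
    # Run the wrapped command; keep both stdout and stderr in the log.
    "$@" >"$log_file" 2>&1 || status=$?
    if [ "$status" -ne 0 ] && grep -q "Identifying markers in 0 genomes" "$log_file"; then
        echo "No genomes left after the ANI screen; ignoring exit status $status." >&2
        return 0
    fi
    return "$status"
}

# Example call (commented out so the sketch has no side effects):
# run_tolerating_empty_identify gtdbtk.stdout.log \
#     gtdbtk classify_wf --genome_dir genomes/ --out_dir genomes_taxonomy \
#     --mash_db mash_db/ --cpus 30
```

Inside a pipeline step, the wrapped call would replace the bare gtdbtk command, so a genuine failure still stops the pipeline while the "0 genomes" case does not.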
Hello, I actually already fixed this problem in the latest release of GTDB-Tk :)
>> gtdbtk classify_wf --genome_dir genomes/ --out_dir genomes_taxonomy --mash_db mash_db/ --pplacer_cpus 8 --cpus 30
[2023-07-18 08:52:46] INFO: GTDB-Tk v2.3.2
[2023-07-18 08:52:46] INFO: gtdbtk classify_wf --genome_dir genomes/ --out_dir genomes_taxonomy --mash_db mash_db/ --pplacer_cpus 8 --cpus 30
[2023-07-18 08:52:46] INFO: Using GTDB-Tk reference data version r214: /srv/db/gtdbtk/official/release214
[2023-07-18 08:52:46] INFO: Loading reference genomes.
[2023-07-18 08:52:47] INFO: Using Mash version 2.3
[2023-07-18 08:52:48] INFO: Creating Mash sketch file: genomes_taxonomy/classify/ani_screen/intermediate_results/mash/gtdbtk.user_query_sketch.msh
[2023-07-18 08:52:48] INFO: Completed 2 genomes in 0.14 seconds (13.83 genomes/second).
[2023-07-18 08:52:48] INFO: Creating Mash sketch file: mash_db/gtdb_ref_sketch.msh
[2023-07-18 09:06:04] INFO: Completed 85,205 genomes in 13.28 minutes (6,418.24 genomes/minute).
[2023-07-18 09:06:04] INFO: Calculating Mash distances.
[2023-07-18 09:06:10] INFO: Calculating ANI with FastANI v1.32.
[2023-07-18 09:06:13] INFO: Completed 40 comparisons in 3.29 seconds (12.16 comparisons/second).
[2023-07-18 09:06:14] INFO: Summary of results saved to: genomes_taxonomy/classify/ani_screen/gtdbtk.bac120.ani_summary.tsv
[2023-07-18 09:06:14] INFO: 2 genome(s) have been classified using the ANI pre-screening step.
[2023-07-18 09:06:14] INFO: Done.
[2023-07-18 09:06:14] INFO: All genomes have been classified by the ANI screening step, Identify and Align steps will be skipped.
[2023-07-18 09:06:15] INFO: Note that Tk classification mode is insufficient for publication of new taxonomic designations. New designations should be based on one or more de novo trees, an example of which can be produced by Tk in de novo mode.
[2023-07-18 09:06:15] INFO: Done.
[2023-07-18 09:06:15] INFO: Removing intermediate files.
[2023-07-18 09:06:15] INFO: Intermediate files removed.
[2023-07-18 09:06:15] INFO: Done.
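If a downstream pipeline step needs to know whether the Identify and Align steps were skipped, one option is to look for the messages shown in this log. The following is only a sketch: the function names are hypothetical, and it assumes the log text has been saved to a file and that the message wording matches the v2.3.2 output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the saved log reports that every
# genome was classified by the ANI screening step.
all_classified_by_ani() {
    local log_file="$1"
    grep -q "All genomes have been classified by the ANI screening step" "$log_file"
}

# Hypothetical helper: print the number of genomes the log reports as
# classified by the ANI pre-screening step (prints nothing if no such line).
ani_classified_count() {
    local log_file="$1"
    sed -n 's/.*INFO: \([0-9][0-9,]*\) genome(s) have been classified using the ANI pre-screening step\..*/\1/p' "$log_file" | tail -n 1
}

# Example use (commented out; the log path is an assumption):
# if all_classified_by_ani genomes_taxonomy/gtdbtk.log; then
#     echo "ANI screen classified $(ani_classified_count genomes_taxonomy/gtdbtk.log) genome(s)."
# fi
```

Matching on log text is brittle across releases, so this is best treated as a temporary check rather than a stable interface.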
I'm trying to set up GTDB-Tk on our cluster. It halts after it tries to look for the failed_genomes.tsv file. Details are below, but the file doesn't exist. The error I get is:
Environment
I'm using a conda environment created explicitly (via mamba) with:
Here is the list of packages in my environment:
Server information
Server is CentOS Linux release 7.9.2009 (Core), and I requested 100GB of RAM to run this process.
Debugging information
Here is the log output I get: