It also hangs during "Identifying TIGRFAM protein families." gtdbtk-2.3.2

Ecogenomics / GTDBTk

GTDB-Tk: a toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes.

https://ecogenomics.github.io/GTDBTk/

GNU General Public License v3.0


It also hangs during "Identifying TIGRFAM protein families." gtdbtk-2.3.2 #608

Open Caiyulu-818 opened 1 week ago

Caiyulu-818 commented 1 week ago

Hi,

When I run the following command with either gtdbtk-2.1.0/bin/python3.8 or gtdbtk-2.3.2/bin/python3.8, it also hangs during "Identifying TIGRFAM protein families.":

gtdbtk de_novo_wf --genome_dir /public/home/lcy/arc/soil4210 --archaea --outgroup_taxon p__Altiarchaeota --out_dir /public/home/lcy/application/iphop_db/arc_MAGs_GTDB-tk_results1/ --cpus 5 --force --extension fa

I am running it this way because I need GTDB database release 214.

Looking forward to your reply.

pchaumeil commented 4 days ago

Hello,

How are you running Tk: with Docker, or on an HPC using Slurm or PBS? How many genomes are you running at once? Does the pipeline work with only one genome?

Thanks, Pierre

Caiyulu-818 commented 4 days ago

Hi, I am running Tk on an HPC using PBS. I submitted about 4000 genomes at once with 64 CPUs, and it failed. When I split them into two batches (2000 genomes each) and ran with 20 CPUs, it also failed, but when I ran 2000 genomes with 5 CPUs, it succeeded.

I am just curious: I successfully ran about 60,000 genomes with "gtdbtk classify", but this time it failed with only 4000 genomes using "gtdbtk de_novo_wf".

Best regards Caiyu
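
For readers who want to reproduce the batching described above (2000 genomes per run with 5 CPUs), here is a minimal Python sketch. The helper names `split_into_batches` and `build_command` and the per-batch directory layout are hypothetical and not part of GTDB-Tk. The command-line flags are copied from the command in the first comment. It assumes each batch has already been placed in its own directory, and each batch is a separate `de_novo_wf` run.

```python
from pathlib import Path
from typing import List


def split_into_batches(genome_dir: str, batch_size: int = 2000,
                       extension: str = "fa") -> List[List[Path]]:
    """Group genome files into consecutive batches of at most batch_size."""
    genomes = sorted(Path(genome_dir).glob(f"*.{extension}"))
    return [genomes[i:i + batch_size]
            for i in range(0, len(genomes), batch_size)]


def build_command(batch_dir: str, out_dir: str, cpus: int = 5) -> List[str]:
    """Build (but do not run) the de_novo_wf command for one batch directory.

    The flags mirror the command posted in this thread; batch_dir and
    out_dir are hypothetical paths chosen by the caller.
    """
    return [
        "gtdbtk", "de_novo_wf",
        "--genome_dir", batch_dir,
        "--archaea",
        "--outgroup_taxon", "p__Altiarchaeota",
        "--out_dir", out_dir,
        "--cpus", str(cpus),
        "--extension", "fa",
    ]
```

Each returned command list could be written into its own PBS job script, or passed to `subprocess.run`, so that the runs do not compete for the same memory allocation.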

pchaumeil commented 3 days ago

I do not know the configuration of your HPC, but this may be related to a wall time limit or a memory limit. The error and output files of the job you are running may give you more information about this.
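
As a rough illustration of that check, the Python sketch below scans the text of a job's error or output file for lines that hint at a wall time or memory limit. The function name `find_limit_hints` and the keyword patterns are assumptions. The exact messages depend on the scheduler and site configuration, so treat the patterns as a starting point rather than a complete list.

```python
import re
from typing import Dict, List

# Assumed keyword patterns; real scheduler messages vary between sites.
LIMIT_PATTERNS = {
    "walltime": re.compile(r"walltime.*exceeded|time limit", re.IGNORECASE),
    "memory": re.compile(
        r"\bv?mem\b.*exceeded|out of memory|\boom\b|MemoryError"
        r"|Cannot allocate memory",
        re.IGNORECASE,
    ),
}


def find_limit_hints(log_text: str) -> Dict[str, List[str]]:
    """Return the log lines that match each category of limit pattern."""
    hits: Dict[str, List[str]] = {name: [] for name in LIMIT_PATTERNS}
    for line in log_text.splitlines():
        for name, pattern in LIMIT_PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.strip())
    return hits
```

To use it, read the job's error file into a string (for example with `Path(...).read_text(errors="replace")`, where the file name depends on your PBS setup) and pass the text to `find_limit_hints`. An empty result does not rule out a resource limit; it only means none of the assumed patterns matched.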