geronimp / enrichM

Toolbox for comparative genomics of MAGs

super long run time for 2 genomes #81

Open · ganiatgithub opened this issue 5 years ago

ganiatgithub commented 5 years ago

Hello,

I have 2 genomes in .fna format; this is my script:

source activate enrichm_0.5.0
export ENRICHM_DB=/path_to_enrichm_db

enrichm annotate \
  --output /path_to_output \
  --genome_directory /path_to_genomes \
  --ko_hmm \
  --ec \
  --pfam \
  --orthologs \
  --threads 8 \
  --log /path_to_out/LOG

conda deactivate

I submitted it to a server, requesting 8 cores and 250 GB of RAM. It was killed after 108 hours because it ran out of walltime:

PBS: job killed: walltime 388897 exceeded limit 388800

(The unit is seconds: 388,800 s = 108 h.)
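As a quick check on the units in that PBS message, the sketch below converts a walltime given in seconds into the HH:MM:SS form that PBS accepts for `-l walltime=`. It assumes both figures in the message are seconds, which is consistent with the 108-hour kill time; `seconds_to_walltime` is just an illustrative helper name.

```shell
#!/usr/bin/env bash
# Convert a walltime in seconds into HH:MM:SS (hours may exceed 99).
# Assumption: the numbers in the PBS kill message are seconds.
seconds_to_walltime() {
  local s=$1
  printf '%02d:%02d:%02d\n' $((s / 3600)) $((s % 3600 / 60)) $((s % 60))
}

seconds_to_walltime 388800   # the limit
seconds_to_walltime 388897   # walltime used when the job was killed
```

The first call gives the 108-hour limit; the second shows the job was killed 97 seconds past it.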

Can the program simply pick up from where it left off if I re-run the job?

The genomes are about 2 Mb each. Does this run time seem normal to you?
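To confirm genome sizes before submitting a job, the total sequence length of each .fna file can be computed by summing the lengths of all non-header lines. A minimal sketch; `genome_size` is a hypothetical helper and `/path_to_genomes` is the placeholder directory from the script above.

```shell
#!/usr/bin/env bash
# Print the total sequence length (bp) of one FASTA nucleotide file.
# Header lines start with '>'; spaces, tabs and carriage returns are ignored.
genome_size() {
  awk '!/^>/ { gsub(/[ \t\r]/, ""); total += length($0) } END { print total + 0 }' "$1"
}

# Report every .fna in the (placeholder) genome directory in bp and Mb.
for fna in /path_to_genomes/*.fna; do
  [ -e "$fna" ] || continue   # skip when the glob matches nothing
  bp=$(genome_size "$fna")
  awk -v f="$fna" -v bp="$bp" 'BEGIN { printf "%s\t%d bp\t%.2f Mb\n", f, bp, bp / 1e6 }'
done
```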

Many thanks!

geronimp commented 5 years ago

Hi there,

Thanks again for the bug report. This run time isn't normal. Other users have reported the same problem, and I'll be looking into it soon, so I'll keep you posted. (It's a busy time for me at the moment, sorry for the delays.)

Thanks, Joel

ganiatgithub commented 5 years ago

Hi again,

I tried again with only one MAG, 1.8 Mb in size. I requested 12 cores and 250 GB of RAM on the server, and it was again killed after 108 hours. The log file doesn't say much:

[2019-06-20 09:29:43 AM] INFO: Command: /path_to_env/Miniconda3/envs/enrichm_0.5.0/bin/enrichm annotate --output /path_to_file/08annotate_enrichm_AOA/out --genome_files /path_to_file/08annotate_enrichm_AOA/bin4.fna --ko_hmm --ec --pfam --orthologs --threads 12 --log /path_to_file/08annotate_enrichm_AOA/LOG
[2019-06-20 09:29:43 AM] INFO: Running the annotate pipeline
[2019-06-20 09:29:43 AM] INFO: Running pipeline: annotate
[2019-06-20 09:29:43 AM] INFO: Setting up for genome annotation
[2019-06-20 09:29:43 AM] INFO: Calling proteins for annotation
[2019-06-20 09:29:43 AM] INFO: Preparing genomes for annotation
[2019-06-20 09:29:43 AM] INFO: - Calling proteins for 1 genomes
[2019-06-20 09:30:34 AM] INFO: Starting annotation:
[2019-06-20 09:30:34 AM] INFO: - Annotating genomes with hypothetical clusters
[2019-06-20 09:30:34 AM] INFO: - Generating MMSeqs2 database
[2019-06-20 09:30:34 AM] INFO: - Clustering genome proteins
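When a job is killed like this, the last log entry shows which stage it stalled at; here it is `Clustering genome proteins`, right after the MMseqs2 database step. A small sketch to pull out that entry, assuming the log file stores one entry per line; `last_step` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the most recent INFO entry (timestamp included) from an enrichM log file.
# Assumption: one log entry per line.
last_step() {
  grep 'INFO:' "$1" | tail -n 1
}

LOG=/path_to_file/08annotate_enrichm_AOA/LOG   # placeholder path from the Command entry
if [ -f "$LOG" ]; then
  last_step "$LOG"
fi
```

Comparing that timestamp with the time the job was killed shows how long the run sat in a single stage.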

Please let me know how to fix it.

Cheers

ganiatgithub commented 5 years ago

Hi,

Just wondering if there's any update on this?

Cheers

WardDeb commented 5 years ago

Hi,

Sorry for hijacking the thread, but I am currently running into something similar. From what I have seen, this problem is caused by the MMseqs2 database not being generated properly. I got past this step by manually hardcoding the input .faa file and the output database (line 398 of annotate.py in the libs), but then ran into more problems downstream (with the genome_dicts).
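If the MMseqs2 database is the culprit, one way to check is to see whether its files exist and are non-empty. This is a sketch under an assumption: that a database built with `mmseqs createdb` consists of `<prefix>`, `<prefix>.index` and `<prefix>.dbtype` plus the header files `<prefix>_h` and `<prefix>_h.index`. Where enrichM writes this database, and under what prefix, is not stated in the thread, so `check_mmseqs_db` and any prefix you pass it are hypothetical.

```shell
#!/usr/bin/env bash
# Report missing or empty files for an MMseqs2 sequence database prefix.
# ASSUMPTION: `mmseqs createdb` output includes the five files checked below.
# Returns 0 if all files are present and non-empty, 1 otherwise.
check_mmseqs_db() {
  local prefix=$1 missing=0 f
  for f in "$prefix" "$prefix.index" "$prefix.dbtype" "${prefix}_h" "${prefix}_h.index"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f"
      missing=1
    fi
  done
  return "$missing"
}

# Example (hypothetical prefix):
# check_mmseqs_db /path_to_output/some_db && echo "database looks complete"
```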

Alternatively, you could skip this step entirely by dropping --orthologs from your command.
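For reference, the original annotate command with `--orthologs` removed might look like the sketch below. It keeps the placeholder paths from the first post; the `DRY_RUN` switch is an illustrative addition (not an enrichM option) that prints the command by default instead of running it.

```shell
#!/usr/bin/env bash
# The annotate call from the first post, minus --orthologs (placeholder paths).
args=(
  annotate
  --output /path_to_output
  --genome_directory /path_to_genomes
  --ko_hmm
  --ec
  --pfam
  --threads 8
  --log /path_to_out/LOG
)

# Print the command by default; set DRY_RUN=0 to actually run enrichm.
if [ "${DRY_RUN:-1}" = 1 ]; then
  printf '%s\n' "enrichm ${args[*]}"
else
  enrichm "${args[@]}"
fi
```

Keeping the arguments in a bash array makes it easy to add or drop a single flag without re-editing a long backslash-continued line.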

Cheers