apcamargo / genomad

geNomad: Identification of mobile genetic elements
https://portal.nersc.gov/genomad/

genomad terminates during mmseqs #2

Closed jzrapp closed 1 year ago

jzrapp commented 1 year ago

Hi,

I'm trying to run geNomad for the first time! I'm using it on a compute cluster with shared resources, so I'm trying to control memory and threads. I've tried several times, each time adjusting the cores and memory, and also using the --splits option you describe in the manual. But I always end up here:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/jorap2/.conda/envs/genomad/bin/genomad", line 10, in <module>
    sys.exit(cli())
  File "/home/jorap2/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/jorap2/.conda/envs/genomad/lib/python3.10/site-packages/rich_click/rich_group.py", line 21, in main
    rv = super().main(*args, standalone_mode=False, **kwargs)
  File "/home/jorap2/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/jorap2/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/jorap2/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/jorap2/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/jorap2/.conda/envs/genomad/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/jorap2/.conda/envs/genomad/lib/python3.10/site-packages/genomad/cli.py", line 1015, in end_to_end
    ctx.invoke(
  File "/home/jorap2/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/jorap2/.conda/envs/genomad/lib/python3.10/site-packages/genomad/cli.py", line 338, in annotate
    genomad.annotate.main(
  File "/home/jorap2/.conda/envs/genomad/lib/python3.10/site-packages/genomad/modules/annotate.py", line 201, in main
    mmseqs2_obj.run_mmseqs2(threads, sensitivity, evalue, splits)
  File "/home/jorap2/.conda/envs/genomad/lib/python3.10/site-packages/genomad/mmseqs2.py", line 134, in run_mmseqs2
    raise Exception(f"'{command_str}' failed.") from e
Exception: 'mmseqs search all-samples_VIRUSES_out/all-samples_5kb_1.5kb-cir_annotate/all-samples_5kb_1.5kb-cir_mmseqs2/query_db/query_db genomad_db/genomad_db_v1.1/genomad_db all-samples_VIRUSES_out/all-samples_5kb_1.5kb-cir_annotate/all-samples_5kb_1.5kb-cir_mmseqs2/search_db/search_db all-samples_VIRUSES_out/all-samples_5kb_1.5kb-cir_annotate/all-samples_5kb_1.5kb-cir_mmseqs2/tmp --threads 128 -s 6.4 --cov-mode 1 -c 0.2 -e 0.001 --split 16 --split-mode 0' failed.
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=15098211.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.

It seems like mmseqs is using 128 threads and I don't know how to limit it. Do you think this is the issue?

Thanks!

apcamargo commented 1 year ago

It seems you are running out of memory. Try setting the --splits parameter to something like 8, or even 16. This parameter splits the search step into multiple parts, which reduces speed a bit but prevents memory issues.

I'll probably change the default value in the next version to prevent cases like this.

Let me know if it works!
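
For a Slurm job like the one in this thread, a minimal sketch of capping the thread count alongside --splits might look like the following. It assumes that `genomad end-to-end` accepts a `--threads` option and that the job was submitted with `--cpus-per-task`, so that `SLURM_CPUS_PER_TASK` is set. The helper name `build_genomad_command` and the fallback of 8 threads are hypothetical, chosen only for illustration.

```python
import os
import shlex


def build_genomad_command(fasta, outdir, db, splits=16, default_threads=8):
    """Build a `genomad end-to-end` command that respects the Slurm CPU allocation.

    Hypothetical helper: the option names are assumptions based on this thread
    (--splits) and on the geNomad CLI (--threads).
    """
    # Slurm sets SLURM_CPUS_PER_TASK only when --cpus-per-task is requested;
    # otherwise fall back to a conservative default instead of all node cores.
    threads = int(os.environ.get("SLURM_CPUS_PER_TASK", default_threads))
    return [
        "genomad", "end-to-end",
        "--threads", str(threads),
        "--splits", str(splits),
        fasta, outdir, db,
    ]


if __name__ == "__main__":
    cmd = build_genomad_command(
        "all-samples.fasta",
        "all-samples_VIRUSES_out",
        "genomad_db/genomad_db_v1.1/",
    )
    # Print the command for inspection before launching it.
    print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would launch the run. Whether a lower thread count also reduces peak memory here is not confirmed in this thread.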

jzrapp commented 1 year ago

Hi, I actually already did this after receiving the error the first time.

I ran the command like this: genomad end-to-end --splits 16 all-samples.fasta all-samples_VIRUSES_out genomad_db/genomad_db_v1.1/

and I allocated 250 GB of memory.

apcamargo commented 1 year ago

What version of MMseqs2 are you using? The 14-7e284 release that came out a few days ago doesn't work with geNomad yet. I just updated the Conda recipe to fix that.
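
To check which MMseqs2 is installed in the active environment, a rough sketch along these lines may help. It assumes that `mmseqs version` prints just the version string, and that the string matches one of the two spellings of the release named in this thread; the exact output format can differ between builds. The function names are hypothetical.

```python
import shutil
import subprocess

# The release named in this thread as not yet supported by geNomad.
# Both spellings appear in the thread; which one a build prints is an assumption.
UNSUPPORTED_VERSIONS = {"14-7e284", "14.7e284"}


def is_unsupported(version_output):
    """Return True if the reported version is the release named above."""
    return version_output.strip() in UNSUPPORTED_VERSIONS


def installed_mmseqs2_version():
    """Return the output of `mmseqs version`, or None if mmseqs is not on PATH."""
    if shutil.which("mmseqs") is None:
        return None
    result = subprocess.run(
        ["mmseqs", "version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    version = installed_mmseqs2_version()
    if version is None:
        print("mmseqs not found on PATH")
    elif is_unsupported(version):
        print(f"MMseqs2 {version} is the release reported as unsupported")
    else:
        print(f"MMseqs2 {version} found")
```

If the unsupported release is reported, reinstalling geNomad from the updated Conda recipe should pull in a compatible MMseqs2.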

jzrapp commented 1 year ago

wow, thanks! I installed it today, so, yes, I was using 14.7e284. I will change that and see what happens! Thanks again!

apcamargo commented 1 year ago

No problem! I'll add support for 14.7e284 in the next geNomad release, but it might take some time.