davidvilanova closed this issue 7 years ago.
This could be a memory leak when unloading the database from memory (I used the `--usemem` flag) while running under an SGE queuing system.
Hi @davidvilanova!
It would help if you could post the exact command you used to run emapper.py! You can of course redact sensitive file names if required.
It looks like you asked for a custom database, although you say you ran the optimized bacterial database. Was your database argument `-d bact`? If you ran `-d bact_50`, I think this could have caused the error. It "slips through" line 76 (https://github.com/jhcepas/eggnog-mapper/blob/c436da5779b333531038eacfaa0a0d4255696544/emapper.py#L76) but is not recognized in line 226 (https://github.com/jhcepas/eggnog-mapper/blob/c436da5779b333531038eacfaa0a0d4255696544/emapper.py#L226).
What version of the software did you run?
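The mismatch described above, where a value passes an early check but is not recognized later, can be illustrated with a small, purely hypothetical sketch. This is not emapper's code: `OPTIMIZED_DBS`, `resolve_database`, `can_refine` and the paths are invented for illustration. The idea is that a lookup keyed on exact short names only recognizes those names, so anything else (such as `bact_50` or an absolute path to `bact_50.hmm`) would fall through to a "custom database" branch. That would be consistent with the "refined hits not available for custom hmm databases" message reported in this thread.

```python
# Hypothetical sketch, NOT emapper's implementation: names and paths are invented.
OPTIMIZED_DBS = {
    # short name -> HMM file shipped with the optimized data (illustrative path)
    "bact": "data/hmmdb_levels/bact_50/bact_50.hmm",
}


def resolve_database(arg):
    """Classify a --database value as an optimized short name or a custom HMM db."""
    if arg in OPTIMIZED_DBS:
        return "optimized", OPTIMIZED_DBS[arg]
    # Anything else (e.g. "bact_50" or an absolute path) is treated as custom,
    # so steps that need the optimized metadata, like hit refinement, are skipped.
    return "custom", arg


def can_refine(arg):
    """True only when the argument resolves to an optimized database."""
    kind, _ = resolve_database(arg)
    return kind == "optimized"


if __name__ == "__main__":
    values = ["bact", "bact_50", "/home/user/data/hmmdb_levels/bact_50/bact_50.hmm"]
    for value in values:
        print(value, "->", resolve_database(value)[0], "| refine:", can_refine(value))
```

Under this assumption, passing the short name rather than the full path to the `.hmm` file is what keeps the refinement step available.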
I'm running emapper outside its folder; emapper.py is on my PATH. I'm using the absolute path to the optimized bact_50 folder, otherwise it cannot be found.
```
echo "emapper.py --database /home/david/work/sources/eggnog-mapper/data/hmmdb_levels/bact_50/bact_50.hmm --cpu 10 --usemem --output_dir test -o output_dir -i seq.faa " | qsub -pe parallel_smp 10 -l h_vmem=10G
```
I re-ran it the following way, which looks much better (on the cluster, with 10 CPUs allocated). It looks like this relies on Python multiprocessing (see pool.py in the error log below). I'm running Python 2.7.12. Maybe I'm not running it properly. Since I'm using a Linux cluster, I use the SGE system (`-pe parallel_smp 10 -l h_vmem=10G` per core).
`emapper.py --d bact --cpu 10 --usemem --output_dir outputdir -o out -i seq.faa`

```
# emapper-0.12.7-8-gc436da5
# ./emapper.py -d bact -i seq.faa --output_dir outputdir -o out --cpu 10 --usemem
Loading server at localhost, port 51500-51501
Loading server at localhost, port 51500-51501
Waiting for server to become ready... localhost 51500
[... previous line repeated 29 times in total ...]
Reading idmap /work/dvilanova/david/sources/eggnog-mapper/data/hmmdb_levels/bact_50/bact_50.hmm.idmap
159207 names loaded
Sequence mapping starts now!
Processed queries:30 total_time:9.13830184937 rate:3.28 q/s
Hit refinement starts now
```
And in the error log file I get:

```
26 7.50618433952 3.46 q/s
Fatal exception (source file ../../easel/esl_threads.c, line 129):
thread creation failed
Fatal exception (source file ../../easel/esl_threads.c, line 129):
thread creation failed
Fatal exception (source file ../../easel/esl_threads.c, line 129):
thread creation failed
Fatal exception (source file ../../easel/esl_threads.c, line 129):
thread creation failed
Traceback (most recent call last):
  File "./emapper.py", line 1080, in <module>
    main(args)
  File "./emapper.py", line 227, in main
    refine_matches(args.input, seed_orthologs_file, hmm_hits_file, args)
  File "./emapper.py", line 510, in refine_matches
    base_tempdir=args.temp_dir)):
  File "./emapper.py", line 572, in process_nog_hits_file
    for r in pool.imap(search.refine_hit, cmds):
  File "/work/dvilanova/miniconda2/lib/python2.7/multiprocessing/pool.py", line 668, in next
    raise value
ValueError: Error running PHMMER
Fatal exception (source file ../../easel/esl_threads.c, line 129):
thread creation failed
Fatal exception (source file ../../easel/esl_threads.c, line 129):
thread creation failed
Fatal exception (source file ../../easel/esl_threads.c, line 129):
thread creation failed
Fatal exception (source file ../../easel/esl_threads.c, line 129):
thread creation failed
Fatal exception (source file ../../easel/esl_threads.c, line 129):
```
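The traceback shows the `ValueError` surfacing in the parent process at `pool.imap(...)`, inside multiprocessing's `pool.py`, even though the failure happened in a worker. Below is a minimal sketch of that behaviour, assuming the worker runs an external program and raises when it exits non-zero. The helper names `fake_refine_hit` and `run_all` are hypothetical, and this is not emapper's actual `search.refine_hit`. It uses `multiprocessing.pool.ThreadPool`, which shares `Pool`'s `imap` API, to keep the example free of process start-method concerns; a process `Pool` re-raises worker exceptions in the same way. The failing command only simulates a HMMER binary that aborts, for example after "thread creation failed".

```python
import subprocess
import sys
from multiprocessing.pool import ThreadPool


def fake_refine_hit(cmd):
    """Hypothetical stand-in for a refinement worker: run an external
    command and raise if it exits with a non-zero status."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Same message as in the traceback above.
        raise ValueError("Error running PHMMER")
    return result.stdout.strip()


def run_all(cmds, workers=4):
    """Consume results in submission order; the first worker exception
    reached by the iteration is re-raised here, in the parent."""
    results = []
    with ThreadPool(processes=workers) as pool:
        for r in pool.imap(fake_refine_hit, cmds):
            results.append(r)
    return results


if __name__ == "__main__":
    ok = [sys.executable, "-c", "print('hit')"]
    # Simulates an external binary that aborts with a non-zero exit status.
    bad = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(run_all([ok, ok]))  # ['hit', 'hit']
    try:
        run_all([ok, bad, ok])
    except ValueError as exc:
        print("caught in parent:", exc)  # caught in parent: Error running PHMMER
```

Because `imap` yields results in order and stops at the first failed task, only one `ValueError` is reported no matter how many workers failed. That is consistent with the log above showing several "thread creation failed" messages but a single "Error running PHMMER".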
Sounds like a deeper problem with multithreading in Python. Can you run a basic multiprocessing script in your current setup?
```python
from multiprocessing import Pool


def f(x):
    return x * x


if __name__ == '__main__':
    pool = Pool(processes=4)  # start 4 worker processes
    # prints "[0, 1, 4, ..., 81]"
    print(pool.map(f, range(10)))
    # prints the same numbers in arbitrary order
    for i in pool.imap_unordered(f, range(10)):
        print(i)
    pool.close()
    pool.join()
```
Multiprocessing works as expected, with no errors.

```
echo "python test2.py" | qsub -o out -e err -pe parallel_smp 4
```

```
==> out <==
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
0
1
4
9
16
25
36
49
64
81
```
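The test above hard-codes four workers to match `-pe parallel_smp 4`. A small variation, sketched here as one possible way to wire it up rather than anything emapper does itself, sizes the pool from the `NSLOTS` environment variable, which SGE sets to the number of slots granted to the job. That keeps the worker count and the `qsub` request from drifting apart, and the same value could be passed to emapper's `--cpu` option. The helper name `slots_from_env` is hypothetical.

```python
import os
from multiprocessing import Pool


def slots_from_env(default=1):
    """Return the slot count from $NSLOTS (set by SGE for the job), or
    `default` when the variable is missing or not a positive integer,
    e.g. when the script runs outside the queue."""
    try:
        n = int(os.environ.get("NSLOTS", ""))
    except ValueError:
        return default
    return n if n > 0 else default


def f(x):
    return x * x


if __name__ == "__main__":
    n = slots_from_env()
    with Pool(processes=n) as pool:
        print("workers:", n)
        print(pool.map(f, range(10)))
```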
Hi guys, the same command with `--cpu 1` almost worked: `emapper.py --d bact --cpu 1 --usemem --output_dir outputdir -o out -i seq.faa`

STDOUT log:
```
# ./emapper.py -d bact -i /home/dvilanova/work/WGS_PIPELINE/analysis_default_MOCK/seq.faa --override --output_dir outputdir -o out --cpu 1 --usemem
Loading server at localhost, port 51500-51501
Waiting for server to become ready... localhost 51500
[... previous line repeated 37 times in total ...]
Reading idmap /work/dvilanova/david/sources/eggnog-mapper/data/hmmdb_levels/bact_50/bact_50.hmm.idmap
159207 names loaded
Sequence mapping starts now!
Processed queries:30 total_time:82.1124022007 rate:0.37 q/s
Hit refinement starts now
Processed queries:26 total_time:217.179458857 rate:0.12 q/s
Reading HMM matches
Functional annotation of refined hits starts now
Processed queries:40 total_time:396.550137997 rate:0.10 q/s
Done
out.emapper.hmm_hits
out.emapper.seed_orthologs
out.emapper.annotations
Total time: 734.632 secs
================================================================================
CITATION:
If you use this software, please cite:
[1] Fast genome-wide functional annotation through orthology assignment by
eggNOG-mapper. Jaime Huerta-Cepas, Damian Szklarczyk, Lars Juhl Jensen,
Christian von Mering and Peer Bork. Submitted (2016).
[2] eggNOG 4.5: a hierarchical orthology framework with improved functional
annotations for eukaryotic, prokaryotic and viral sequences. Jaime
Huerta-Cepas, Damian Szklarczyk, Kristoffer Forslund, Helen Cook, Davide
Heller, Mathias C. Walter, Thomas Rattei, Daniel R. Mende, Shinichi
Sunagawa, Michael Kuhn, Lars Juhl Jensen, Christian von Mering, and Peer
Bork. Nucl. Acids Res. (04 January 2016) 44 (D1): D286-D293. doi:
10.1093/nar/gkv1248
[3] Accelerated Profile HMM Searches. PLoS Comput. Biol. 7:e1002195. Eddy SR.
2011.
(e.g. Functional annotation was performed using emapper-0.12.7-8-gc436da5 [1]
based on eggNOG orthology data [2]. Sequence searches were performed
using [3].)
================================================================================
```
STDERR (no really useful information):

```
26 64.2112979889 0.40 q/s
26 216.679230928 0.12 q/s (refinement)
Your job has been killed (cluster message)
...
```
It seems that your cluster queue system killed the process, probably because it ran out of memory. Could you check whether the same command runs correctly outside the queue system?
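One way to test the out-of-memory explanation before resubmitting is to measure how much memory a run actually uses. This is a minimal sketch, assuming a Unix system where Python's `resource` module is available; the workload is a placeholder that would be replaced by the real emapper.py command, and `peak_child_rss_bytes` is a hypothetical helper name. Keep in mind that resident memory (RSS) is normally lower than virtual memory, and SGE's `h_vmem` limits virtual memory, so the number printed is only a lower bound on what the limit has to allow. The scheduler's own accounting (for example `qacct -j` on SGE) may report virtual memory more directly.

```python
import resource
import subprocess
import sys


def peak_child_rss_bytes(cmd):
    """Run `cmd` to completion and return the peak resident set size, in
    bytes, reported for waited-for child processes.

    RUSAGE_CHILDREN reports the maximum over *all* terminated children of
    this process, not only the last one.
    """
    subprocess.run(cmd, check=True)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # Linux reports ru_maxrss in kilobytes, macOS in bytes.
    return peak if sys.platform == "darwin" else peak * 1024


if __name__ == "__main__":
    # Placeholder workload: a child process that allocates about 100 MiB.
    workload = [sys.executable, "-c", "data = b'x' * (100 * 1024 * 1024)"]
    print("peak RSS: %.1f MiB" % (peak_child_rss_bytes(workload) / 2**20))
```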
I can't do that, since I'm launching the jobs from a front-end node with restricted usage. I have been using this cluster for three years with different programs, CPU counts and memory settings. I have also adjusted the memory settings, trying 2 CPUs with 100G per core, which should be enough to load the complete bact database, but it also failed.
Have you tried it with the SGE queuing system?
I ran the analysis with the `-m diamond` option and it worked perfectly on the cluster. I suspect the problem is related to threading in the default HMM pipeline.
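If threading is the suspect, a quick diagnostic is to print, from inside a job submitted with the same `qsub` options, the per-process limits that affect thread creation. On Linux, `pthread_create` fails with `EAGAIN` when resources are insufficient or the `RLIMIT_NPROC` limit is reached, and each new thread reserves its stack in virtual address space, so a tight address-space limit can block thread creation even when physical memory is free. The sketch below uses only Python's standard `resource` module; the helper names are hypothetical, and which limits a given SGE installation derives from `h_vmem` is an assumption to verify locally.

```python
import resource

# Limits relevant to creating threads. The descriptions reflect common Linux
# behaviour; how SGE maps h_vmem onto them is installation-specific (assumption).
LIMIT_NAMES = {
    "RLIMIT_AS": "address space; a common target for virtual-memory caps like h_vmem",
    "RLIMIT_STACK": "stack size; glibc usually derives the default thread stack from it",
    "RLIMIT_NPROC": "processes/threads per user; pthread_create fails with EAGAIN at the cap",
}


def describe(value):
    """Render a limit value, spelling out RLIM_INFINITY."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def thread_related_limits():
    """Return {name: (soft, hard)} for the limits above that this platform defines."""
    limits = {}
    for name in LIMIT_NAMES:
        const = getattr(resource, name, None)
        if const is not None:
            limits[name] = resource.getrlimit(const)
    return limits


if __name__ == "__main__":
    for name, (soft, hard) in thread_related_limits().items():
        print(f"{name}: soft={describe(soft)} hard={describe(hard)}  ({LIMIT_NAMES[name]})")
```

Comparing the output of an interactive shell with the output of the same script submitted through `qsub` would show which limits the scheduler adds.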
@davidvilanova It seems to work with the SGE setup in our cluster... I used the following submission command:

```
qsub -pe smp 10 test_sge.sh
```

and the following job script:

```
$ cat test_sge.sh
eggnog-mapper/emapper.py -i eggnog-mapper/test/testCOG0515.fa -o test_polb -d bactNOG --cpu 10 --override
```

Output:

```
$ cat test_sge.sh.o
# emapper-0.12.7-8-gc436da5
# ./emapper.py -i eggnog-mapper/test/testCOG0515.fa -o test_polb -d bactNOG --cpu 10 --override
Sequence mapping starts now!
Processed queries:5 total_time:165.61026597 rate:0.03 q/s
Hit refinement starts now
Processed queries:5 total_time:21.5015897751 rate:0.23 q/s
Reading HMM matches
Functional annotation of refined hits starts now
Processed queries:14 total_time:0.268085002899 rate:52.22 q/s
Done
test_polb.emapper.hmm_hits
test_polb.emapper.seed_orthologs
test_polb.emapper.annotations
Total time: 187.896 secs
```
In any case, I would say diamond mode is preferable overall. All benchmarks show the same or even better results than HMM mode, at least for genomes that are not extremely distant from the species covered in eggNOG 4.5.
Thanks for replicating. Note that I'm using the optimized bacterial database (`-d bact`), though. I will stay with diamond. Thanks, David
Hi, while running a set of proteins in default mode against the optimized bacterial (HMM) database, I get an error:

```
...
Reading idmap /home/david/work/sources/eggnog-mapper/data/hmmdb_levels/bact_50/bact_50.hmm.idmap
159207 names loaded
Sequence mapping starts now!
Processed queries:1927 total_time:942.721111059 rate:2.04 q/s
refined hits not available for custom hmm databases.
Reading HMM matches
Functional annotation of refined hits starts now
error
```

It seems to be trying to refine hits, but those are not available for custom databases? The database I used is the optimized bacterial one, downloaded with the download script. The annotations file does not contain any annotations. ...