Closed GoogleCodeExporter closed 9 years ago
Hi Mikael,
Thanks for reporting this problem. Your setting for h_rt to double
worker-lifetime is the correct approach. The workers are supposed to die
gracefully.
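As a rough illustration of the "double the worker lifetime" rule, the h_rt value can be derived from the worker lifetime setting. This is only a sketch: the function name is hypothetical, and the 3600 s lifetime is just an example value for jvm.maximum.life.seconds.

```python
def h_rt_for_worker(max_life_seconds: int, factor: int = 2) -> str:
    """Return an h_rt value (h:mm:ss) that is `factor` times the worker lifetime.

    Hypothetical helper, not part of InterProScan. It only does the
    arithmetic; passing the value to the scheduler is left to the caller.
    """
    total = max_life_seconds * factor
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# Example: a worker lifetime of 3600 s gives a hard runtime limit of two hours.
print(h_rt_for_worker(3600))
```

The factor of two leaves the worker time to finish its current calculation and shut down on its own before the scheduler kills it.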
What is the size (number of sequences) of your input proteins and what is the
size of your largest sequence?
We know that for a large sequence it takes longer to calculate the matches but
the worker will only die after it has completed the calculations. We will in
the future revise our lookahead feature and take into consideration other
factors.
Cluster mode is new to InterProScan 5 so we welcome any feedback.
Best regards,
Gift
Original comment by nuka....@gmail.com
on 8 May 2013 at 9:29
Hi,
OK, then my assumption was right. Last night I tried giving the workers the
same h_rt as the master. The end result was that some of the workers lived as
long as the master, roughly 35000s, and most lived more than 20000s, even
though jvm.maximum.life.seconds was set to 3600s. So it appears that the limit
is ignored. The complete protein set was annotated when the workers were left
to live as long as the master.
The protein set I tried on has about 8700 fungal sequences, none particularly
long. The length distribution looks like this (number of sequences, then
length bin):
106 0
590 100
830 200
996 300
816 400
828 500
488 600
379 700
311 800
224 900
194 1000
144 1100
118 1200
71 1300
80 1400
45 1500
52 1600
36 1700
21 1800
18 1900
16 2000
20 2100
10 2200
10 2300
6 2400
5 2500
4 2600
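For anyone who wants to produce a comparable listing for their own input, here is a minimal sketch. It assumes the listing above is "count, then start of a 100-wide length bin", which is my reading of the numbers rather than something stated in the post; the function name is hypothetical.

```python
from collections import Counter


def length_histogram(lengths, bin_width=100):
    """Bin sequence lengths and return sorted (bin_start, count) pairs.

    A length n falls into the bin starting at (n // bin_width) * bin_width,
    so with the default width 0-99 go to bin 0, 100-199 to bin 100, etc.
    """
    counts = Counter((n // bin_width) * bin_width for n in lengths)
    return sorted(counts.items())


# Example with made-up lengths; prints "count bin_start" like the listing above.
for bin_start, count in length_histogram([50, 99, 100, 250]):
    print(count, bin_start)
```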
I do get some warnings in the run log, and they seem to be related to workers
going away, as the total number of workers decreases with every warning line.
So the worker pool does not seem to be replenished when workers go away. Is
there any way to control the number of workers? What limits the number?
07/05/2013 16:59:56 Welcome to InterProScan 5RC6
Running the following analyses:
[jobTIGRFAM-13.0, jobPIRSF-2.83, jobProDom-2006.1, jobSMART-6.2,
jobPrositeProfiles-20.89, jobHAMAP-201302.26, jobPfamA-26.0,
jobPrositePatterns-20.89, jobPRINTS-42.0, jobSuperFamily-1.75, jobCoils-2.2,
jobGene3d-3.5.0]
The project/Cluster Run ID for this run is: hirr
Running InterProScan v5 in CLUSTER mode...
07/05/2013 17:00:15 first transaction ...
Available matches will be retrieved from the pre-calculated match lookup
service.
Matches for any sequences that are not represented in the lookup service will
be calculated locally.
2013-05-07 17:01:14,794 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.3:37095 failed:
java.io.EOFException
2013-05-07 17:01:15,053 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.3:37097 failed:
java.io.EOFException
2013-05-07 17:01:15,297 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.2:38054 failed:
java.io.EOFException
2013-05-07 17:01:15,530 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.2:38056 failed:
java.io.EOFException
2013-05-07 17:14:53,235 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.17.0.11:53283 failed:
java.io.EOFException
2013-05-07 17:45:57,752 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.2:38073 failed:
java.io.EOFException
2013-05-07 17:47:07,233 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.2:38075 failed:
java.io.EOFException
2013-05-07 17:48:16,550 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.2:38077 failed:
java.io.EOFException
2013-05-07 17:49:26,602 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.2:38079 failed:
java.io.EOFException
2013-05-07 17:50:35,787 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.2:38081 failed:
java.io.EOFException
2013-05-07 17:51:43,507 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.2:38083 failed:
java.io.EOFException
2013-05-07 17:52:51,671 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.2:38085 failed:
java.io.EOFException
2013-05-07 17:53:59,245 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.2:38087 failed:
java.io.EOFException
2013-05-07 17:55:07,573 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.2:38089 failed:
java.io.EOFException
2013-05-07 17:56:15,494 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.2:38091 failed:
java.io.EOFException
2013-05-07 17:57:22,830 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.2:38093 failed:
java.io.EOFException
2013-05-07 17:58:32,276 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.2:38095 failed:
java.io.EOFException
2013-05-07 17:59:42,402 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.2:38097 failed:
java.io.EOFException
2013-05-07 18:00:49,675 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.2:38099 failed:
java.io.EOFException
2013-05-07 18:01:59,488 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.2:38101 failed:
java.io.EOFException
2013-05-07 18:03:11,309 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.2:38103 failed:
java.io.EOFException
2013-05-07 18:04:23,020 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.2:38105 failed:
java.io.EOFException
2013-05-07 18:05:34,256 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.2:38107 failed:
java.io.EOFException
07/05/2013 18:07:09 25% completed
2013-05-07 20:05:44,680 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.3:37113 failed:
java.io.EOFException
2013-05-07 20:40:36,243 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.3:37099 failed:
java.io.EOFException
07/05/2013 20:51:30 50% completed
2013-05-07 21:30:16,426 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.3:37101 failed:
java.io.EOFException
2013-05-07 21:30:31,969 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.5:52380 failed:
java.io.EOFException
2013-05-07 21:44:33,081 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.3:37109 failed:
java.io.EOFException
2013-05-07 22:39:36,869 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.3:37105 failed:
java.io.EOFException
2013-05-07 22:43:56,329 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.3:37103 failed:
java.io.EOFException
2013-05-07 22:59:39,002 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.3:37111 failed:
java.io.EOFException
2013-05-07 23:08:29,252 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.2:38064 failed:
java.io.EOFException
2013-05-07 23:14:45,177 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.4:42088 failed:
java.io.EOFException
2013-05-07 23:19:32,394 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.3:37107 failed:
java.io.EOFException
2013-05-07 23:23:38,138 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.2:38058 failed:
java.io.EOFException
07/05/2013 23:27:54 75% completed
2013-05-07 23:39:04,313 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.2:38066 failed:
java.io.EOFException
2013-05-07 23:42:15,216 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.2:38062 failed:
java.io.EOFException
2013-05-08 00:01:42,335 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.2:38060 failed:
java.io.EOFException
2013-05-08 00:03:16,904 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.17.0.11:53278 failed:
java.io.EOFException
2013-05-08 00:13:17,469 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.2:38068 failed:
java.io.EOFException
2013-05-08 00:22:18,561 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.4:42078 failed:
java.io.EOFException
2013-05-08 00:32:19,803 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.17.0.11:53281 failed:
java.io.EOFException
2013-05-08 00:58:38,590 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.4:42101 failed:
java.io.EOFException
08/05/2013 01:12:49 90% completed
2013-05-08 01:14:03,130 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.2:38070 failed:
java.io.EOFException
2013-05-08 01:57:53,970 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.4:42107 failed:
java.io.EOFException
2013-05-08 02:04:18,705 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.4:42094 failed:
java.io.EOFException
2013-05-08 02:08:04,324 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.4:42083 failed:
java.io.EOFException
2013-05-08 02:08:19,136 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.15.0.4:42114 failed:
java.io.EOFException
2013-05-08 02:17:58,926 [org.apache.activemq.broker.TransportConnection:203]
WARN - Transport Connection to: tcp://10.17.0.11:53276 failed:
java.io.EOFException
2013-05-08 02:56:21,347
[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputStep:245]
WARN - At run completion, unable to delete temporary directory
/nfs4/my-gridstore1/proj1/mykopat-gbrowse/software/ipr5/5rc6/temp/my-mgrid4_20130507_170014592_roh/jobPIRSF-2.83
2013-05-08 02:56:21,398
[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputStep:250]
WARN - At run completion, unable to delete temporary directory
/nfs4/my-gridstore1/proj1/mykopat-gbrowse/software/ipr5/5rc6/temp/my-mgrid4_20130507_170014592_roh
08/05/2013 02:56:44 100% of analyses done: InterProScan analyses completed
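To see which hosts the dropped connections come from, a log like the one above can be summarised per host. This is a small sketch that only relies on the "Transport Connection to: tcp://host:port failed" text visible in the warnings; the function name is hypothetical.

```python
import re
from collections import Counter

# Matches the host part of the ActiveMQ warning lines shown in the log above.
PATTERN = re.compile(r"Transport Connection to: tcp://([\d.]+):\d+ failed")


def dropped_connections_by_host(log_text):
    """Count 'Transport Connection ... failed' warnings per worker host."""
    return Counter(PATTERN.findall(log_text))


sample = """\
WARN - Transport Connection to: tcp://10.15.0.3:37095 failed:
java.io.EOFException
WARN - Transport Connection to: tcp://10.15.0.3:37097 failed:
java.io.EOFException
WARN - Transport Connection to: tcp://10.15.0.2:38054 failed:
java.io.EOFException
"""
for host, count in dropped_connections_by_host(sample).most_common():
    print(host, count)
```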
Best regards,
Mikael
Original comment by mikael.d...@gmail.com
on 8 May 2013 at 11:52
I am also seeing the same issue with the runtime.
We also changed the properties file to set grid.jobs.limit=12 and found that
InterProScan doesn't adhere to the limit. It gets 12 workers running, but then
continues to spawn more workers in the queue.
Looking at the source code, I don't see any queue checks for SGE, so I'm not
sure limiting the workers will help.
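To make clear what I mean by a queue check, here is a minimal sketch of the kind of guard I would expect before new workers are submitted. It is not taken from the InterProScan source; the function and its parameters are hypothetical, and obtaining the number of active jobs from the scheduler is left out.

```python
def workers_to_spawn(limit: int, active: int, requested: int) -> int:
    """Return how many new workers may be submitted without exceeding `limit`.

    limit     -- the configured cap, e.g. the value of grid.jobs.limit
    active    -- jobs already running or waiting in the queue (assumed known)
    requested -- how many new workers the master would like to start
    """
    free_slots = max(0, limit - active)
    return min(requested, free_slots)


# With a limit of 12 and 12 jobs already in the queue, nothing more is spawned.
print(workers_to_spawn(limit=12, active=12, requested=5))
```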
Best Regards,
Michael
Original comment by mike8...@gmail.com
on 23 Jul 2013 at 8:14
Original comment by Maxim.Sc...@gmail.com
on 15 Aug 2013 at 11:11
This should be fixed from the first official release onwards
(https://code.google.com/p/interproscan/wiki/Interproscan5_44_ReleaseNotes).
Original comment by Maxim.Sc...@gmail.com
on 5 Nov 2013 at 12:21
Original issue reported on code.google.com by
mikael.d...@gmail.com
on 4 May 2013 at 6:54