I ran into a problem after running the program for two days. At first, the output looked normal, for example:
[2021-08-04T04:31:02+0800] [MainThread] [I] [toil.leader] 10 jobs are running, 0 jobs are issued and waiting to run
[2021-08-04T05:31:04+0800] [MainThread] [I] [toil.leader] 10 jobs are running, 0 jobs are issued and waiting to run
[2021-08-04T06:31:05+0800] [MainThread] [I] [toil.leader] 10 jobs are running, 0 jobs are issued and waiting to run
[2021-08-04T07:31:07+0800] [MainThread] [I] [toil.leader] 9 jobs are running, 0 jobs are issued and waiting to run
[2021-08-04T08:31:08+0800] [MainThread] [I] [toil.leader] 9 jobs are running, 0 jobs are issued and waiting to run
[2021-08-04T09:31:09+0800] [MainThread] [I] [toil.leader] 9 jobs are running, 0 jobs are issued and waiting to run
[2021-08-04T10:31:10+0800] [MainThread] [I] [toil.leader] 9 jobs are running, 0 jobs are issued and waiting to run
[2021-08-04T11:31:11+0800] [MainThread] [I] [toil.leader] 8 jobs are running, 0 jobs are issued and waiting to run
[2021-08-04T12:31:11+0800] [MainThread] [I] [toil.leader] 8 jobs are running, 0 jobs are issued and waiting to run
[2021-08-04T13:31:12+0800] [MainThread] [I] [toil.leader] 8 jobs are running, 0 jobs are issued and waiting to run
[2021-08-04T14:31:12+0800] [MainThread] [I] [toil.leader] 8 jobs are running, 0 jobs are issued and waiting to run
[2021-08-04T15:31:13+0800] [MainThread] [I] [toil.leader] 8 jobs are running, 0 jobs are issued and waiting to run
[2021-08-04T16:31:14+0800] [MainThread] [I] [toil.leader] 8 jobs are running, 0 jobs are issued and waiting to run
[2021-08-04T17:31:15+0800] [MainThread] [I] [toil.leader] 8 jobs are running, 0 jobs are issued and waiting to run
[2021-08-04T18:31:15+0800] [MainThread] [I] [toil.leader] 8 jobs are running, 0 jobs are issued and waiting to run
[2021-08-04T19:31:16+0800] [MainThread] [I] [toil.leader] 8 jobs are running, 0 jobs are issued and waiting to run
[2021-08-04T20:31:18+0800] [MainThread] [I] [toil.leader] 8 jobs are running, 0 jobs are issued and waiting to run
[2021-08-04T21:31:19+0800] [MainThread] [I] [toil.leader] 8 jobs are running, 0 jobs are issued and waiting to run
[2021-08-04T22:31:19+0800] [MainThread] [I] [toil.leader] 8 jobs are running, 0 jobs are issued and waiting to run
[2021-08-04T23:31:20+0800] [MainThread] [I] [toil.leader] 8 jobs are running, 0 jobs are issued and waiting to run
[2021-08-05T00:31:21+0800] [MainThread] [I] [toil.leader] 8 jobs are running, 0 jobs are issued and waiting to run
However, after about two days of running, errors like the following started to appear:
qstat: Pbs Server is currently too busy to service this request. Please retry this request. 601294.mu01
[2021-08-05T00:43:11+0800] [Thread-2 ] [E] [toil.batchSystems.abstractGridEngineBatchSystem] Will retry errored operation getJobExitCode, code 30: qstat: Pbs Server is currently too busy to service this request. Please retry this request. 601294.mu01
qstat: Pbs Server is currently too busy to service this request. Please retry this request. 601294.mu01
[2021-08-05T00:44:29+0800] [Thread-2 ] [E] [toil.batchSystems.abstractGridEngineBatchSystem] Will retry errored operation getJobExitCode, code 30: qstat: Pbs Server is currently too busy to service this request. Please retry this request. 601294.mu01
qstat: Pbs Server is currently too busy to service this request. Please retry this request. 601296.mu01
[2021-08-05T00:44:30+0800] [Thread-2 ] [E] [toil.batchSystems.abstractGridEngineBatchSystem] Will retry errored operation getJobExitCode, code 30: qstat: Pbs Server is currently too busy to service this request. Please retry this request. 601296.mu01
qstat: Pbs Server is currently too busy to service this request. Please retry this request. 601296.mu01
[2021-08-05T00:44:31+0800] [Thread-2 ] [E] [toil.batchSystems.abstractGridEngineBatchSystem] Will retry errored operation getJobExitCode, code 30: qstat: Pbs Server is currently too busy to service this request. Please retry this request. 601296.mu01
qstat: Pbs Server is currently too busy to service this request. Please retry this request. 601296.mu01
[2021-08-05T00:44:33+0800] [Thread-2 ] [E] [toil.batchSystems.abstractGridEngineBatchSystem] Failed operation getJobExitCode, code 30: qstat: Pbs Server is currently too busy to service this request. Please retry this request. 601296.mu01
[2021-08-05T00:44:33+0800] [Thread-2 ] [E] [toil.batchSystems.abstractGridEngineBatchSystem] GridEngine like batch system failure
Traceback (most recent call last):
  File "/gpfs/home/liunyw/dragon_cactus/soft/cactus-bin-v2.0.3/venv/lib/python3.6/site-packages/toil/batchSystems/abstractGridEngineBatchSystem.py", line 222, in run
    while self._runStep():
  File "/gpfs/home/liunyw/dragon_cactus/soft/cactus-bin-v2.0.3/venv/lib/python3.6/site-packages/toil/batchSystems/abstractGridEngineBatchSystem.py", line 212, in _runStep
    activity |= self.checkOnJobs()
  File "/gpfs/home/liunyw/dragon_cactus/soft/cactus-bin-v2.0.3/venv/lib/python3.6/site-packages/toil/batchSystems/abstractGridEngineBatchSystem.py", line 187, in checkOnJobs
    status = self.boss.with_retries(self.getJobExitCode, batchJobID)
  File "/gpfs/home/liunyw/dragon_cactus/soft/cactus-bin-v2.0.3/venv/lib/python3.6/site-packages/toil/batchSystems/abstractGridEngineBatchSystem.py", line 435, in with_retries
    raise err
  File "/gpfs/home/liunyw/dragon_cactus/soft/cactus-bin-v2.0.3/venv/lib/python3.6/site-packages/toil/batchSystems/abstractGridEngineBatchSystem.py", line 426, in with_retries
    return operation(*args, **kwargs)
  File "/gpfs/home/liunyw/dragon_cactus/soft/cactus-bin-v2.0.3/venv/lib/python3.6/site-packages/toil/batchSystems/torque.py", line 130, in getJobExitCode
    stdout = call_command(args)
  File "/gpfs/home/liunyw/dragon_cactus/soft/cactus-bin-v2.0.3/venv/lib/python3.6/site-packages/toil/lib/misc.py", line 67, in call_command
    raise CalledProcessErrorStderr(proc.returncode, cmd, output=stdout, stderr=stderr)
toil.lib.misc.CalledProcessErrorStderr: Command '['qstat', '-f', '601296']' exit status 30: qstat: Pbs Server is currently too busy to service this request. Please retry this request. 601296.mu01
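For illustration: in the log above, the retries for job 601296 all happen within about three seconds (00:44:30 to 00:44:33) before the operation fails. A minimal sketch of a more patient retry with exponential backoff is shown here. It assumes that exit status 30 means "server busy", as in the qstat output above. `call_with_backoff` and its parameters are hypothetical names made up for this example and are not part of Toil.

```python
import subprocess
import time


def call_with_backoff(cmd, attempts=6, base_delay=1.0, busy_status=30,
                      run=subprocess.run, sleep=time.sleep):
    """Run cmd and return its stdout.

    While the command exits with busy_status (assumed here to be the
    "server too busy" code seen in the log), wait and try again, doubling
    the delay each time. Any other non-zero status, or running out of
    attempts, raises subprocess.CalledProcessError.
    """
    for attempt in range(attempts):
        proc = run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                   universal_newlines=True)  # Python 3.6 compatible
        if proc.returncode == 0:
            return proc.stdout
        last_attempt = attempt == attempts - 1
        if proc.returncode != busy_status or last_attempt:
            raise subprocess.CalledProcessError(
                proc.returncode, cmd, output=proc.stdout, stderr=proc.stderr)
        # Delays grow as base_delay * 1, 2, 4, 8, ...
        sleep(base_delay * 2 ** attempt)


# Possible usage (needs a working qstat on the PATH):
# details = call_with_backoff(["qstat", "-f", "601296"], attempts=8, base_delay=5.0)
```

With `attempts=8` and `base_delay=5.0`, the waits add up to roughly ten minutes before giving up, which may or may not be long enough for a busy PBS server to recover.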
Hello,
The command used is
Could you give me some advice? Thanks.
Best, Yawen