ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

OSError: [Errno 24] Too many open files #763

Open xiaolongtu opened 2 years ago

xiaolongtu commented 2 years ago

Hi, I have been running this program for several months and have now hit an error. How can I solve it?

<=========
Issued job 'JobFunctionWrappingJob' kind-JobFunctionWrappingJob/instance-in0k5v1z with job batch system ID: 1011 and cores: 28, disk: 465.7 Gi, and memory: 186.3 Gi
The job seems to have left a log file, indicating failure: 'JobFunctionWrappingJob' kind-JobFunctionWrappingJob/instance-06upo4l3
Log from job "kind-JobFunctionWrappingJob/instance-06upo4l3" follows:
=========>
[2022-08-10T07:42:38+0800] [MainThread] [I] [toil.worker] ---TOIL WORKER OUTPUT LOG---
[2022-08-10T07:42:38+0800] [MainThread] [I] [toil] Running Toil version 5.4.0-87293d63fa6c76f03bed3adf93414ffee67bf9a7 on host cu11.
[2022-08-10T07:42:38+0800] [MainThread] [I] [toil.worker] Working on job 'JobFunctionWrappingJob' kind-JobFunctionWrappingJob/instance-06upo4l3
Traceback (most recent call last):
  File "/gpfs/home/tuxl/software/genome/cactus-bin-v2.0.4/venv/lib/python3.6/site-packages/toil/worker.py", line 367, in workerScript
    job = Job.loadJob(jobStore, jobDesc)
  File "/gpfs/home/tuxl/software/genome/cactus-bin-v2.0.4/venv/lib/python3.6/site-packages/toil/job.py", line 2238, in loadJob
    jobStore.readFile(pickleFile, filename)
  File "/gpfs/home/tuxl/software/genome/cactus-bin-v2.0.4/venv/lib/python3.6/site-packages/toil/jobStores/fileJobStore.py", line 440, in readFile
    self._checkJobStoreFileID(jobStoreFileID)
  File "/gpfs/home/tuxl/software/genome/cactus-bin-v2.0.4/venv/lib/python3.6/site-packages/toil/jobStores/fileJobStore.py", line 724, in _checkJobStoreFileID
    raise NoSuchFileException(jobStoreFileID)
toil.jobStores.abstractJobStore.NoSuchFileException: File 'files/for-job/kind-JobFunctionWrappingJob/instance-_mjhysbw/cleanup/file-313144d484dc41d5a56bf66e46ee6955/stream' does not exist.
[2022-08-10T07:42:42+0800] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host cu11
<=========
Issued job 'JobFunctionWrappingJob' kind-JobFunctionWrappingJob/instance-06upo4l3 with job batch system ID: 1012 and cores: 28, disk: 465.7 Gi, and memory: 186.3 Gi
The job seems to have left a log file, indicating failure: 'JobFunctionWrappingJob' kind-JobFunctionWrappingJob/instance-o6yxat3s
Log from job "kind-JobFunctionWrappingJob/instance-o6yxat3s" follows:
=========>
[2022-08-10T07:43:24+0800] [MainThread] [I] [toil.worker] ---TOIL WORKER OUTPUT LOG---
[2022-08-10T07:43:24+0800] [MainThread] [I] [toil] Running Toil version 5.4.0-87293d63fa6c76f03bed3adf93414ffee67bf9a7 on host cu06.
[2022-08-10T07:43:24+0800] [MainThread] [I] [toil.worker] Working on job 'JobFunctionWrappingJob' kind-JobFunctionWrappingJob/instance-o6yxat3s
Traceback (most recent call last):
  File "/gpfs/home/tuxl/software/genome/cactus-bin-v2.0.4/venv/lib/python3.6/site-packages/toil/worker.py", line 367, in workerScript
    job = Job.loadJob(jobStore, jobDesc)
  File "/gpfs/home/tuxl/software/genome/cactus-bin-v2.0.4/venv/lib/python3.6/site-packages/toil/job.py", line 2238, in loadJob
    jobStore.readFile(pickleFile, filename)
  File "/gpfs/home/tuxl/software/genome/cactus-bin-v2.0.4/venv/lib/python3.6/site-packages/toil/jobStores/fileJobStore.py", line 440, in readFile
    self._checkJobStoreFileID(jobStoreFileID)
  File "/gpfs/home/tuxl/software/genome/cactus-bin-v2.0.4/venv/lib/python3.6/site-packages/toil/jobStores/fileJobStore.py", line 724, in _checkJobStoreFileID
    raise NoSuchFileException(jobStoreFileID)
toil.jobStores.abstractJobStore.NoSuchFileException: File 'files/for-job/kind-JobFunctionWrappingJob/instance-l37_oz0h/cleanup/file-a937713751704f0fa716d03c1e11b55a/stream' does not exist.
[2022-08-10T07:43:28+0800] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host cu06
<=========
Issued job 'JobFunctionWrappingJob' kind-JobFunctionWrappingJob/instance-o6yxat3s with job batch system ID: 1013 and cores: 28, disk: 465.7 Gi, and memory: 186.3 Gi
GridEngine like batch system failure
Traceback (most recent call last):
  File "/gpfs/home/tuxl/software/genome/cactus-bin-v2.0.4/venv/lib/python3.6/site-packages/toil/batchSystems/abstractGridEngineBatchSystem.py", line 222, in run
    while self._runStep():
  File "/gpfs/home/tuxl/software/genome/cactus-bin-v2.0.4/venv/lib/python3.6/site-packages/toil/batchSystems/abstractGridEngineBatchSystem.py", line 211, in _runStep
    activity |= self.createJobs(newJob)
  File "/gpfs/home/tuxl/software/genome/cactus-bin-v2.0.4/venv/lib/python3.6/site-packages/toil/batchSystems/abstractGridEngineBatchSystem.py", line 115, in createJobs
    batchJobID = self.boss.with_retries(self.submitJob, subLine)
  File "/gpfs/home/tuxl/software/genome/cactus-bin-v2.0.4/venv/lib/python3.6/site-packages/toil/batchSystems/abstractGridEngineBatchSystem.py", line 426, in with_retries
    return operation(*args, **kwargs)
  File "/gpfs/home/tuxl/software/genome/cactus-bin-v2.0.4/venv/lib/python3.6/site-packages/toil/batchSystems/torque.py", line 122, in submitJob
    return call_command(subLine)
  File "/gpfs/home/tuxl/software/genome/cactus-bin-v2.0.4/venv/lib/python3.6/site-packages/toil/lib/misc.py", line 62, in call_command
    encoding='utf-8', errors="replace", env=env)
  File "/gpfs/home/tuxl/software/anaconda3/envs/py36/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/gpfs/home/tuxl/software/anaconda3/envs/py36/lib/python3.6/subprocess.py", line 1254, in _execute_child
    errpipe_read, errpipe_write = os.pipe()
OSError: [Errno 24] Too many open files
Stopping real-time logging server.
Joining real-time logging server thread.
Workflow Progress 99%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████ | 1005/1014 (0 failures) [10h 02:30<05:24, 0.03 jobs/s]
[2022-08-10T07:45:05+0800] [MainThread] [I] [toil.realtimeLogger] Stopping real-time logging server.
[2022-08-10T07:45:05+0800] [MainThread] [I] [toil.realtimeLogger] Joining real-time logging server thread.
Traceback (most recent call last):
  File "/gpfs/home/tuxl/software/genome/cactus-bin-v2.0.4/venv/bin/cactus-prepare-toil", line 8, in <module>
    sys.exit(main_toil())
  File "/gpfs/home/tuxl/software/genome/cactus-bin-v2.0.4/venv/lib/python3.6/site-packages/cactus/progressive/cactus_prepare.py", line 46, in main_toil
    return main(toil_mode=True)
  File "/gpfs/home/tuxl/software/genome/cactus-bin-v2.0.4/venv/lib/python3.6/site-packages/cactus/progressive/cactus_prepare.py", line 230, in main
    cactusPrepare(options, project)
  File "/gpfs/home/tuxl/software/genome/cactus-bin-v2.0.4/venv/lib/python3.6/site-packages/cactus/progressive/cactus_prepare.py", line 393, in cactusPrepare
    toil.restart()
  File "/gpfs/home/tuxl/software/genome/cactus-bin-v2.0.4/venv/lib/python3.6/site-packages/toil/common.py", line 874, in restart
    return self._runMainLoop(rootJobDescription)
  File "/gpfs/home/tuxl/software/genome/cactus-bin-v2.0.4/venv/lib/python3.6/site-packages/toil/common.py", line 1132, in _runMainLoop
    jobCache=self._jobCache).run()
  File "/gpfs/home/tuxl/software/genome/cactus-bin-v2.0.4/venv/lib/python3.6/site-packages/toil/leader.py", line 229, in run
    self.innerLoop()
  File "/gpfs/home/tuxl/software/genome/cactus-bin-v2.0.4/venv/lib/python3.6/site-packages/toil/leader.py", line 622, in innerLoop
    self.checkForDeadlocks()
  File "/gpfs/home/tuxl/software/genome/cactus-bin-v2.0.4/venv/lib/python3.6/site-packages/toil/leader.py", line 654, in checkForDeadlocks
    totalRunningJobs = len(self.batchSystem.getRunningBatchJobIDs())
  File "/gpfs/home/tuxl/software/genome/cactus-bin-v2.0.4/venv/lib/python3.6/site-packages/toil/batchSystems/abstractGridEngineBatchSystem.py", line 371, in getRunningBatchJobIDs
    batchIds = self.with_retries(self.worker.getRunningJobIDs)
  File "/gpfs/home/tuxl/software/genome/cactus-bin-v2.0.4/venv/lib/python3.6/site-packages/toil/batchSystems/abstractGridEngineBatchSystem.py", line 426, in with_retries
    return operation(*args, **kwargs)
  File "/gpfs/home/tuxl/software/genome/cactus-bin-v2.0.4/venv/lib/python3.6/site-packages/toil/batchSystems/torque.py", line 73, in getRunningJobIDs
    stdout = call_command(['qstat'] + jobids)
  File "/gpfs/home/tuxl/software/genome/cactus-bin-v2.0.4/venv/lib/python3.6/site-packages/toil/lib/misc.py", line 62, in call_command
    encoding='utf-8', errors="replace", env=env)
  File "/gpfs/home/tuxl/software/anaconda3/envs/py36/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/gpfs/home/tuxl/software/anaconda3/envs/py36/lib/python3.6/subprocess.py", line 1254, in _execute_child
    errpipe_read, errpipe_write = os.pipe()
OSError: [Errno 24] Too many open files

glennhickey commented 2 years ago

Not sure if this is a Cactus issue. You may just have too many open files on your system? You can try looking at lsof to see where they are...
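
If it helps, here is a rough way to check the same thing from Python rather than lsof. This is only a sketch, assuming a Linux host where /proc is available; the function names `count_open_fds` and `nofile_limits` are made up for illustration and are not part of Toil or Cactus. It counts the descriptors a process currently holds and prints the per-process open-file limit (the value `ulimit -n` reports in a shell).

```python
import os
import resource


def count_open_fds(pid="self"):
    """Count open file descriptors of a process (Linux only).

    Each entry in /proc/<pid>/fd is one open descriptor. Reading another
    user's process usually requires elevated permissions.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"open descriptors in this process: {count_open_fds()}")
    print(f"RLIMIT_NOFILE: soft={soft}, hard={hard}")
```

Passing the PID of the Toil leader process to `count_open_fds` and comparing the result with the soft limit should show whether the leader itself is close to the limit, which might be the case here since the error is raised from `os.pipe()` on the leader side.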