ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

terminate called after throwing an instance of 'thrust::system::detail::bad_alloc' #1165

Open chenzhao12 opened 1 year ago

chenzhao12 commented 1 year ago

[2023-09-20T09:48:26+0800] [MainThread] [I] [toil.worker] ---TOIL WORKER OUTPUT LOG---
[2023-09-20T09:48:26+0800] [MainThread] [I] [toil] Running Toil version 5.12.0-6d5a5b83b649cd8adf34a5cfe89e7690c95189d3 on host DellR940xa.
[2023-09-20T09:48:26+0800] [MainThread] [I] [toil.worker] Working on job 'LastzRepeatMaskJob' kind-LastzRepeatMaskJob/instance-nn3x6u3s v1
[2023-09-20T09:48:26+0800] [MainThread] [I] [toil.worker] Loaded body Job('LastzRepeatMaskJob' kind-LastzRepeatMaskJob/instance-nn3x6u3s v1) from description 'LastzRepeatMaskJob' kind-LastzRepeatMaskJob/instance-nn3x6u3s v1
[2023-09-20T09:48:26+0800] [MainThread] [W] [toil.common] XDG_RUNTIME_DIR is set to nonexistent directory /home/temp; your environment may be out of spec!
[2023-09-20T09:48:30+0800] [MainThread] [I] [cactus.shared.common] Running the command ['segalign_repeat_masker', '/tmp/c6bab939887a5cc38a7e16c696ee73f4/702e/4c16/tmp2m3ghv3q/Danio_rerio_0_0.tgt', '--lastz_interval=10000000', '--markend', '--neighbor_proportion', '0.2', '--M', '10', '--step=3', '--ambiguous=iupac,100,100', '--num_gpu', '2']
[2023-09-20T09:48:30+0800] [MainThread] [I] [toil-rt] 2023-09-20 09:48:30.814376: Running the command: "segalign_repeat_masker /tmp/c6bab939887a5cc38a7e16c696ee73f4/702e/4c16/tmp2m3ghv3q/Danio_rerio_0_0.tgt --lastz_interval=10000000 --markend --neighbor_proportion 0.2 --M 10 --step=3 --ambiguous=iupac,100,100 --num_gpu 2"
[2023-09-20T09:48:32+0800] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2023-09-20T09:48:32+0800] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-PreprocessSequence/instance-snf732jw/cleanup/file-111138efc991451f85e30e9ea143a09a/tmpenr68g39.tmp' to path '/tmp/c6bab939887a5cc38a7e16c696ee73f4/702e/4c16/tmp2m3ghv3q/Danio_rerio_0.query'
[2023-09-20T09:48:32+0800] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-PreprocessSequence/instance-snf732jw/cleanup/file-111138efc991451f85e30e9ea143a09a/tmpenr68g39.tmp' to path '/tmp/c6bab939887a5cc38a7e16c696ee73f4/702e/4c16/tmp2m3ghv3q/Danio_rerio_0_0.tgt'
Traceback (most recent call last):
  File "/home/cactus/cactus_env/lib/python3.8/site-packages/toil/worker.py", line 403, in workerScript
    job._runner(jobGraph=None, jobStore=jobStore, fileStore=fileStore, defer=defer)
  File "/home/cactus/cactus_env/lib/python3.8/site-packages/cactus/shared/common.py", line 958, in _runner
    super(RoundedJob, self)._runner(*args, jobStore=jobStore,
  File "/home/cactus/cactus_env/lib/python3.8/site-packages/toil/job.py", line 2774, in _runner
    returnValues = self._run(jobGraph=None, fileStore=fileStore)
  File "/home/cactus/cactus_env/lib/python3.8/site-packages/toil/job.py", line 2691, in _run
    return self.run(fileStore)
  File "/home/cactus/cactus_env/lib/python3.8/site-packages/cactus/preprocessor/lastzRepeatMasking/cactus_lastzRepeatMask.py", line 205, in run
    alignment = self.gpuRepeatMask(fileStore, targetFiles[0])
  File "/home/cactus/cactus_env/lib/python3.8/site-packages/cactus/preprocessor/lastzRepeatMasking/cactus_lastzRepeatMask.py", line 130, in gpuRepeatMask
    segalign_messages = cactus_call(parameters=cmd, work_dir=self.work_dir, returnStdErr=True, gpus=self.repeatMaskOptions.gpu,
  File "/home/cactus/cactus_env/lib/python3.8/site-packages/cactus/shared/common.py", line 889, in cactus_call
    raise RuntimeError("{}Command {} exited {}: {}".format(sigill_msg, call, process.returncode, out))
RuntimeError: Command /usr/bin/time -f "CACTUS-LOGGED-MEMORY-IN-KB: %M" segalign_repeat_masker /tmp/c6bab939887a5cc38a7e16c696ee73f4/702e/4c16/tmp2m3ghv3q/Danio_rerio_0_0.tgt --lastz_interval=10000000 --markend --neighbor_proportion 0.2 --M 10 --step=3 --ambiguous=iupac,100,100 --num_gpu 2 exited 134: stderr=Using 192 threads
Using 2 GPU(s)
terminate called after throwing an instance of 'thrust::system::detail::bad_alloc'
  what(): std::bad_alloc: cudaErrorMemoryAllocation: out of memory
Command terminated by signal 6
CACTUS-LOGGED-MEMORY-IN-KB: 124868
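
For anyone reading the log above: the exit status 134 and the line "Command terminated by signal 6" describe the same event, since 134 matches the common convention of 128 plus the signal number, and signal 6 is SIGABRT (raised when the uncaught C++ exception calls terminate). The sketch below shows one way to decode that in Python. The helper names `signal_from_exit_status` and `is_cuda_oom` are hypothetical and are not part of Cactus or Toil.

```python
import signal


def signal_from_exit_status(returncode):
    """Return the signal number if the status follows the 128+N convention, else None."""
    if returncode > 128:
        return returncode - 128
    return None


def is_cuda_oom(stderr_text):
    """Return True if the stderr text contains the CUDA allocation error seen in the log."""
    return "cudaErrorMemoryAllocation" in stderr_text


# The status and the stderr tail reported in this issue.
stderr_tail = (
    "terminate called after throwing an instance of "
    "'thrust::system::detail::bad_alloc'\n"
    "  what(): std::bad_alloc: cudaErrorMemoryAllocation: out of memory\n"
)
signum = signal_from_exit_status(134)
print(signum, is_cuda_oom(stderr_tail))
```

Note that the CACTUS-LOGGED-MEMORY-IN-KB value in the log is the host memory measured by /usr/bin/time, so it says nothing about how much GPU memory was in use.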

chenzhao12 commented 1 year ago

This happens because the GPU does not have enough memory: segalign_repeat_masker aborted with cudaErrorMemoryAllocation (out of memory).
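
A quick way to confirm this before rerunning is to check how much memory each GPU has free. The sketch below is only a diagnostic aid, assuming `nvidia-smi` is on the PATH. The helper names `parse_gpu_memory` and `query_gpu_memory` are hypothetical, and the sample numbers are made up for illustration, not taken from this machine.

```python
import subprocess

# nvidia-smi query that prints one CSV row per GPU, values in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.total,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, total, used' rows into dicts with the free memory computed."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, total, used = (int(field.strip()) for field in line.split(","))
        rows.append({"index": index, "total_mib": total, "free_mib": total - used})
    return rows


def query_gpu_memory():
    """Run nvidia-smi and return the parsed per-GPU memory figures."""
    result = subprocess.run(QUERY, check=True, capture_output=True, text=True)
    return parse_gpu_memory(result.stdout)


# Made-up sample output for two GPUs, used here instead of a live query.
sample = "0, 16384, 15000\n1, 16384, 200\n"
for gpu in parse_gpu_memory(sample):
    print(gpu)
```

On the affected host, calling `query_gpu_memory()` just before the job starts would show whether another process is already holding most of the memory on either of the two GPUs.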