PengNi / ccsmeth

Detecting DNA methylation from PacBio CCS reads
BSD 3-Clause Clear License

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED #35

Open rl4940 opened 1 year ago

rl4940 commented 1 year ago

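The NumPy problem seems to be resolved; this time the run got to 21%, haha. But now a cuDNN error appears. It looks like a PyTorch error, and I really don't know how to fix it. My script is below: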

#!/bin/bash
#SBATCH --mail-type=END,FAIL 
#SBATCH --nodes=1
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=2
#SBATCH --time=02:00:00
#SBATCH --mem=48G
#SBATCH --gres=gpu:a100:1
#SBATCH -o %A_%a_output.txt
#SBATCH -e %A_%a_error.txt

CUDA_VISIBLE_DEVICES=0 ccsmeth call_mods \
  --input 121A/mapped.bam \
  --ref 121A/assembly.rotated.polished.renamed.fsa \
  --model_file /ccsmeth/models/model_ccsmeth_5mCpG_call_mods_attbigru2s_b21.v2.ckpt \
  --output output.hifi.pbmm2.call_mods \
  --threads 10 --threads_call 2 --model_type attbigru2s \
  --rm_per_readsite --mode align 
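
One detail visible in the script above: the Slurm header asks for `--ntasks=2` and `--cpus-per-task=2`, while the command passes `--threads 10`. The sketch below compares a requested thread count with the CPU count implied by the standard `SLURM_NTASKS` and `SLURM_CPUS_PER_TASK` environment variables. The helper names are hypothetical, and treating `ntasks × cpus-per-task` as the CPU budget is a simplifying assumption. How ccsmeth splits `--threads` and `--threads_call` internally is not stated in this thread, and nothing here shows that the mismatch is related to the cuDNN error.

```python
import os


def slurm_cpu_budget(env=None):
    """Return the CPU count implied by Slurm variables, or None outside a Slurm job."""
    env = os.environ if env is None else env
    ntasks = env.get("SLURM_NTASKS")
    cpus_per_task = env.get("SLURM_CPUS_PER_TASK")
    if ntasks is None or cpus_per_task is None:
        return None
    # Assumption: the job's budget is simply tasks x CPUs per task.
    return int(ntasks) * int(cpus_per_task)


def check_threads(requested, env=None):
    """Describe whether `requested` threads fit into the Slurm CPU budget."""
    budget = slurm_cpu_budget(env)
    if budget is None:
        return "not in a Slurm job (or variables unset); nothing to compare"
    if requested > budget:
        return f"oversubscribed: {requested} threads requested, {budget} CPUs allocated"
    return f"ok: {requested} threads requested, {budget} CPUs allocated"


if __name__ == "__main__":
    # Values mirror the #SBATCH header and --threads flag in the script above.
    job_env = {"SLURM_NTASKS": "2", "SLURM_CPUS_PER_TASK": "2"}
    print(check_threads(10, job_env))
```

Inside a real job, calling `check_threads(10)` with no second argument reads the variables from the live environment instead of the hard-coded example values.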

↓ error.txt

batch_reader:  21%|██        | 1941/9340 [03:23<24:18,  5.07it/s]
batch_reader:  21%|██        | 1944/9340 [03:24<28:32,  4.32it/s]
batch_reader:  21%|██        | 1949/9340 [03:25<27:37,  4.46it/s]
batch_reader:  21%|██        | 1953/9340 [03:26<28:56,  4.25it/s]
batch_reader:  21%|██        | 1957/9340 [03:27<29:52,  4.12it/s]
batch_reader:  21%|██        | 1962/9340 [03:28<28:35,  4.30it/s]
batch_reader:  21%|██        | 1968/9340 [03:29<26:08,  4.70it/s]
batch_reader:  21%|██        | 1973/9340 [03:30<26:06,  4.70it/s]
batch_reader:  21%|██        | 1979/9340 [03:31<24:40,  4.97it/s]
batch_reader:  21%|██▏       | 1985/9340 [03:33<23:47,  5.15it/s]
batch_reader:  21%|██▏       | 1990/9340 [03:34<24:27,  5.01it/s]
batch_reader:  21%|██▏       | 1996/9340 [03:35<23:36,  5.18it/s]
batch_reader:  21%|██▏       | 2001/9340 [03:36<24:16,  5.04it/s]Process Process-6:
Process Process-4:
Traceback (most recent call last):
Traceback (most recent call last):
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/site-packages/ccsmeth/call_modifications.py", line 340, in _call_mods_q
    pred_str, accuracy, batch_num = _call_mods2s(features_batch, model, args.batch_size, device)
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/site-packages/ccsmeth/call_modifications.py", line 246, in _call_mods2s
    voutputs, vlogits = model(FloatTensor(b_fkmers, device), FloatTensor(b_fpasss, device),
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/site-packages/ccsmeth/call_modifications.py", line 340, in _call_mods_q
    pred_str, accuracy, batch_num = _call_mods2s(features_batch, model, args.batch_size, device)
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/site-packages/ccsmeth/models.py", line 118, in forward
    out1, n_states1 = self.rnn(out1, self.init_hidden(out1.size(0),
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/site-packages/ccsmeth/call_modifications.py", line 246, in _call_mods2s
    voutputs, vlogits = model(FloatTensor(b_fkmers, device), FloatTensor(b_fpasss, device),
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 942, in forward
    result = _VF.gru(input, hx, self._flat_weights, self.bias, self.num_layers,
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/site-packages/ccsmeth/models.py", line 118, in forward
    out1, n_states1 = self.rnn(out1, self.init_hidden(out1.size(0),
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
  File "/gpfs/data/pirontilab/Students/software/conda/envs/ccsmeth/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 942, in forward
    result = _VF.gru(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

o(╥﹏╥)o
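
Both tracebacks above end in the same place, the `_VF.gru` call in `torch/nn/modules/rnn.py`. One way to narrow this down, offered only as a sketch, is to run a small stand-alone GRU forward pass on the same GPU, once with cuDNN enabled and once with it disabled. The function name `gru_check` and the tensor sizes are hypothetical and do not match the ccsmeth model. `torch.backends.cudnn.flags` is the PyTorch context manager used to toggle cuDNN. If the tiny GRU also fails with cuDNN on but passes with it off, the environment (driver, CUDA, or cuDNN build) is a more likely suspect than the input data. This check cannot confirm the cause on its own.

```python
def gru_check(use_cudnn: bool) -> str:
    """Run one forward pass of a small bidirectional GRU on the first visible GPU."""
    try:
        import torch
    except ImportError:
        return "skipped: torch not installed"
    if not torch.cuda.is_available():
        return "skipped: no CUDA device visible"

    device = torch.device("cuda:0")
    # Illustrative sizes only; these are not the ccsmeth model's dimensions.
    rnn = torch.nn.GRU(input_size=16, hidden_size=32, num_layers=2,
                       batch_first=True, bidirectional=True).to(device)
    x = torch.randn(8, 21, 16, device=device)
    try:
        with torch.backends.cudnn.flags(enabled=use_cudnn), torch.no_grad():
            out, _ = rnn(x)
        # Wait for the GPU so that asynchronous errors surface here.
        torch.cuda.synchronize()
    except RuntimeError as err:
        return f"failed: {err}"
    return f"ok: output shape {tuple(out.shape)}"


if __name__ == "__main__":
    print("cuDNN enabled: ", gru_check(use_cudnn=True))
    print("cuDNN disabled:", gru_check(use_cudnn=False))
```

Running it in the same conda environment and on the same node type as the failing job keeps the comparison meaningful.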

lanyunxin commented 1 year ago

How long does running `CUDA_VISIBLE_DEVICES=0 ccsmeth call_mods` take? I set 10 threads, so why is my CPU completely maxed out?

rl4940 commented 1 year ago

> How long does running `CUDA_VISIBLE_DEVICES=0 ccsmeth call_mods` take? I set 10 threads, so why is my CPU completely maxed out?

I think the optimization is the problem; the machine-learning side probably wasn't optimized well.
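
On the CPU-usage question, one possible explanation is that each worker process also starts its own pool of numerical-library threads, so 10 processes can occupy far more than 10 cores. This is an assumption about the cause and has not been confirmed for ccsmeth. The sketch below caps those pools with the standard `OMP_NUM_THREADS`, `MKL_NUM_THREADS`, and `OPENBLAS_NUM_THREADS` environment variables before launching a command. The helper name `capped_env` is hypothetical.

```python
import os


def capped_env(threads_per_process: int, base=None) -> dict:
    """Return a copy of the environment with numerical-library thread pools capped."""
    env = dict(os.environ if base is None else base)
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
        env[var] = str(threads_per_process)
    return env


if __name__ == "__main__":
    env = capped_env(1)
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
        print(var, "=", env[var])
    # To launch a command under these caps (not executed here):
    #   import subprocess
    #   subprocess.run(["ccsmeth", "call_mods", "--threads", "10"], env=env, check=True)
    # The command above is abbreviated; the remaining flags would be the ones
    # from the original script.
```

If CPU usage drops to roughly the `--threads` value with the caps in place, nested threading was the likely cause. If it does not, the explanation lies elsewhere.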