could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

xbcReal commented 7 years ago

I ran into a problem when launching the following command to train on the PASCAL VOC 2007 data:

python ./faster_rcnn/train_net.py --gpu 0 --weights ./data/pretrain_model/VGG_imagenet.npy --imdb voc_2007_trainval --iters 70000 --cfg ./experiments/cfgs/faster_rcnn_end2end.yml --network VGGnet_train --restore 0 --set EXP_DIR exp_dir

It fails with:

E tensorflow/stream_executor/cuda/cuda_dnn.cc:397] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:364] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
Aborted (core dumped)

Has anyone solved this problem? Please help.

ChiefGodMan commented 7 years ago

I hit this problem too. In my case the cause was an earlier TensorFlow job that had been running on the GPU and did not exit cleanly, so it was still holding the device and had to be killed. Follow these steps:

List all processes using the GPU: $ nvidia-smi
Find the PID of the process you started and kill it. After that, the error should go away.
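
The two steps above can also be scripted. The following is only a sketch, assuming nvidia-smi is on the PATH and supports the --query-compute-apps option; the names parse_compute_apps and list_gpu_processes are hypothetical helpers made up for this illustration, not part of TFFRCNN or TensorFlow.

```python
import subprocess

# Ask nvidia-smi for one CSV row per compute process: pid, name, memory (MiB).
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Turn nvidia-smi CSV output into a list of (pid, name, used_memory) tuples.

    Lines that do not look like "pid, name, memory" are skipped.
    """
    rows = []
    for line in csv_text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) < 3 or not parts[0].isdigit():
            continue
        pid = int(parts[0])
        used_memory = parts[-1]
        # Re-join the middle in case the process name itself contains a comma.
        name = ",".join(parts[1:-1])
        rows.append((pid, name, used_memory))
    return rows


def list_gpu_processes():
    """Run nvidia-smi and return the parsed process list (needs an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)


# Example use (left as comments so nothing is killed by accident):
#   import os, signal
#   for pid, name, mem in list_gpu_processes():
#       print(pid, name, mem)
#   os.kill(stale_pid, signal.SIGTERM)  # stale_pid: the PID you identified above
```

Only kill a PID you recognise as your own leftover training job; other users' jobs may be sharing the same GPU.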

ashleylid commented 7 years ago

@ailias I wish that had worked. But not the case for me.

flifuehu commented 6 years ago

Run sudo rm -rf ~/.nv/ to fix it.
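
If deleting the directory outright feels too drastic, a more cautious variant is to move it aside first and delete the backup once training works again. This is only a sketch, under the assumption that ~/.nv is a per-user NVIDIA cache directory that gets recreated on the next run; move_aside is a hypothetical helper name, not something from the thread.

```python
import shutil
import time
from pathlib import Path


def move_aside(cache_dir):
    """Rename cache_dir to a timestamped backup next to it.

    Returns the backup path, or None if cache_dir does not exist.
    """
    cache_dir = Path(cache_dir).expanduser()
    if not cache_dir.is_dir():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = cache_dir.with_name(cache_dir.name + ".bak-" + stamp)
    shutil.move(str(cache_dir), str(backup))
    return backup


# Example use (assumes the cache lives at ~/.nv, as in the command above):
#   print(move_aside("~/.nv"))
```

If the rename fails with a permission error, the directory is probably not owned by your user, which may be why the original command uses sudo.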

CharlesShang / TFFRCNN

could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR #97