BVLC / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

Cannot do a second net.forward() with Input layer #4255

Open kevin-li opened 8 years ago

kevin-li commented 8 years ago

I have a for loop in which I load different data into the Input layer and run forward. It crashes my Nvidia driver. I found that this net can only do a forward pass once; on the second net.forward() it crashes with the error "error == cudaSuccess (4 vs. 0) unspecified launch failure". But if I load the net again inside the for loop, it works fine.

I did not have this problem before, and I am not sure why it shows up now. I tried different versions of the Nvidia driver and the result is the same.

for i in fvs:
    batch_image = batch_images[i]
    # resize the input blob to match the current batch
    net.blobs['data'].reshape(*batch_image.shape)
    net.blobs['data'].data[...] = batch_image - 115
    output = net.forward()  # crashes on the second iteration
    probs = output['prob'][:, 1]

RSly commented 7 years ago

Hi @kevin-li, Did you solve this?

I have the same issue. What is the best way to run the same net on multiple incoming images? P.S. I know about the batch option, but it needs the images to be present already.
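For images that arrive one at a time, one option is to load the net once and resize the input blob for each image. Below is a minimal sketch, assuming a pycaffe caffe.Net whose input blob is named data and whose output blob is named prob, as in the snippet above. The helper name classify_stream and its parameters are hypothetical, not part of Caffe, and this pattern is not expected to avoid the crash reported in this thread by itself.

```python
def classify_stream(net, images, mean=115.0, in_name="data", out_name="prob"):
    """Yield one output per image, reusing a single loaded net.

    `net` is assumed to be a pycaffe caffe.Net. `images` is any iterable of
    arrays already laid out as (N, C, H, W). This helper is hypothetical.
    """
    in_blob = net.blobs[in_name]
    for image in images:
        # Resize the input blob to this image, then propagate the new shape.
        in_blob.reshape(*image.shape)
        net.reshape()
        in_blob.data[...] = image - mean
        output = net.forward()
        # forward() hands back the net's own output buffers, which the next
        # call overwrites, so copy the result before yielding it.
        yield output[out_name].copy()
```

Because this is a generator, `images` can itself be a generator that produces frames as they come in, so nothing has to be collected into a batch up front.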

RSly commented 7 years ago

hi @shelhamer, any suggestion on this topic? thanks

kevin-w-li commented 7 years ago

I don't remember exactly, but I was using the Windows version. Try smaller batches; that worked for me for the cudaSuccess (4 vs. 0) error. I think it was a problem with my CUDA installation, and I had to reinstall it the hard way...
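As a rough illustration of the "smaller batches" suggestion, a large batch can be cut into fixed-size pieces along its first axis before each forward pass. The helper below is a hypothetical sketch (iter_chunks is not a Caffe function); it works on anything that supports len() and slicing, such as a list or a NumPy array.

```python
def iter_chunks(batch, chunk_size):
    """Yield consecutive slices of `batch` holding at most `chunk_size` items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(batch), chunk_size):
        # The final slice may be shorter than chunk_size.
        yield batch[start:start + chunk_size]
```

Each chunk would then be loaded into the input blob and run through net.forward() in place of the full batch.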

RSly commented 7 years ago

Hi @kevin-w-li, thanks for the comment. I use batch_size 1 and still get the error message "Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR" for some networks.

For example, with fcn8s I have to reload the net with caffe.Net(deploy_file, caffemodel, caffe.TEST) for each image to get it working.

For other nets such as AlexNet, it works nicely just by loading the net once and using it for multiple images!
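The reload-per-image workaround described above could be sketched as follows. This is only an illustration under stated assumptions: forward_with_reload and load_net are hypothetical names, and the default blob names data and prob are borrowed from the first post and may differ for a given network.

```python
def forward_with_reload(load_net, images, in_name="data", out_name="prob"):
    """Run each image through a freshly loaded net and collect the outputs.

    `load_net` is a zero-argument callable that returns a net, for example
    lambda: caffe.Net(deploy_file, caffemodel, caffe.TEST)
    """
    results = []
    for image in images:
        net = load_net()  # slow: the model is loaded again for every image
        net.blobs[in_name].reshape(*image.shape)
        net.reshape()
        net.blobs[in_name].data[...] = image
        # Copy the output so it no longer depends on this net's buffers.
        results.append(net.forward()[out_name].copy())
        del net  # drop our reference before the next load
    return results
```

Reloading costs time on every image, so this is a stopgap until the underlying error is fixed, not a recommended pattern.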

If anyone has an idea, please comment.

I am using Ubuntu 14 and CUDA 7.5 + cuDNN 5.

RSly commented 7 years ago

As an update, I installed CUDA 8.0 and libcudnn5-dev_5.1.10.

I still get the same error...

kevin-w-li commented 7 years ago

You said it works for AlexNet. Could you paste your net definition here? Or try reducing the net size...? Sorry, I don't really know what the problem is.

RSly commented 7 years ago

Hi,

I double-checked: this problem only exists in the latest version of NV-caffe, not in BVLC/caffe,

so I will report a bug on NV-caffe.

cheers!

RSly commented 7 years ago

The problem is solved for NV-caffe here: https://github.com/NVIDIA/caffe/issues/299

Maybe this issue can be closed.