boleamol closed this issue 8 years ago
Hm, this is strange; it is working fine on my end. Just as a test, in the Network.lua class could you replace all "cudnn." with "nn." in the createSpeechNetwork() method and try running again? We can find out if it is just a cudnn problem or if there is something within the code.
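The swap suggested above can be scripted instead of edited by hand. This is only a rough sketch: the helper name `swap_cudnn_for_nn` is made up for illustration, and it assumes the closing `end` of createSpeechNetwork() sits at the start of a line in Network.lua. It keeps a backup copy next to the file.

```shell
# Hypothetical helper: replace the "cudnn." prefix with "nn." from the line
# that mentions createSpeechNetwork up to the next line starting with "end".
# Assumption: that "end" closes the function. If no such line exists, the
# replacement runs to the end of the file.
swap_cudnn_for_nn() {
    file="$1"
    # -i.bak edits in place and leaves the original as "$file.bak"
    sed -i.bak '/createSpeechNetwork/,/^end/ s/cudnn\./nn./g' "$file"
}

# Usage (run from the repository root):
#   swap_cudnn_for_nn Network.lua
#   diff Network.lua.bak Network.lua   # review what changed
```

Any later line that mentions createSpeechNetwork (for example a call site) starts a new range, so it is worth reading the diff before training again.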
As per your guidance I modified the createSpeechNetwork() method and now it runs, but the GPU does not have enough memory, so it gives this error:
"Training Epoch: 1
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-2631/cutorch/lib/THC/generic/THCStorage.cu line=41 error=2 : out of memory
lua: .../speech/torch/install/share/lua/5.1/nn/Container.lua:69: In 1 module of nn.Sequential:
In 5 module of nn.Sequential:
/home/speech/torch/install/share/lua/5.1/nn/THNN.lua:109: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-2631/cutorch/lib/THC/generic/THCStorage.cu:41
stack traceback:
[C]: in function 'v'
/home/speech/torch/install/share/lua/5.1/nn/THNN.lua:109: in function 'SpatialConvolutionMM_updateOutput'
...orch/install/share/lua/5.1/nn/SpatialConvolution.lua:104: in function <...orch/install/share/lua/5.1/nn/SpatialConvolution.lua:100>"
I also observed that memory usage reached 97%. I am now waiting for a new GPU with a higher configuration. Anyhow, thanks for the support.
Ah, so it is a cudnn issue. Have you installed cudnn via the nvidia library (copying the .so files etc. to the /usr/local/cuda install location, and adding it to the ~/.bashrc)?
And yeah, because of the batching it might use a bit of memory. I'm running a GTX 970 with 4 GB and it fits alright in memory.
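To see how close a run gets to the card's limit, GPU memory can be polled with nvidia-smi while training. A minimal sketch, assuming nvidia-smi is on the PATH; the helper name `mem_percent` is invented for this example:

```shell
# Turn one "used, total" line (values in MiB) into a whole-number percentage.
mem_percent() {
    echo "$1" | awk -F', *' '{ printf "%d\n", ($1 * 100) / $2 }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # One line per GPU, e.g. "993, 1024"
    nvidia-smi --query-gpu=memory.used,memory.total \
               --format=csv,noheader,nounits |
    while IFS= read -r line; do
        echo "GPU memory in use: $(mem_percent "$line")%"
    done
else
    echo "nvidia-smi not found; skipping GPU memory check"
fi
```

The percentage is truncated rather than rounded, so the reported value can sit just under what other tools show.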
Hopefully in the coming weeks I will completely redo the master branch with what's coming in the voxforge update branch, which will allow the minibatch size to be customised (by setting a max minibatch size) and so reduce memory overhead.
Yes, I installed cudnn via the nvidia library, copied the .so files to the /usr/local/cuda install location, and added it to the ~/.bashrc, but the issue was still there. I will try again. If you are reducing the batch size, that is good for me. Thank you.
Hopefully once I merge the branches it will allow you to run the model on your PC. I'll close the issue for now!
Ok, fine Thank you sir...
The CUDNN_STATUS_BAD_PARAM issue can be solved by using cudnn version 4 and adding it to LD_LIBRARY_PATH.
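For anyone checking their own setup, the lines below sketch one way to confirm which cudnn is installed and to put it on the library path. The lib64 and include subdirectories under /usr/local/cuda are assumptions about where the files were copied, and `add_to_ld_library_path` is a made-up helper name; adjust both to match your install.

```shell
# Assumption: the cudnn .so files were copied into /usr/local/cuda/lib64
# and cudnn.h into /usr/local/cuda/include.
CUDA_LIB_DIR="${CUDA_LIB_DIR:-/usr/local/cuda/lib64}"
CUDNN_HEADER="${CUDNN_HEADER:-/usr/local/cuda/include/cudnn.h}"

# Prepend a directory to LD_LIBRARY_PATH unless it is already listed.
add_to_ld_library_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;  # already present, nothing to do
        *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

add_to_ld_library_path "$CUDA_LIB_DIR"

# Show which cudnn libraries are present, or say so if none are found.
ls "$CUDA_LIB_DIR"/libcudnn* 2>/dev/null ||
    echo "no libcudnn files found in $CUDA_LIB_DIR"

# Print the major version recorded in the header, if the header exists.
if [ -f "$CUDNN_HEADER" ]; then
    grep 'define CUDNN_MAJOR' "$CUDNN_HEADER"
fi
```

Putting the same `add_to_ld_library_path` call in ~/.bashrc makes the setting persist across shells.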
Hi, thanks for your support up to now. We are also running on the GPU. We are using an entry-level NVIDIA GPU, a Quadro K420, which has 192 CUDA cores and 1024 MB of total memory. I installed all the dependencies mentioned in the README.md file, but I am facing the following error. After the error I checked the dependencies again, with no change.
Training Epoch: 1
lua: /root/torch/install/share/lua/5.1/nn/Container.lua:67: In 1 module of nn.Sequential:
In 1 module of nn.Sequential:
/root/torch/install/share/lua/5.1/cudnn/init.lua:58: Error in CuDNN: CUDNN_STATUS_BAD_PARAM (cudnnSetFilterNdDescriptor)
stack traceback:
C: in function 'error'
/root/torch/install/share/lua/5.1/cudnn/init.lua:58: in function 'errcheck'
...h/install/share/lua/5.1/cudnn/SpatialConvolution.lua:45: in function 'resetWeightDescriptors'
...h/install/share/lua/5.1/cudnn/SpatialConvolution.lua:358: in function <...h/install/share/lua/5.1/cudnn/SpatialConvolution.lua:357>
(tail call): ?
C: in function 'xpcall'
/root/torch/install/share/lua/5.1/nn/Container.lua:58: in function 'rethrowErrors'
/root/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </root/torch/install/share/lua/5.1/nn/Sequential.lua:41>
(tail call): ?
C: in function 'xpcall'
/root/torch/install/share/lua/5.1/nn/Container.lua:58: in function 'rethrowErrors'
/root/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </root/torch/install/share/lua/5.1/nn/Sequential.lua:41>
(tail call): ?
./Network.lua:95: in function 'opfunc'
/root/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
./Network.lua:111: in function 'trainNetwork'
AN4CTCTrain.lua:40: in main chunk
Please support ...