PrincetonVision / marvin

Marvin: A Minimalist GPU-only N-Dimensional ConvNets Framework
MIT License

classification demo error #33

Closed: shuoyang129 closed this issue 8 years ago

shuoyang129 commented 8 years ago

@danielsuo I am using CUDA 7.5, cuDNN v5 and the latest Marvin. When I run the classification demo, I get a segmentation fault, and gdb reports "0x00007ffff2ace01d in cudnnDropoutGetStatesSize () from /usr/local/cudnn/v5/lib64/libcudnn.so.5". I have no idea how to fix it. Is something wrong? By the way, the MNIST demo runs correctly. Here is what I did before running the classification demo:

  1. Installed CUDA 7.5 from the .run file and got the graphics driver working.
  2. Downloaded cudnn-7.5-linux-x64-v5.0-rc.tgz and unpacked its include and lib64 directories into /usr/local/cudnn/v5.
  3. Downloaded and compiled Marvin.
  4. Downloaded the classification model and data.
  5. Ran the demo.

    The output is:

    Hello, World! This is Marvin. I am at a rough estimate thirty billion times more intelligent than you. Let me give you an example.

    [New Thread 0x7ffff0598700 (LWP 17850)]
    [New Thread 0x7fffe7bff700 (LWP 17851)]
    MemoryDataLayer dataTrain loading data:
    75.4819 MB name:image dim[4]={256,3,227,227}
    0.5 KB name:label dim[4]={256,1,1,1}
    301.928 KB name:imagenet1000 227x227x3 mean image dim[3]={3,227,227}
    MemoryDataLayer dataTest loading data:
    75.4819 MB name:image dim[4]={256,3,227,227}
    0.5 KB name:label dim[4]={256,1,1,1}
    301.928 KB name:imagenet1000 227x227x3 mean image dim[3]={3,227,227}

    Layers:  Responses:
    dataTest
        data[4]={256,3,227,227} RF[1,1] GP[1,1] OF[0,0]
        label[4]={256,1,1,1} RF[1,1] GP[1,1] OF[0,0]
    conv1 weight[4]={96,3,11,11} bias[4]={1,96,1,1}
        conv1[4]={256,96,55,55} RF[11,11] GP[4,4] OF[0,0]
    relu1
    norm1
        norm1[4]={256,96,55,55} RF[11,11] GP[4,4] OF[0,0]
    pool1
        pool1[4]={256,96,27,27} RF[19,19] GP[8,8] OF[0,0]
    conv2 (2 groups) weight[4]={256,48,5,5} bias[4]={1,256,1,1}
        conv2[4]={256,256,27,27} RF[51,51] GP[8,8] OF[-16,-16]
    relu2
    norm2
        norm2[4]={256,256,27,27} RF[51,51] GP[8,8] OF[-16,-16]
    pool2
        pool2[4]={256,256,13,13} RF[67,67] GP[16,16] OF[-16,-16]
    conv3 weight[4]={384,256,3,3} bias[4]={1,384,1,1}
        conv3[4]={256,384,13,13} RF[99,99] GP[16,16] OF[-32,-32]
    relu3
    conv4 (2 groups) weight[4]={384,192,3,3} bias[4]={1,384,1,1}
        conv4[4]={256,384,13,13} RF[131,131] GP[16,16] OF[-48,-48]
    relu4
    conv5 (2 groups) weight[4]={256,192,3,3} bias[4]={1,256,1,1}
        conv5[4]={256,256,13,13} RF[163,163] GP[16,16] OF[-64,-64]
    relu5
    pool5
        pool5[4]={256,256,6,6} RF[195,195] GP[32,32] OF[-64,-64]
    fc6 weight[2]={4096,9216} bias[1]={4096}
        fc6[4]={256,4096,1,1} RF[355,355] GP[0,0] OF[0,0]
    relu6
    drop6

    Program received signal SIGSEGV, Segmentation fault.
    0x00007ffff2ace01d in cudnnDropoutGetStatesSize () from /usr/local/cudnn/v5/lib64/libcudnn.so.5
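The numbers in the log up to fc6 can be reproduced by hand, which suggests the network was set up as expected and the crash happens at drop6 itself. This sketch assumes that RF, GP and OF stand for receptive field, accumulated stride (grid) and offset, and that strides and paddings are the usual AlexNet values; both are inferred from the logged shapes, not read from Marvin's source. The sizes printed by MemoryDataLayer are also consistent with 2 bytes per element. All names below are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper types for checking the log by hand; not Marvin's code.
// Square windows only: kernel size, stride and padding.
struct Window { int kernel, stride, pad; };

// Per-response bookkeeping: spatial size, receptive field (RF),
// accumulated stride (GP) and offset (OF).
struct Resp { int size, rf, gp, of; };

// Output spatial size of a conv/pool window (the divisions are exact here).
int outSize(int in, Window w) { return (in + 2 * w.pad - w.kernel) / w.stride + 1; }

// One conv/pool step: RF grows by (kernel-1)*GP, GP is multiplied by the
// stride, and OF moves by -pad*GP.
Resp step(Resp r, Window w) {
    return {outSize(r.size, w), r.rf + (w.kernel - 1) * r.gp, r.gp * w.stride,
            r.of - w.pad * r.gp};
}

// Walks the spatial layers of the logged network (relu/norm layers leave these
// quantities unchanged) and returns the response after each step:
// conv1, pool1, conv2, pool2, conv3, conv4, conv5, pool5.
std::vector<Resp> alexnetResponses() {
    // Assumed strides/paddings, inferred from the logged output shapes.
    const Window layers[] = {{11, 4, 0}, {3, 2, 0}, {5, 1, 2}, {3, 2, 0},
                             {3, 1, 1},  {3, 1, 1}, {3, 1, 1}, {3, 2, 0}};
    std::vector<Resp> out;
    Resp r{227, 1, 1, 0};  // data: 227x227 input, RF 1, GP 1, OF 0
    for (const Window& w : layers) {
        r = step(r, w);
        out.push_back(r);
    }
    return out;
}

// A fully connected layer sees the whole input map, so its RF spans all of it.
int fcReceptiveField(Resp r) { return r.rf + (r.size - 1) * r.gp; }

// Bytes needed by a tensor, assuming 2 bytes per element (a 16-bit type).
long long tensorBytes(const std::vector<long long>& dims) {
    long long n = 1;
    for (long long d : dims) n *= d;
    return 2 * n;
}
```

For example, `fcReceptiveField(alexnetResponses().back())` gives 355, matching RF[355,355] for fc6; the log prints GP and OF as 0 for fc6, which this sketch does not model. Likewise `tensorBytes({256, 3, 227, 227})` is 79,148,544 bytes, which is 75.4819 MB.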

shuoyang129 commented 8 years ago

I fixed this by adding a call to init() in the Malloc() function. in.size() is 0 when the constructor runs, but it has become 1 by the time Malloc() is called. However, init(), which resizes some variables based on in.size(), was only called from the constructor, where it resized them to 0.
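The ordering problem described above can be reproduced without a GPU. This is a hypothetical stand-in, not Marvin's layer code: SketchLayer, states, reinitInMalloc and constructConnectAllocate are invented names, and the bounds-checked `.at()` turns what would be an out-of-bounds access (and, presumably, the crash inside cuDNN seen in the backtrace) into a catchable exception.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a layer, NOT Marvin's actual code.
// 'in' lists input responses; 'states' stands in for per-input resources
// (e.g. descriptors) that init() sizes to match 'in'.
struct SketchLayer {
    std::vector<std::string> in;  // filled by the network AFTER construction
    std::vector<int> states;      // one entry per input, sized by init()
    bool reinitInMalloc;          // true = the reporter's workaround

    explicit SketchLayer(bool reinit) : reinitInMalloc(reinit) { init(); }

    // Sizes per-input storage from in.size(); during construction this is 0.
    void init() { states.assign(in.size(), 0); }

    // Runs after inputs are connected (in.size() == 1 here).
    // Returns false where the real code would index past the end of 'states'.
    bool Malloc() {
        if (reinitInMalloc) init();  // workaround: resize now that inputs exist
        try {
            for (std::size_t i = 0; i < in.size(); ++i) states.at(i) = 1;
        } catch (const std::out_of_range&) {
            return false;
        }
        return true;
    }
};

// Simulates the lifecycle: construct, connect one input, then allocate.
bool constructConnectAllocate(bool reinitInMalloc) {
    SketchLayer layer(reinitInMalloc);
    layer.in.push_back("input0");
    return layer.Malloc();
}
```

`constructConnectAllocate(false)` returns false because `states` is still empty when Malloc() runs; `constructConnectAllocate(true)` returns true because calling init() again sizes `states` to the one connected input.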

danielsuo commented 8 years ago

We moved the resize calls out of init() and into the Malloc() function.
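A minimal sketch of the pattern behind this fix, assuming the real change lives in Marvin's layer code and looks broadly like this: FixedSketchLayer, states and allocatedStates are hypothetical names. Sizing per-input storage inside Malloc() means it always matches however many inputs the network attached after construction.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical layer sketch, NOT Marvin's code: per-input storage is sized
// in Malloc(), once the network has connected the inputs, rather than in init().
struct FixedSketchLayer {
    std::vector<std::string> in;  // filled by the network after construction
    std::vector<int> states;      // one entry per input

    FixedSketchLayer() { init(); }

    // init() keeps only input-independent setup; no resize based on in.size().
    void init() { states.clear(); }

    // Allocation time: in.size() is final, so size the storage here.
    void Malloc() {
        states.resize(in.size());
        for (std::size_t i = 0; i < in.size(); ++i) states[i] = 1;
    }
};

// Builds a layer with numInputs inputs and reports how many states were allocated.
std::size_t allocatedStates(std::size_t numInputs) {
    FixedSketchLayer layer;
    for (std::size_t i = 0; i < numInputs; ++i) layer.in.push_back("input" + std::to_string(i));
    layer.Malloc();
    return layer.states.size();
}
```

`allocatedStates(1)` returns 1. Keeping init() free of anything that depends on in.size() also removes the need to call init() a second time from Malloc().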

Thanks for this!