facebookarchive / fb.resnet.torch

Torch implementation of ResNet from http://arxiv.org/abs/1512.03385 and training scripts
Other
2.29k stars 663 forks source link

Error training! #153

Open ilichev-andrey opened 7 years ago

ilichev-andrey commented 7 years ago

Hello. I am using model cifar-10 for training custom dataset. My data:

{
  train : 
    {
      data : FloatTensor - size: 3000x3x96x96
      labels : FloatTensor - size: 3000
    }
  val : 
    {
      data : FloatTensor - size: 1000x3x96x96
      labels : FloatTensor - size: 1000
    }
}

I am change:

  1. 32 -> 96 https://github.com/facebook/fb.resnet.torch/blob/master/datasets/cifar10.lua#L48
  2. 32 -> 96 https://github.com/facebook/fb.resnet.torch/blob/master/models/init.lua#L44

I am change model: output - 7 classes (in my data 7 classes) My model:

nn.Sequential {
  [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> output]
  (1): cudnn.SpatialConvolution(3 -> 16, 3x3, 1,1, 1,1) without bias
  (2): nn.SpatialBatchNormalization (4D) (16)
  (3): cudnn.ReLU
  (4): nn.Sequential {
    [input -> (1) -> (2) -> (3) -> output]
    (1): nn.Sequential {
      [input -> (1) -> (2) -> (3) -> output]
      (1): nn.ConcatTable {
        input
          |`-> (1): nn.Sequential {
          |      [input -> (1) -> (2) -> (3) -> (4) -> (5) -> output]
          |      (1): cudnn.SpatialConvolution(16 -> 16, 3x3, 1,1, 1,1) without bias
          |      (2): nn.SpatialBatchNormalization (4D) (16)
          |      (3): cudnn.ReLU
          |      (4): cudnn.SpatialConvolution(16 -> 16, 3x3, 1,1, 1,1) without bias
          |      (5): nn.SpatialBatchNormalization (4D) (16)
          |    }
           `-> (2): nn.Identity
           ... -> output
      }
      (2): nn.CAddTable
      (3): cudnn.ReLU
    }
    (2): nn.Sequential {
      [input -> (1) -> (2) -> (3) -> output]
      (1): nn.ConcatTable {
        input
          |`-> (1): nn.Sequential {
          |      [input -> (1) -> (2) -> (3) -> (4) -> (5) -> output]
          |      (1): cudnn.SpatialConvolution(16 -> 16, 3x3, 1,1, 1,1) without bias
          |      (2): nn.SpatialBatchNormalization (4D) (16)
          |      (3): cudnn.ReLU
          |      (4): cudnn.SpatialConvolution(16 -> 16, 3x3, 1,1, 1,1) without bias
          |      (5): nn.SpatialBatchNormalization (4D) (16)
          |    }
           `-> (2): nn.Identity
           ... -> output
      }
      (2): nn.CAddTable
      (3): cudnn.ReLU
    }
    (3): nn.Sequential {
      [input -> (1) -> (2) -> (3) -> output]
      (1): nn.ConcatTable {
        input
          |`-> (1): nn.Sequential {
          |      [input -> (1) -> (2) -> (3) -> (4) -> (5) -> output]
          |      (1): cudnn.SpatialConvolution(16 -> 16, 3x3, 1,1, 1,1) without bias
          |      (2): nn.SpatialBatchNormalization (4D) (16)
          |      (3): cudnn.ReLU
          |      (4): cudnn.SpatialConvolution(16 -> 16, 3x3, 1,1, 1,1) without bias
          |      (5): nn.SpatialBatchNormalization (4D) (16)
          |    }
           `-> (2): nn.Identity
           ... -> output
      }
      (2): nn.CAddTable
      (3): cudnn.ReLU
    }
  }
  (5): nn.Sequential {
    [input -> (1) -> (2) -> (3) -> output]
    (1): nn.Sequential {
      [input -> (1) -> (2) -> (3) -> output]
      (1): nn.ConcatTable {
        input
          |`-> (1): nn.Sequential {
          |      [input -> (1) -> (2) -> (3) -> (4) -> (5) -> output]
          |      (1): cudnn.SpatialConvolution(16 -> 32, 3x3, 2,2, 1,1) without bias
          |      (2): nn.SpatialBatchNormalization (4D) (32)
          |      (3): cudnn.ReLU
          |      (4): cudnn.SpatialConvolution(32 -> 32, 3x3, 1,1, 1,1) without bias
          |      (5): nn.SpatialBatchNormalization (4D) (32)
          |    }
           `-> (2): nn.Sequential {
                 [input -> (1) -> (2) -> output]
                 (1): nn.SpatialAveragePooling(1x1, 2,2)
                 (2): nn.Concat {
                   input
                     |`-> (1): nn.Identity
                      `-> (2): nn.MulConstant
                      ... -> output
                 }
               }
           ... -> output
      }
      (2): nn.CAddTable
      (3): cudnn.ReLU
    }
    (2): nn.Sequential {
      [input -> (1) -> (2) -> (3) -> output]
      (1): nn.ConcatTable {
        input
          |`-> (1): nn.Sequential {
          |      [input -> (1) -> (2) -> (3) -> (4) -> (5) -> output]
          |      (1): cudnn.SpatialConvolution(32 -> 32, 3x3, 1,1, 1,1) without bias
          |      (2): nn.SpatialBatchNormalization (4D) (32)
          |      (3): cudnn.ReLU
          |      (4): cudnn.SpatialConvolution(32 -> 32, 3x3, 1,1, 1,1) without bias
          |      (5): nn.SpatialBatchNormalization (4D) (32)
          |    }
           `-> (2): nn.Identity
           ... -> output
      }
      (2): nn.CAddTable
      (3): cudnn.ReLU
    }
    (3): nn.Sequential {
      [input -> (1) -> (2) -> (3) -> output]
      (1): nn.ConcatTable {
        input
          |`-> (1): nn.Sequential {
          |      [input -> (1) -> (2) -> (3) -> (4) -> (5) -> output]
          |      (1): cudnn.SpatialConvolution(32 -> 32, 3x3, 1,1, 1,1) without bias
          |      (2): nn.SpatialBatchNormalization (4D) (32)
          |      (3): cudnn.ReLU
          |      (4): cudnn.SpatialConvolution(32 -> 32, 3x3, 1,1, 1,1) without bias
          |      (5): nn.SpatialBatchNormalization (4D) (32)
          |    }
           `-> (2): nn.Identity
           ... -> output
      }
      (2): nn.CAddTable
      (3): cudnn.ReLU
    }
  }
  (6): nn.Sequential {
    [input -> (1) -> (2) -> (3) -> output]
    (1): nn.Sequential {
      [input -> (1) -> (2) -> (3) -> output]
      (1): nn.ConcatTable {
        input
          |`-> (1): nn.Sequential {
          |      [input -> (1) -> (2) -> (3) -> (4) -> (5) -> output]
          |      (1): cudnn.SpatialConvolution(32 -> 64, 3x3, 2,2, 1,1) without bias
          |      (2): nn.SpatialBatchNormalization (4D) (64)
          |      (3): cudnn.ReLU
          |      (4): cudnn.SpatialConvolution(64 -> 64, 3x3, 1,1, 1,1) without bias
          |      (5): nn.SpatialBatchNormalization (4D) (64)
          |    }
           `-> (2): nn.Sequential {
                 [input -> (1) -> (2) -> output]
                 (1): nn.SpatialAveragePooling(1x1, 2,2)
                 (2): nn.Concat {
                   input
                     |`-> (1): nn.Identity
                      `-> (2): nn.MulConstant
                      ... -> output
                 }
               }
           ... -> output
      }
      (2): nn.CAddTable
      (3): cudnn.ReLU
    }
    (2): nn.Sequential {
      [input -> (1) -> (2) -> (3) -> output]
      (1): nn.ConcatTable {
        input
          |`-> (1): nn.Sequential {
          |      [input -> (1) -> (2) -> (3) -> (4) -> (5) -> output]
          |      (1): cudnn.SpatialConvolution(64 -> 64, 3x3, 1,1, 1,1) without bias
          |      (2): nn.SpatialBatchNormalization (4D) (64)
          |      (3): cudnn.ReLU
          |      (4): cudnn.SpatialConvolution(64 -> 64, 3x3, 1,1, 1,1) without bias
          |      (5): nn.SpatialBatchNormalization (4D) (64)
          |    }
           `-> (2): nn.Identity
           ... -> output
      }
      (2): nn.CAddTable
      (3): cudnn.ReLU
    }
    (3): nn.Sequential {
      [input -> (1) -> (2) -> (3) -> output]
      (1): nn.ConcatTable {
        input
          |`-> (1): nn.Sequential {
          |      [input -> (1) -> (2) -> (3) -> (4) -> (5) -> output]
          |      (1): cudnn.SpatialConvolution(64 -> 64, 3x3, 1,1, 1,1) without bias
          |      (2): nn.SpatialBatchNormalization (4D) (64)
          |      (3): cudnn.ReLU
          |      (4): cudnn.SpatialConvolution(64 -> 64, 3x3, 1,1, 1,1) without bias
          |      (5): nn.SpatialBatchNormalization (4D) (64)
          |    }
           `-> (2): nn.Identity
           ... -> output
      }
      (2): nn.CAddTable
      (3): cudnn.ReLU
    }
  }
  (7): cudnn.SpatialAveragePooling(8x8, 1,1)
  (8): nn.View(64)
  (9): nn.Linear(64 -> 7)
}

How me run training?

Error:

th main.lua -dataset cifar10 -batchSize 128 -depth 20 -shareGradInput true
=> Creating model from file: models/resnet.lua  
 | ResNet-20 CIFAR-10   
=> Training epoch # 1   

cudnnFindConvolutionForwardAlgorithm failed:    2    convDesc=[mode : CUDNN_CROSS_CORRELATION datatype : CUDNN_DATA_FLOAT] hash=-dimA128,32,48,48 -filtA64,32,3,3 128,64,24,24 -padA1,1 -convStrideA2,2 CUDNN_DATA_FLOAT    
/root/facedetect/torch/install/bin/luajit: .../facedetect/torch/install/share/lua/5.1/nn/Container.lua:66: 
In 6 module of nn.Sequential:
In 1 module of nn.Sequential:
In 1 module of nn.Sequential:
In 1 module of nn.ConcatTable:
In 1 module of nn.Sequential:
/root/facedetect/torch/install/share/lua/5.1/cudnn/find.lua:483: cudnnFindConvolutionForwardAlgorithm failed, sizes:  convDesc=[mode : CUDNN_CROSS_CORRELATION datatype : CUDNN_DATA_FLOAT] hash=-dimA128,32,48,48 -filtA64,32,3,3 128,64,24,24 -padA1,1 -convStrideA2,2 CUDNN_DATA_FLOAT
stack traceback:
    [C]: in function 'error'
    /root/facedetect/torch/install/share/lua/5.1/cudnn/find.lua:483: in function 'forwardAlgorithm'
    ...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:190: in function <...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:186>
    [C]: in function 'xpcall'
    .../facedetect/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
    ...facedetect/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function <...facedetect/torch/install/share/lua/5.1/nn/Sequential.lua:41>
    [C]: in function 'xpcall'
    .../facedetect/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
    ...acedetect/torch/install/share/lua/5.1/nn/ConcatTable.lua:11: in function <...acedetect/torch/install/share/lua/5.1/nn/ConcatTable.lua:9>
    [C]: in function 'xpcall'
    ...
    .../facedetect/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
    ...facedetect/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function <...facedetect/torch/install/share/lua/5.1/nn/Sequential.lua:41>
    [C]: in function 'xpcall'
    .../facedetect/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
    ...facedetect/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    ./train.lua:56: in function 'train'
    main.lua:51: in main chunk
    [C]: in function 'dofile'
    ...tect/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    [C]: at 0x00405e90

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
    [C]: in function 'error'
    .../facedetect/torch/install/share/lua/5.1/nn/Container.lua:66: in function 'rethrowErrors'
    ...facedetect/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    ./train.lua:56: in function 'train'
    main.lua:51: in main chunk
    [C]: in function 'dofile'
    ...tect/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    [C]: at 0x00405e90

Help me please! I really need help.

ghost commented 7 years ago

maybe this can help you. https://github.com/allenai/XNOR-Net/issues/22