NVIDIA / caffe

Caffe: a fast open framework for deep learning.

fix batchnorm with bottom dims less than 4 #506

Closed: HisiFish closed this 6 years ago

HisiFish commented 6 years ago

NVcaffe crashes when a BatchNorm layer is placed after an InnerProduct layer, i.e. when the bottom blob of the BatchNorm layer has fewer than 4 dimensions. BVLC/caffe handles this case fine. The following model is an example that triggers the bug. This commit tries to fix it.

name: "LeNet"
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "mnist_train_lmdb"
    batch_size: 50
    backend: LMDB
  }
}
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "mnist_test_lmdb"
    batch_size: 100
    backend: LMDB
  }
}

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}

layer {
  bottom: "conv1"
  top: "conv1"
  name: "conv1_bn"
  type: "BatchNorm"
  batch_norm_param {
    use_global_stats: false
  }
}

layer {
  bottom: "conv1"
  top: "conv1"
  name: "conv1_scale"
  type: "Scale"
  scale_param {
    bias_term: true
  }
}

layer {
  bottom: "conv1"
  top: "conv1"
  name: "conv1_relu"
  type: "ReLU"
}

layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}

layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 64
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}

layer {
  bottom: "conv2"
  top: "conv2"
  name: "conv2_bn"
  type: "BatchNorm"
  batch_norm_param {
    use_global_stats: false
  }
}

layer {
  bottom: "conv2"
  top: "conv2"
  name: "conv2_scale"
  type: "Scale"
  scale_param {
    bias_term: true
  }
}

layer {
  bottom: "conv2"
  top: "conv2"
  name: "conv2_relu"
  type: "ReLU"
}

layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}

layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 512
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}

layer {
  bottom: "ip1"
  top: "ip1"
  name: "ip1_bn"
  type: "BatchNorm"
  batch_norm_param {
    use_global_stats: false
  }
}

layer {
  bottom: "ip1"
  top: "ip1"
  name: "ip1_scale"
  type: "Scale"
  scale_param {
    bias_term: true
  }
}

layer {
  bottom: "ip1"
  top: "ip1"
  name: "ip1_relu"
  type: "ReLU"
}

layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}

layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}

layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
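
To make the shape issue concrete, here is a minimal Python sketch of how a batch-norm forward pass can accept blobs with fewer than 4 axes. It is not the patch in this PR and not NVcaffe or BVLC/caffe code. The helper names `batchnorm_dims` and `batchnorm_forward` are hypothetical, and the layout (axis 0 is the batch, axis 1 is the channel, everything after is spatial) is an assumption chosen to match the usual N x C x H x W convention. The idea is simply to treat any missing trailing axis as having size 1.

```python
from math import prod


def batchnorm_dims(shape):
    """Return (num, channels, spatial) for a blob shape with one or more axes.

    Assumption: axis 0 is the batch axis, axis 1 is the channel axis, and
    all remaining axes are spatial. Missing axes are treated as size 1.
    """
    if not shape:
        raise ValueError("blob must have at least one axis")
    num = shape[0]
    channels = shape[1] if len(shape) > 1 else 1
    spatial = prod(shape[2:])  # product of an empty slice is 1
    return num, channels, spatial


def batchnorm_forward(x, shape, eps=1e-5):
    """Normalize a flat, row-major list per channel using batch statistics."""
    num, channels, spatial = batchnorm_dims(shape)
    out = [0.0] * len(x)
    for c in range(channels):
        # Flat indices of every element that belongs to channel c.
        idx = [(n * channels + c) * spatial + s
               for n in range(num) for s in range(spatial)]
        mean = sum(x[i] for i in idx) / len(idx)
        var = sum((x[i] - mean) ** 2 for i in idx) / len(idx)
        for i in idx:
            out[i] = (x[i] - mean) / (var + eps) ** 0.5
    return out
```

Assuming 28 x 28 MNIST inputs, `pool2` in the model above would be a 4-axis blob of shape (50, 64, 4, 4), while `ip1` would be a 2-axis blob of shape (50, 512). With this sketch, `batchnorm_dims((50, 512))` yields a spatial size of 1 instead of failing on the missing height and width axes, so the same per-channel loop covers both cases.
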
HisiFish commented 6 years ago

Oh, how do I get the CI build to pass? It always times out. :(

drnikolaev commented 6 years ago

No worries about CI, but could you clarify your use case, please?

HisiFish commented 6 years ago

@drnikolaev Sometimes we need a BatchNorm layer after an InnerProduct layer. The LeNet model in the PR description above is one such example: it works well with BVLC/caffe but crashes with NVcaffe.

drnikolaev commented 6 years ago

@HisiFish Could you verify this against the https://github.com/drnikolaev/caffe/tree/caffe-0.17 release candidate?

drnikolaev commented 6 years ago

@HisiFish Please verify this against the https://github.com/NVIDIA/caffe/tree/v0.17.1 release and reopen the PR if needed. Please also sign and attach the CLA so that the PR can be accepted.