dmlc / cxxnet

move forward to https://github.com/dmlc/mxnet

Error when training ImageNet #206

Closed DrustZ closed 9 years ago

DrustZ commented 9 years ago

When I'm training with "Inception-BN.conf", the training terminates at round 0, and the log shows:

an illegal memory access was encountered
an illegal memory access was encountered

And when I use "kaiming.conf", it stops after initializing all the layers. The log is:

OpenCV Error: Assertion failed (0 <= roi.x && 0 <= roi.width && roi.x + roi.width <= m.cols && 0 <= roi.y && 0 <= roi.height && roi.y + roi.height <= m.rows) in Mat, file /build/buildd/opencv-2.4.8+dfsg1/modules/core/src/matrix.cpp, line 323
(the same assertion failure is printed four times)
terminate called after throwing an instance of 'cv::Exception'
terminate called recursively
what(): /build/buildd/opencv-2.4.8+dfsg1/modules/core/src/matrix.cpp:323: error: (-215) 0 <= roi.x && 0 <= roi.width && roi.x + roi.width <= m.cols && 0 <= roi.y && 0 <= roi.height && roi.y + roi.height <= m.rows in function Mat

terminate called recursively
terminate called recursively
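For context, that OpenCV assertion fires when a requested ROI (crop rectangle) does not fit inside the source image. Here is a minimal Python sketch of the same bounds check (function and variable names are mine, not OpenCV's): a 224x224 crop only fits a 256x256 image if its offset stays at or below 32 pixels on each axis, so a badly sampled crop offset trips the assertion.

```python
def roi_fits(roi, cols, rows):
    """Mirror of OpenCV's Mat(roi) assertion: the rectangle
    (x, y, width, height) must lie entirely inside a cols x rows image."""
    x, y, w, h = roi
    return (0 <= x and 0 <= w and x + w <= cols and
            0 <= y and 0 <= h and y + h <= rows)

print(roi_fits((16, 16, 224, 224), 256, 256))  # True: 16 + 224 = 240 <= 256
print(roi_fits((40, 40, 224, 224), 256, 256))  # False: 40 + 224 = 264 > 256
```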

winstywang commented 9 years ago

What is the size of your input image?

DrustZ commented 9 years ago

Here are the .conf files. Inception-BN.conf:

data = train
iter = imgrec
#  image_list = "/media/DATA1/Imagenet/train_list_shuffle.lst"
  image_rec  = "/media/DATA1/Imagenet/train_shuffle.bin"
  image_mean = "models/mean_224.bin"
  rand_crop=1
  rand_mirror=1
  shuffle=1
iter = threadbuffer
iter = end

eval = val
iter = imgrec
# image_list = "/media/DATA1/Imagenet/val_list.lst"
  image_rec = "/media/DATA1/Imagenet/val.bin"
  image_mean = "models/mean_224.bin"
#no random crop and mirror in test
iter = end

netconfig = start

layer[0->0.1] = conv:conv_1
  kernel_size = 7
  nchannel = 64
  pad = 3
  stride = 2
layer[0.1->0.2] = batch_norm:bn_1
layer[0.2->1] = relu:relu_1

layer[1->2] = max_pooling:max_pool_1
  kernel_size = 3
  stride = 2

layer[2->2.1] = conv:conv_2_reduce
  kernel_size = 1
  nchannel = 64
  pad = 0
  stride = 1
layer[2.1->2.2] = batch_norm:bn_2_1
layer[2.2->3] = relu:relu_2_1

layer[3->3.1] = conv:conv_2
  kernel_size = 3
  nchannel = 192
  pad = 1
  stride = 1
layer[3.1->3.2] = batch_norm:bn_2
layer[3.2->4] = relu:relu_2

layer[4->5] = max_pooling:max_pool_2
  kernel_size = 3
  stride = 2
##### inception 3a #####
layer[5->6.1.0,6.2.0,6.3.0,6.4.0] = split:split_3a_split
## inception 1x1
layer[6.1.0->6.1.1] = conv:conv_3a_1x1
  kernel_size = 1
  nchannel = 64
  pad = 0
  stride = 1
layer[6.1.1->6.1.2] = batch_norm:bn_3a_1x1
layer[6.1.2->6.1.3] = relu:relu_3a_1x1
## inception 3x3
layer[6.2.0->6.2.1] = conv:conv_3a_3x3_reduce
  kernel_size = 1
  nchannel = 64
  pad = 0
  stride = 1
layer[6.2.1->6.2.2] = batch_norm:bn_3a_3x3_reduce
layer[6.2.2->6.2.3] = relu:relu_3a_3x3_reduce
layer[6.2.3->6.2.4] = conv:conv_3a_3x3
  kernel_size = 3
  nchannel = 64
  pad = 1
  stride = 1
layer[6.2.4->6.2.5] = batch_norm:bn_3a_3x3
layer[6.2.5->6.2.6] = relu:relu_3a_3x3
## inception double 3x3
layer[6.3.0->6.3.1] = conv:conv_3a_double_3x3_reduce
  kernel_size = 1
  nchannel = 64
  pad = 0
  stride = 1
layer[6.3.1->6.3.2] = batch_norm:bn_3a_double_3x3_reduce
layer[6.3.2->6.3.3] = relu:relu_3a_double_3x3_reduce
layer[6.3.3->6.3.4] = conv:conv_3a_double_3x3_0
  kernel_size = 3
  nchannel = 96
  pad = 1
  stride = 1
layer[6.3.4->6.3.5] = batch_norm:bn_3a_double_3x3_0
layer[6.3.5->6.3.6] = relu:relu_3a_double_3x3_0
layer[6.3.6->6.3.7] = conv:conv_3a_double_3x3_1
  kernel_size = 3
  nchannel = 96
  pad = 1
  stride = 1
layer[6.3.7->6.3.8] = batch_norm:bn_3a_double_3x3_1
layer[6.3.8->6.3.9] = relu:relu_3a_double_3x3_1
## inception proj
layer[6.4.0->6.4.1] = avg_pooling:avg_pool_3a_pool
  kernel_size = 3
  stride = 1
  pad = 1
layer[6.4.1->6.4.2] = conv:conv_3a_proj
  kernel_size = 1
  nchannel = 32
  pad = 0
  stride = 1
layer[6.4.2->6.4.3] = batch_norm:bn_3a_proj
layer[6.4.3->6.4.4] = relu:relu_3a_proj

layer[6.1.3,6.2.6,6.3.9,6.4.4->6] = ch_concat:ch_concat_3a_chconcat
##### inception 3b #####
layer[6->7.1.0,7.2.0,7.3.0,7.4.0] = split:split_3b_split
## inception 1x1
layer[7.1.0->7.1.1] = conv:conv_3b_1x1
  kernel_size = 1
  nchannel = 64
  pad = 0
  stride = 1
layer[7.1.1->7.1.2] = batch_norm:bn_3b_1x1
layer[7.1.2->7.1.3] = relu:relu_3b_1x1
## inception 3x3
layer[7.2.0->7.2.1] = conv:conv_3b_3x3_reduce
  kernel_size = 1
  nchannel = 64
  pad = 0
  stride = 1
layer[7.2.1->7.2.2] = batch_norm:bn_3b_3x3_reduce
layer[7.2.2->7.2.3] = relu:relu_3b_3x3_reduce
layer[7.2.3->7.2.4] = conv:conv_3b_3x3
  kernel_size = 3
  nchannel = 96
  pad = 1
  stride = 1
layer[7.2.4->7.2.5] = batch_norm:bn_3b_3x3
layer[7.2.5->7.2.6] = relu:relu_3b_3x3
## inception double 3x3
layer[7.3.0->7.3.1] = conv:conv_3b_double_3x3_reduce
  kernel_size = 1
  nchannel = 64
  pad = 0
  stride = 1
layer[7.3.1->7.3.2] = batch_norm:bn_3b_double_3x3_reduce
layer[7.3.2->7.3.3] = relu:relu_3b_double_3x3_reduce
layer[7.3.3->7.3.4] = conv:conv_3b_double_3x3_0
  kernel_size = 3
  nchannel = 96
  pad = 1
  stride = 1
layer[7.3.4->7.3.5] = batch_norm:bn_3b_double_3x3_0
layer[7.3.5->7.3.6] = relu:relu_3b_double_3x3_0
layer[7.3.6->7.3.7] = conv:conv_3b_double_3x3_1
  kernel_size = 3
  nchannel = 96
  pad = 1
  stride = 1
layer[7.3.7->7.3.8] = batch_norm:bn_3b_double_3x3_1
layer[7.3.8->7.3.9] = relu:relu_3b_double_3x3_1
## inception proj
layer[7.4.0->7.4.1] = avg_pooling:avg_pool_3b_pool
  kernel_size = 3
  stride = 1
  pad = 1
layer[7.4.1->7.4.2] = conv:conv_3b_proj
  kernel_size = 1
  nchannel = 64
  pad = 0
  stride = 1
layer[7.4.2->7.4.3] = batch_norm:bn_3b_proj
layer[7.4.3->7.4.4] = relu:relu_3b_proj

layer[7.1.3,7.2.6,7.3.9,7.4.4->7] = ch_concat:ch_concat_3b_chconcat
##### inception 3c #####
layer[7->8.2.0,8.3.0,8.4.0] = split:split_3c_split
## inception 3x3
layer[8.2.0->8.2.1] = conv:conv_3c_3x3_reduce
  kernel_size = 1
  nchannel = 128
  pad = 0
  stride = 1
layer[8.2.1->8.2.2] = batch_norm:bn_3c_3x3_reduce
layer[8.2.2->8.2.3] = relu:relu_3c_3x3_reduce
layer[8.2.3->8.2.4] = conv:conv_3c_3x3
  kernel_size = 3
  nchannel = 160
  pad = 1
  stride = 2
layer[8.2.4->8.2.5] = batch_norm:bn_3c_3x3
layer[8.2.5->8.2.6] = relu:relu_3c_3x3
## inception double 3x3
layer[8.3.0->8.3.1] = conv:conv_3c_double_3x3_reduce
  kernel_size = 1
  nchannel = 64
  pad = 0
  stride = 1
layer[8.3.1->8.3.2] = batch_norm:bn_3c_double_3x3_reduce
layer[8.3.2->8.3.3] = relu:relu_3c_double_3x3_reduce
layer[8.3.3->8.3.4] = conv:conv_3c_double_3x3_0
  kernel_size = 3
  nchannel = 96
  pad = 1
  stride = 1
layer[8.3.4->8.3.5] = batch_norm:bn_3c_double_3x3_0
layer[8.3.5->8.3.6] = relu:relu_3c_double_3x3_0
layer[8.3.6->8.3.7] = conv:conv_3c_double_3x3_1
  kernel_size = 3
  nchannel = 96
  pad = 1
  stride = 2
layer[8.3.7->8.3.8] = batch_norm:bn_3c_double_3x3_1
layer[8.3.8->8.3.9] = relu:relu_3c_double_3x3_1
## inception proj
layer[8.4.0->8.4.1] = max_pooling:max_pool_3c_pool
  kernel_size = 3
  stride = 2
layer[8.2.6,8.3.9,8.4.1->8] = ch_concat:ch_concat_3c_chconcat
##### inception 4a #####
layer[8->9.1.0,9.2.0,9.3.0,9.4.0] = split:split_4a_split
## inception 1x1
layer[9.1.0->9.1.1] = conv:conv_4a_1x1
  kernel_size = 1
  nchannel = 224
  pad = 0
  stride = 1
layer[9.1.1->9.1.2] = batch_norm:bn_4a_1x1
layer[9.1.2->9.1.3] = relu:relu_4a_1x1
## inception 3x3
layer[9.2.0->9.2.1] = conv:conv_4a_3x3_reduce
  kernel_size = 1
  nchannel = 64
  pad = 0
  stride = 1
layer[9.2.1->9.2.2] = batch_norm:bn_4a_3x3_reduce
layer[9.2.2->9.2.3] = relu:relu_4a_3x3_reduce
layer[9.2.3->9.2.4] = conv:conv_4a_3x3
  kernel_size = 3
  nchannel = 96
  pad = 1
  stride = 1
layer[9.2.4->9.2.5] = batch_norm:bn_4a_3x3
layer[9.2.5->9.2.6] = relu:relu_4a_3x3
## inception double 3x3
layer[9.3.0->9.3.1] = conv:conv_4a_double_3x3_reduce
  kernel_size = 1
  nchannel = 96
  pad = 0
  stride = 1
layer[9.3.1->9.3.2] = batch_norm:bn_4a_double_3x3_reduce
layer[9.3.2->9.3.3] = relu:relu_4a_double_3x3_reduce
layer[9.3.3->9.3.4] = conv:conv_4a_double_3x3_0
  kernel_size = 3
  nchannel = 128
  pad = 1
  stride = 1
layer[9.3.4->9.3.5] = batch_norm:bn_4a_double_3x3_0
layer[9.3.5->9.3.6] = relu:relu_4a_double_3x3_0
layer[9.3.6->9.3.7] = conv:conv_4a_double_3x3_1
  kernel_size = 3
  nchannel = 128
  pad = 1
  stride = 1
layer[9.3.7->9.3.8] = batch_norm:bn_4a_double_3x3_1
layer[9.3.8->9.3.9] = relu:relu_4a_double_3x3_1
## inception proj
layer[9.4.0->9.4.1] = avg_pooling:avg_pool_4a_pool
  kernel_size = 3
  stride = 1
  pad = 1
layer[9.4.1->9.4.2] = conv:conv_4a_proj
  kernel_size = 1
  nchannel = 128
  pad = 0
  stride = 1
layer[9.4.2->9.4.3] = batch_norm:bn_4a_proj
layer[9.4.3->9.4.4] = relu:relu_4a_proj

layer[9.1.3,9.2.6,9.3.9,9.4.4->9] = ch_concat:ch_concat_4a_chconcat
##### inception 4b #####
layer[9->10.1.0,10.2.0,10.3.0,10.4.0] = split:split_4b_split
## inception 1x1
layer[10.1.0->10.1.1] = conv:conv_4b_1x1
  kernel_size = 1
  nchannel = 192
  pad = 0
  stride = 1
layer[10.1.1->10.1.2] = batch_norm:bn_4b_1x1
layer[10.1.2->10.1.3] = relu:relu_4b_1x1
## inception 3x3
layer[10.2.0->10.2.1] = conv:conv_4b_3x3_reduce
  kernel_size = 1
  nchannel = 96
  pad = 0
  stride = 1
layer[10.2.1->10.2.2] = batch_norm:bn_4b_3x3_reduce
layer[10.2.2->10.2.3] = relu:relu_4b_3x3_reduce
layer[10.2.3->10.2.4] = conv:conv_4b_3x3
  kernel_size = 3
  nchannel = 128
  pad = 1
  stride = 1
layer[10.2.4->10.2.5] = batch_norm:bn_4b_3x3
layer[10.2.5->10.2.6] = relu:relu_4b_3x3
## inception double 3x3
layer[10.3.0->10.3.1] = conv:conv_4b_double_3x3_reduce
  kernel_size = 1
  nchannel = 96
  pad = 0
  stride = 1
layer[10.3.1->10.3.2] = batch_norm:bn_4b_double_3x3_reduce
layer[10.3.2->10.3.3] = relu:relu_4b_double_3x3_reduce
layer[10.3.3->10.3.4] = conv:conv_4b_double_3x3_0
  kernel_size = 3
  nchannel = 128
  pad = 1
  stride = 1
layer[10.3.4->10.3.5] = batch_norm:bn_4b_double_3x3_0
layer[10.3.5->10.3.6] = relu:relu_4b_double_3x3_0
layer[10.3.6->10.3.7] = conv:conv_4b_double_3x3_1
  kernel_size = 3
  nchannel = 128
  pad = 1
  stride = 1
layer[10.3.7->10.3.8] = batch_norm:bn_4b_double_3x3_1
layer[10.3.8->10.3.9] = relu:relu_4b_double_3x3_1
## inception proj
layer[10.4.0->10.4.1] = avg_pooling:avg_pool_4b_pool
  kernel_size = 3
  stride = 1
  pad = 1
layer[10.4.1->10.4.2] = conv:conv_4b_proj
  kernel_size = 1
  nchannel = 128
  pad = 0
  stride = 1
layer[10.4.2->10.4.3] = batch_norm:bn_4b_proj
layer[10.4.3->10.4.4] = relu:relu_4b_proj

layer[10.1.3,10.2.6,10.3.9,10.4.4->10] = ch_concat:ch_concat_4b_chconcat
##### inception 4c #####
layer[10->11.1.0,11.2.0,11.3.0,11.4.0] = split:split_4c_split
## inception 1x1
layer[11.1.0->11.1.1] = conv:conv_4c_1x1
  kernel_size = 1
  nchannel = 160
  pad = 0
  stride = 1
layer[11.1.1->11.1.2] = batch_norm:bn_4c_1x1
layer[11.1.2->11.1.3] = relu:relu_4c_1x1
## inception 3x3
layer[11.2.0->11.2.1] = conv:conv_4c_3x3_reduce
  kernel_size = 1
  nchannel = 128
  pad = 0
  stride = 1
layer[11.2.1->11.2.2] = batch_norm:bn_4c_3x3_reduce
layer[11.2.2->11.2.3] = relu:relu_4c_3x3_reduce
layer[11.2.3->11.2.4] = conv:conv_4c_3x3
  kernel_size = 3
  nchannel = 160
  pad = 1
  stride = 1
layer[11.2.4->11.2.5] = batch_norm:bn_4c_3x3
layer[11.2.5->11.2.6] = relu:relu_4c_3x3
## inception double 3x3
layer[11.3.0->11.3.1] = conv:conv_4c_double_3x3_reduce
  kernel_size = 1
  nchannel = 128
  pad = 0
  stride = 1
layer[11.3.1->11.3.2] = batch_norm:bn_4c_double_3x3_reduce
layer[11.3.2->11.3.3] = relu:relu_4c_double_3x3_reduce
layer[11.3.3->11.3.4] = conv:conv_4c_double_3x3_0
  kernel_size = 3
  nchannel = 160
  pad = 1
  stride = 1
layer[11.3.4->11.3.5] = batch_norm:bn_4c_double_3x3_0
layer[11.3.5->11.3.6] = relu:relu_4c_double_3x3_0
layer[11.3.6->11.3.7] = conv:conv_4c_double_3x3_1
  kernel_size = 3
  nchannel = 160
  pad = 1
  stride = 1
layer[11.3.7->11.3.8] = batch_norm:bn_4c_double_3x3_1
layer[11.3.8->11.3.9] = relu:relu_4c_double_3x3_1
## inception proj
layer[11.4.0->11.4.1] = avg_pooling:avg_pool_4c_pool
  kernel_size = 3
  stride = 1
  pad = 1
layer[11.4.1->11.4.2] = conv:conv_4c_proj
  kernel_size = 1
  nchannel = 128
  pad = 0
  stride = 1
layer[11.4.2->11.4.3] = batch_norm:bn_4c_proj
layer[11.4.3->11.4.4] = relu:relu_4c_proj

layer[11.1.3,11.2.6,11.3.9,11.4.4->11] = ch_concat:ch_concat_4c_chconcat
##### inception 4d #####
layer[11->12.1.0,12.2.0,12.3.0,12.4.0] = split:split_4d_split
## inception 1x1
layer[12.1.0->12.1.1] = conv:conv_4d_1x1
  kernel_size = 1
  nchannel = 96
  pad = 0
  stride = 1
layer[12.1.1->12.1.2] = batch_norm:bn_4d_1x1
layer[12.1.2->12.1.3] = relu:relu_4d_1x1
## inception 3x3
layer[12.2.0->12.2.1] = conv:conv_4d_3x3_reduce
  kernel_size = 1
  nchannel = 128
  pad = 0
  stride = 1
layer[12.2.1->12.2.2] = batch_norm:bn_4d_3x3_reduce
layer[12.2.2->12.2.3] = relu:relu_4d_3x3_reduce
layer[12.2.3->12.2.4] = conv:conv_4d_3x3
  kernel_size = 3
  nchannel = 192
  pad = 1
  stride = 1
layer[12.2.4->12.2.5] = batch_norm:bn_4d_3x3
layer[12.2.5->12.2.6] = relu:relu_4d_3x3
## inception double 3x3
layer[12.3.0->12.3.1] = conv:conv_4d_double_3x3_reduce
  kernel_size = 1
  nchannel = 160
  pad = 0
  stride = 1
layer[12.3.1->12.3.2] = batch_norm:bn_4d_double_3x3_reduce
layer[12.3.2->12.3.3] = relu:relu_4d_double_3x3_reduce
layer[12.3.3->12.3.4] = conv:conv_4d_double_3x3_0
  kernel_size = 3
  nchannel = 192
  pad = 1
  stride = 1
layer[12.3.4->12.3.5] = batch_norm:bn_4d_double_3x3_0
layer[12.3.5->12.3.6] = relu:relu_4d_double_3x3_0
layer[12.3.6->12.3.7] = conv:conv_4d_double_3x3_1
  kernel_size = 3
  nchannel = 192
  pad = 1
  stride = 1
layer[12.3.7->12.3.8] = batch_norm:bn_4d_double_3x3_1
layer[12.3.8->12.3.9] = relu:relu_4d_double_3x3_1
## inception proj
layer[12.4.0->12.4.1] = avg_pooling:avg_pool_4d_pool
  kernel_size = 3
  stride = 1
  pad = 1
layer[12.4.1->12.4.2] = conv:conv_4d_proj
  kernel_size = 1
  nchannel = 128
  pad = 0
  stride = 1
layer[12.4.2->12.4.3] = batch_norm:bn_4d_proj
layer[12.4.3->12.4.4] = relu:relu_4d_proj

layer[12.1.3,12.2.6,12.3.9,12.4.4->12] = ch_concat:ch_concat_4d_chconcat
##### inception 4e #####
layer[12->13.2.0,13.3.0,13.4.0] = split:split_4e_split
## inception 3x3
layer[13.2.0->13.2.1] = conv:conv_4e_3x3_reduce
  kernel_size = 1
  nchannel = 128
  pad = 0
  stride = 1
layer[13.2.1->13.2.2] = batch_norm:bn_4e_3x3_reduce
layer[13.2.2->13.2.3] = relu:relu_4e_3x3_reduce
layer[13.2.3->13.2.4] = conv:conv_4e_3x3
  kernel_size = 3
  nchannel = 192
  pad = 1
  stride = 2
layer[13.2.4->13.2.5] = batch_norm:bn_4e_3x3
layer[13.2.5->13.2.6] = relu:relu_4e_3x3
## inception double 3x3
layer[13.3.0->13.3.1] = conv:conv_4e_double_3x3_reduce
  kernel_size = 1
  nchannel = 192
  pad = 0
  stride = 1
layer[13.3.1->13.3.2] = batch_norm:bn_4e_double_3x3_reduce
layer[13.3.2->13.3.3] = relu:relu_4e_double_3x3_reduce
layer[13.3.3->13.3.4] = conv:conv_4e_double_3x3_0
  kernel_size = 3
  nchannel = 256
  pad = 1
  stride = 1
layer[13.3.4->13.3.5] = batch_norm:bn_4e_double_3x3_0
layer[13.3.5->13.3.6] = relu:relu_4e_double_3x3_0
layer[13.3.6->13.3.7] = conv:conv_4e_double_3x3_1
  kernel_size = 3
  nchannel = 256
  pad = 1
  stride = 2
layer[13.3.7->13.3.8] = batch_norm:bn_4e_double_3x3_1
layer[13.3.8->13.3.9] = relu:relu_4e_double_3x3_1
## inception proj
layer[13.4.0->13.4.1] = max_pooling:max_pool_4e_pool
  kernel_size = 3
  stride = 2
layer[13.2.6,13.3.9,13.4.1->13] = ch_concat:ch_concat_4e_chconcat
##### inception 5a #####
layer[13->14.1.0,14.2.0,14.3.0,14.4.0] = split:split_5a_split
## inception 1x1
layer[14.1.0->14.1.1] = conv:conv_5a_1x1
  kernel_size = 1
  nchannel = 352
  pad = 0
  stride = 1
layer[14.1.1->14.1.2] = batch_norm:bn_5a_1x1
layer[14.1.2->14.1.3] = relu:relu_5a_1x1
## inception 3x3
layer[14.2.0->14.2.1] = conv:conv_5a_3x3_reduce
  kernel_size = 1
  nchannel = 192
  pad = 0
  stride = 1
layer[14.2.1->14.2.2] = batch_norm:bn_5a_3x3_reduce
layer[14.2.2->14.2.3] = relu:relu_5a_3x3_reduce
layer[14.2.3->14.2.4] = conv:conv_5a_3x3
  kernel_size = 3
  nchannel = 320
  pad = 1
  stride = 1
layer[14.2.4->14.2.5] = batch_norm:bn_5a_3x3
layer[14.2.5->14.2.6] = relu:relu_5a_3x3
## inception double 3x3
layer[14.3.0->14.3.1] = conv:conv_5a_double_3x3_reduce
  kernel_size = 1
  nchannel = 160
  pad = 0
  stride = 1
layer[14.3.1->14.3.2] = batch_norm:bn_5a_double_3x3_reduce
layer[14.3.2->14.3.3] = relu:relu_5a_double_3x3_reduce
layer[14.3.3->14.3.4] = conv:conv_5a_double_3x3_0
  kernel_size = 3
  nchannel = 224
  pad = 1
  stride = 1
layer[14.3.4->14.3.5] = batch_norm:bn_5a_double_3x3_0
layer[14.3.5->14.3.6] = relu:relu_5a_double_3x3_0
layer[14.3.6->14.3.7] = conv:conv_5a_double_3x3_1
  kernel_size = 3
  nchannel = 224
  pad = 1
  stride = 1
layer[14.3.7->14.3.8] = batch_norm:bn_5a_double_3x3_1
layer[14.3.8->14.3.9] = relu:relu_5a_double_3x3_1
## inception proj
layer[14.4.0->14.4.1] = avg_pooling:avg_pool_5a_pool
  kernel_size = 3
  stride = 1
  pad = 1
layer[14.4.1->14.4.2] = conv:conv_5a_proj
  kernel_size = 1
  nchannel = 128
  pad = 0
  stride = 1
layer[14.4.2->14.4.3] = batch_norm:bn_5a_proj
layer[14.4.3->14.4.4] = relu:relu_5a_proj

layer[14.1.3,14.2.6,14.3.9,14.4.4->14] = ch_concat:ch_concat_5a_chconcat
##### inception 5b #####
layer[14->15.1.0,15.2.0,15.3.0,15.4.0] = split:split_5b_split
## inception 1x1
layer[15.1.0->15.1.1] = conv:conv_5b_1x1
  kernel_size = 1
  nchannel = 352
  pad = 0
  stride = 1
layer[15.1.1->15.1.2] = batch_norm:bn_5b_1x1
layer[15.1.2->15.1.3] = relu:relu_5b_1x1
## inception 3x3
layer[15.2.0->15.2.1] = conv:conv_5b_3x3_reduce
  kernel_size = 1
  nchannel = 192
  pad = 0
  stride = 1
layer[15.2.1->15.2.2] = batch_norm:bn_5b_3x3_reduce
layer[15.2.2->15.2.3] = relu:relu_5b_3x3_reduce
layer[15.2.3->15.2.4] = conv:conv_5b_3x3
  kernel_size = 3
  nchannel = 320
  pad = 1
  stride = 1
layer[15.2.4->15.2.5] = batch_norm:bn_5b_3x3
layer[15.2.5->15.2.6] = relu:relu_5b_3x3
## inception double 3x3
layer[15.3.0->15.3.1] = conv:conv_5b_double_3x3_reduce
  kernel_size = 1
  nchannel = 192
  pad = 0
  stride = 1
layer[15.3.1->15.3.2] = batch_norm:bn_5b_double_3x3_reduce
layer[15.3.2->15.3.3] = relu:relu_5b_double_3x3_reduce
layer[15.3.3->15.3.4] = conv:conv_5b_double_3x3_0
  kernel_size = 3
  nchannel = 224
  pad = 1
  stride = 1
layer[15.3.4->15.3.5] = batch_norm:bn_5b_double_3x3_0
layer[15.3.5->15.3.6] = relu:relu_5b_double_3x3_0
layer[15.3.6->15.3.7] = conv:conv_5b_double_3x3_1
  kernel_size = 3
  nchannel = 224
  pad = 1
  stride = 1
layer[15.3.7->15.3.8] = batch_norm:bn_5b_double_3x3_1
layer[15.3.8->15.3.9] = relu:relu_5b_double_3x3_1
## inception proj
layer[15.4.0->15.4.1] = max_pooling:max_pool_5b_pool
  kernel_size = 3
  stride = 1
  pad = 1
layer[15.4.1->15.4.2] = conv:conv_5b_proj
  kernel_size = 1
  nchannel = 128
  pad = 0
  stride = 1
layer[15.4.2->15.4.3] = batch_norm:bn_5b_proj
layer[15.4.3->15.4.4] = relu:relu_5b_proj

layer[15.1.3,15.2.6,15.3.9,15.4.4->15] = ch_concat:ch_concat_5b_chconcat

layer[15->16] = avg_pooling:global_pool
  kernel_size = 7
  stride = 1

layer[+1] = flatten:flatten

layer[+1] = fullc:fc
  nhidden = 1000

layer[+0] = softmax:softmax
netconfig = end

# evaluation metric
metric = rec@1
metric = rec@5

max_round = 100
num_round = 100

# input shape not including batch
input_shape = 3,224,224

batch_size = 64
update_period = 2

# global parameters in any section outside netconfig and iter
momentum = 0.9
wmat:lr  = 0.05
wmat:wd  = 0.0001

bias:wd  = 0.000
bias:lr  = 0.1

# all learning rate schedule settings start with lr:
lr:schedule = constant 

save_model=1
model_dir=models
print_step=1
clip_gradient = 10
# random config
random_type = xavier
# new line
dev = gpu:0-3

kaiming.conf

# Configuration for ImageNet  
# Acknowledgement:
#  Ref: He, Kaiming, and Jian Sun. "Convolutional Neural Networks at Constrained Time Cost." CVPR2015
# J' model in the paper above

data = train
iter = imgrec
#  image_list = "/media/DATA1/Imagenet/train_list_shuffle.lst"
  image_rec  = "/media/DATA1/Imagenet/train_shuffle1.bin"
  image_mean = "models/kmean_224.bin"
  rand_crop=1
  rand_mirror=1
  min_crop_size=192
  max_crop_size=224
  max_aspect_ratio=0.3
iter = threadbuffer
iter = end

eval = val
iter = imgrec
#  image_list = "/media/DATA1/Imagenet/val_list.lst"
  image_rec = "/media/DATA1/Imagenet/val_shuffle.bin"
  image_mean = "models/kmean_224.bin"
# no random crop and mirror in test
iter = end

###### Stage 1 #######
netconfig=start
layer[0->1] = conv:conv1
  kernel_size = 7
  stride = 2
  nchannel = 64
layer[1->2] = relu:relu1
layer[2->3] = max_pooling
  kernel_size = 3

###### Stage 2 #######
layer[3->4] = conv:conv2
  nchannel = 128
  kernel_size = 2
  stride = 3
layer[4->5] = relu:relu2

layer[5->6] = conv:conv3
  nchannel = 128
  kernel_size = 2
  pad = 1
layer[6->7] = relu:relu3

layer[7->8] = conv:conv4
  nchannel = 128
  kernel_size = 2
layer[8->9] = relu:relu4

layer[9->10] = conv:conv5
  nchannel = 128
  kernel_size = 2
  pad = 1
layer[10->11] = relu:relu5

layer[11->12] = max_pooling:pool1
  kernel_size = 3

###### Stage 3 #######
layer[12->13] = conv:conv6
  nchannel = 256
  kernel_size = 2
  stride = 2
layer[13->14] = relu:relu6

layer[14->15] = conv:conv7
  nchannel = 256
  kernel_size = 2
  pad = 1
layer[15->16] = relu:relu7

layer[16->17] = conv:conv8
  nchannel = 256
  kernel_size = 2
layer[17->18] = relu:relu8

layer[18->19] = conv:conv9
  nchannel = 256
  kernel_size = 2
  pad = 1
layer[19->20] = relu:relu9

layer[20->21] = max_pooling:pool2
  kernel_size = 3

###### Stage 4 #######
layer[21->22] = conv:conv10
  nchannel = 2304
  kernel_size = 2
  stride = 3
layer[22->23] = relu:relu10

layer[23->24] = conv:conv11
  nchannel = 256
  kernel_size = 2
  pad = 1
layer[24->25] = relu:relu11

###### Stage 5 #######
layer[25->26,27,28,29] = split:split1
layer[26->30] = max_pooling:pool3
  kernel_size = 1
  stride = 1
layer[27->31] = max_pooling:pool4
  kernel_size = 2
  stride = 2
layer[28->32] = max_pooling:pool5
  kernel_size = 3
  stride = 3
layer[29->33] = max_pooling:pool6
  kernel_size = 6
  stride = 6

layer[30->34] = flatten:f1
layer[31->35] = flatten:f2
layer[32->36] = flatten:f3
layer[33->37] = flatten:f4
layer[34,35,36,37->38] = concat:concat1

###### Stage 6 #######
layer[38->39] = fullc:fc1
  nhidden = 4096
layer[39->40] = relu:relu12
layer[40->40] = dropout
  threshold = 0.5

layer[40->41] = fullc:fc2
  nhidden = 4096
layer[41->42] = relu:relu13
layer[42->42] = dropout
  threshold = 0.5

layer[42->43] = fullc:fc3
  nhidden = 1000
layer[43->43] = softmax:softmax1
netconfig=end

# evaluation metric
metric = rec@1
metric = rec@5

max_round = 100
num_round = 100

# input shape not including batch
input_shape = 3,224,224

batch_size = 128

# global parameters in any section outside netconfig and iter
momentum = 0.9
wmat:lr  = 0.01
wmat:wd  = 0.0005

bias:wd  = 0.000
bias:lr  = 0.02

# all learning rate schedule settings start with lr:
lr:schedule = factor
lr:gamma = 0.1
lr:step = 300000

save_model=1
model_dir=models
print_step=1
# random config
random_type = xavier
# new line
dev = gpu:0-3
winstywang commented 9 years ago

No, I mean your input data.

DrustZ commented 9 years ago

Sorry, I was editing the comment format just now. I used im2rec to resize all the images to 256*256, and I used the 2012 ImageNet data.

antinucleon commented 9 years ago

Check your augmentation settings. On Tue, Jul 21, 2015 at 20:49 张明瑞 notifications@github.com wrote:

Is anyone there? I'm waiting online, it's quite urgent.

— Reply to this email directly or view it on GitHub https://github.com/dmlc/cxxnet/issues/206#issuecomment-123543360.

DrustZ commented 9 years ago

The config is the same as someone else's, but with his cxxnet executable I can run the training successfully... I don't know what the difference is...

DrustZ commented 9 years ago

We used nearly the same config.mk:

# choice of compiler
export CC = gcc
export CXX = g++
export NVCC = nvcc

# whether to use CUDA during compilation
USE_CUDA = 1

# add the path to the CUDA library to link and compile flags
# if you have already added them to environment variables, leave it as NONE
USE_CUDA_PATH = /usr/local/cuda

# whether to use OpenCV during compilation
# you can disable it; however, you will not be able to use
# the imbin iterator
USE_OPENCV = 1
USE_OPENCV_DECODER = 1
# whether to use the CUDNN R3 library
USE_CUDNN = 1
# add the path to the CUDNN library to link and compile flags
# if you do not need it, or do not have it, leave it as NONE
USE_CUDNN_PATH = /home/mrzhang/Downloads/cudnn-6.5-linux-x64-v2
# whether to build caffe converter
USE_CAFFE_CONVERTER = 0
CAFFE_ROOT =
CAFFE_INCLUDE =
CAFFE_LIB =
#
# choose the version of blas you want to use
# can be: mkl, blas, atlas, openblas
USE_STATIC_MKL = /opt/intel/composer_xe_2015.0.090
USE_BLAS = mkl
#
# add the path to the Intel library; you may need it
# for MKL, if you did not add the path to environment variables
#
USE_INTEL_PATH = /opt/intel

# whether compile with parameter server
USE_DIST_PS = 1
PS_PATH = ./ps-lite
PS_THIRD_PATH = NONE

# whether compile with rabit
USE_RABIT_PS = 0
RABIT_PATH = /home/mrzhang/Downloads/rabit

# use openmp iterator
USE_OPENMP_ITER = 1
# the additional link flags you want to add
ADD_LDFLAGS = -ljpeg

# the additional compile flags you want to add
ADD_CFLAGS = -I /usr/local/cuda/bin
#
# If using MKL, choose static link automatically to fix the python wrapper
#
ifeq ($(USE_BLAS), mkl)
    USE_STATIC_MKL = 1
endif

#------------------------
# configuration for DMLC
#------------------------
# whether to use HDFS support during compilation
# this will allow cxxnet to directly save/load models from HDFS
USE_HDFS = 0

# whether to use AWS S3 support during compilation
# this will allow cxxnet to directly save/load models from S3
USE_S3 = 0

# path to libjvm.so
LIBJVM=$(JAVA_HOME)/jre/lib/amd64/server
USE_GLOG = 1

DrustZ commented 9 years ago

Sorry, it still doesn't work, but someone else's binary runs fine. Maybe something is wrong with OpenCV.

weihaoxie commented 9 years ago

When I run ../../bin/cxxnet bowl.conf, I also face the same problem. Can anyone solve it?

Use CUDA Device 0: GeForce GTX 970
finish initialization with 1 devices
Initializing layer: 0
Initializing layer: 1
Initializing layer: 2
Initializing layer: 3
Initializing layer: 4
Initializing layer: 5
Initializing layer: 6
Initializing layer: 7
Initializing layer: 8
Initializing layer: 9
Initializing layer: 10
Initializing layer: 11
Initializing layer: 12
Initializing layer: 13
Initializing layer: 14
Initializing layer: 15
Initializing layer: 16
SGDUpdater: eta=0.001000, mom=0.900000
SGDUpdater: eta=0.002000, mom=0.900000
(this pair of updater lines repeats seven times)
node[in].shape: 64,3,40,40
node[!node-after-0].shape: 64,48,41,41
node[!node-after-1].shape: 64,48,41,41
node[!node-after-2].shape: 64,48,20,20
node[!node-after-3].shape: 64,96,20,20
node[!node-after-4].shape: 64,96,20,20
node[!node-after-5].shape: 64,96,20,20
node[!node-after-6].shape: 64,96,20,20
node[!node-after-7].shape: 64,96,10,10
node[!node-after-8].shape: 64,128,9,9
node[!node-after-9].shape: 64,128,9,9
node[!node-after-10].shape: 64,128,7,7
node[!node-after-11].shape: 64,128,3,3
node[!node-after-12].shape: 64,1,1,1152
node[!node-after-13].shape: 64,1,1,256
node[!node-after-14].shape: 64,1,1,121
[17:14:09] src/io/iter_image_recordio-inl.hpp:68: Loaded ImageList from /home/meitu/cxxnet-master/example/kaggle_bowl/tr.lst
20000 Image records
cannot find /home/meitu/cxxnet-master/example/kaggle_bowl/models/image_mean.bin: create mean image, this will take some time...
OpenCV Error: Assertion failed (0 <= roi.x && 0 <= roi.width && roi.x + roi.width <= m.cols && 0 <= roi.y && 0 <= roi.height && roi.y + roi.height <= m.rows) in Mat, file /build/buildd/opencv-2.4.8+dfsg1/modules/core/src/matrix.cpp, line 323
(the same assertion failure is printed twice)
terminate called recursively
terminate called after throwing an instance of 'cv::Exception'
Aborted (core dumped)

superzrx commented 9 years ago

@weihaoxie can you paste the first 5 lines of your tr.lst? Maybe there is a path problem.

superzrx commented 9 years ago

@DrustZ please paste your code version as well as your friend's :)

DrustZ commented 9 years ago

Ours is the latest version.

superzrx commented 9 years ago

@DrustZ So your friend uses the same code version as you, but his binary works in your setup? OK ...

weihaoxie commented 9 years ago

The first 5 lines of tr.lst are as follows. Is anything wrong?

3406 10 data/train/chaetognath_non_sagitta/119995.jpg
22212 90 data/train/radiolarian_chain/89297.jpg
19772 83 data/train/protist_fuzzy_olive/48175.jpg
23435 98 data/train/siphonophore_calycophoran_rocketship_young/103178.jpg
18710 72 data/train/hydromedusae_solmaris/80365.jpg
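For reference, each line of such a list is an image index, a numeric label, and a relative path, separated by whitespace. A small Python sketch of a sanity check one could run over the list (the function name is mine; it assumes paths contain no whitespace, as in the lines above):

```python
def parse_lst_line(line):
    """Split one image-list line into (index, label, path)."""
    idx, label, path = line.split()
    return int(idx), int(label), path

line = "3406 10 data/train/chaetognath_non_sagitta/119995.jpg"
print(parse_lst_line(line))
# (3406, 10, 'data/train/chaetognath_non_sagitta/119995.jpg')
```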

DrustZ commented 9 years ago

Well, it seems there's a hidden bug in cxxnet. I rebuilt cxxnet against different OpenCV versions (2.4.8, 2.4.9, and 3.0.0), which proved vain. I tried generating different test data, and every .bin I used produced the same problem:

OpenCV Error: Assertion failed (0 <= roi.x && 0 <= roi.width && roi.x + roi.width <= m.cols && 0 <= roi.y && 0 <= roi.height && roi.y + roi.height <= m.rows) in Mat, file /build/buildd/opencv-2.4.8+dfsg1/modules/core/src/matrix.cpp, line 323
OpenCV Error: Assertion failed (0 <= roi.x && 0 <= roi.width && roi.x + roi.width <= m.cols && 0 <= roi.y && 0 <= roi.height && roi.y + roi.height <= m.rows) in Mat, file /build/buildd/opencv-2.4.8+dfsg1/modules/core/src/matrix.cpp, line 323
OpenCV Error: Assertion failed (0 <= roi.x && 0 <= roi.width && roi.x + roi.width <= m.cols && 0 <= roi.y && 0 <= roi.height && roi.y + roi.height <= m.rows) in Mat, file /build/buildd/opencv-2.4.8+dfsg1/modules/core/src/matrix.cpp, line 323
OpenCV Error: Assertion failed (0 <= roi.x && 0 <= roi.width && roi.x + roi.width <= m.cols && 0 <= roi.y && 0 <= roi.height && roi.y + roi.height <= m.rows) in Mat, file /build/buildd/opencv-2.4.8+dfsg1/modules/core/src/matrix.cpp, line 323

It's frustrating; I spent my whole week trying to get past it, god.

DrustZ commented 9 years ago

FOUND THE BUG:

In the latest version, the IO system seems broken. Git blame points to the commit "fix rand_crop #197", in src/io/image_augmenter-inl.hpp, lines 110 to 140.

Also: before merging files, please at least run the simplest test @superzrx

superzrx commented 9 years ago

@DrustZ Yes, there is a bug when using min_crop_size or max_crop_size: there is a duplicated line,

res = res(roi);

Thank you.
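The failure mode of such a duplicated crop line can be sketched in Python: applying the same ROI twice means the second crop is taken relative to the already-cropped image, so any nonzero offset pushes the rectangle out of bounds. This is a toy illustration of the mechanism, not the cxxnet code itself:

```python
def crop(shape, roi):
    """Validate a crop of a (rows, cols) image and return the cropped shape,
    raising where OpenCV's Mat(roi) assertion would fire."""
    rows, cols = shape
    x, y, w, h = roi
    if not (0 <= x and 0 <= y and x + w <= cols and y + h <= rows):
        raise ValueError("ROI outside image: OpenCV would assert here")
    return (h, w)

roi = (16, 16, 224, 224)           # a random 224x224 crop at offset (16, 16)
once = crop((256, 256), roi)       # fine: yields a (224, 224) image
# The duplicated line crops again with the same ROI: it now needs
# 16 + 224 = 240 pixels, but the image is only 224 wide -> out of bounds.
try:
    crop(once, roi)
except ValueError as e:
    print(e)
```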

superzrx commented 9 years ago

@weihaoxie just pushed a fix; it may be solved now.