BVLC / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

Any input produces the same output #1396

Closed mender05 closed 9 years ago

mender05 commented 9 years ago

I am trying to use Caffe to implement DeepPose, proposed in this paper: http://arxiv.org/abs/1312.4659. DeepPose has 3 stages, and each stage is almost the same as AlexNet (DeepPose replaces AlexNet's loss layer with a Euclidean loss). It is in fact a regression problem.

The train.prototxt is:

name: "CaffeNet"
layers {
  name: "image"
  type: DATA
  top: "image"
  data_param {
    source: "examples/lsp/lsp_train_images_lmdb"
    backend: LMDB
    batch_size: 30
    scale: 0.00390625
  }
}
layers {
  name: "label"
  type: DATA
  top: "label"
  data_param {
    source: "examples/lsp/lsp_train_labels_lmdb"
    backend: LMDB
    batch_size: 30
    scale: 0.00454545
  }
}
layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "image"
  top: "conv1"
...  THIS IS THE SAME AS ALEXNET ...
layers {
  name: "fc8"
  type: INNER_PRODUCT
  bottom: "fc7"
  top: "fc8"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  inner_product_param {
    num_output: 28
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  name: "loss"
  type: EUCLIDEAN_LOSS
  bottom: "fc8"
  bottom: "label"
  top: "loss"
}

The solve.prototxt is:

net: "models/lsp/deeppose_train.prototxt"
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 7500
display: 50
max_iter: 36500
momentum: 0.9
weight_decay: 0.0000005
snapshot: 2000
snapshot_prefix: "models/lsp/caffenet_train"
solver_mode: GPU

After training completed, I use the Python interface to do prediction on the test set. The test.prototxt is:

name: "CaffeNet"
layers {
  name: "image"
  type: MEMORY_DATA
  top: "image"
    top: "useless"
  memory_data_param {
    batch_size: 30
    channels: 3
    height: 220
    width: 220
  }
}
layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "image"
... 
layers {
  name: "fc8"
  type: INNER_PRODUCT
  bottom: "fc7"
  top: "fc8"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  inner_product_param {
    num_output: 28
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}

but the output is very strange. Dumping the output of the "fc8" layer, I find that all the images produce the same output:

array([[ 0.48381898,  0.02326088,  0.02317634,  0.02317682,  0.48248914,
         0.01622555,  0.0161516 ,  0.01615119,  0.48646507,  0.03201264,
         0.03185751,  0.03185739,  0.52191395,  0.03508802,  0.03494693,
         0.03494673,  0.52380753,  0.01708153,  0.01701014,  0.01700996,
         0.52726734,  0.02286946,  0.02277863,  0.0227785 ,  0.46513146,
         0.02239206,  0.02227863,  0.02227836],
       [ 0.48381898,  0.02326088,  0.02317634,  0.02317682,  0.48248914,
         0.01622555,  0.0161516 ,  0.01615119,  0.48646507,  0.03201264,
         0.03185751,  0.03185739,  0.52191395,  0.03508802,  0.03494693,
         0.03494673,  0.52380753,  0.01708153,  0.01701014,  0.01700996,
         0.52726734,  0.02286946,  0.02277863,  0.0227785 ,  0.46513146,
         0.02239206,  0.02227863,  0.02227836],
       [ 0.48381898,  0.02326088,  0.02317634,  0.02317682,  0.48248914,
         0.01622555,  0.0161516 ,  0.01615119,  0.48646507,  0.03201264,
         0.03185751,  0.03185739,  0.52191395,  0.03508802,  0.03494693,
         0.03494673,  0.52380753,  0.01708153,  0.01701014,  0.01700996,
         0.52726734,  0.02286946,  0.02277863,  0.0227785 ,  0.46513146,
         0.02239206,  0.02227863,  0.02227836],

In fact, no matter what the inputs are, the outputs are always the same as the values above. What causes this problem?

jiangdong123 commented 9 years ago

Maybe you should take the fc8 features like this: net.blobs['fc8'].data[4].copy()
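
For reference, a minimal pycaffe sketch of reading the whole fc8 batch this way; the `.copy()` matters because the blob's memory is reused on the next forward pass (`net` here is assumed to be an already-loaded caffe.Net):

net.forward()
# copy the blob right after forward(); its memory is overwritten by the next pass
fc8 = net.blobs['fc8'].data.copy()      # 4-D blob: (batch_size, 28, 1, 1)
print fc8.reshape(fc8.shape[0], -1)     # one 28-dim prediction per row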

mender05 commented 9 years ago

@jiangdong123 Thank you for your advice! I'll try it. This is my previous code; I take the output of fc8 like this (just one BATCH_SIZE of test images):

import numpy as np

# data4D: (BATCH_SIZE, 3, H, W) images; data4DL: (BATCH_SIZE, 28, 1, 1) dummy labels
net.set_input_arrays(\
    data4D.astype(np.float32), data4DL.astype(np.float32))
pred = net.forward()
pred_normal = np.zeros([BATCH_SIZE, 28])
for i in range(0, BATCH_SIZE):
  for c in range(0, 28):
    pred_normal[i][c] = pred['fc8'][i][c][0][0]
print pred_normal

Is there any mistake?

mender05 commented 9 years ago

The loss variation looks very strange: figure_1 What causes the loss to change periodically?

sguada commented 9 years ago

Probably your training data is not randomized.


mender05 commented 9 years ago

@jiangdong123

  1. The outputs are always the same for all test images.
  2. I have tried a bigger learning rate (0.1) but got the same results.
  3. I also tried changing the input layer type from DATA to IMAGE_DATA, but the loss still changes periodically.
  4. I'll check whether the labels are read properly.

As @sguada said, the training data is not randomized. I'll randomize it before training.
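
A minimal numpy sketch of that randomization, assuming the images and labels are held in memory as arrays before the two LMDBs are written (the array names are hypothetical):

import numpy as np

# images: (N, 3, 220, 220), labels: (N, 28) -- hypothetical in-memory arrays
perm = np.random.permutation(images.shape[0])
images = images[perm]
labels = labels[perm]   # same permutation keeps each image paired with its label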

mender05 commented 9 years ago

@sguada @jiangdong123 The reason is that the labels are interpreted improperly! For example, the following 2 groups of 28-dim labels:

label_1 = { 189, 116, 165, 259, 95, 144, 122, 151, 88, 125, 218, 160, 68, 32, 95, 110, 165, 266, 123, 32, 151, 182, 189, 284, 294, 218, 173, 157 }
label_2 = { 64, 71, 91, 115, 126, 105, 24, 51, 92, 144, 170, 197, 114, 132, 188, 138, 97, 103, 148, 201, 20, 29, 30, 39, 68, 99, 34, 22 }

are interpreted as:

label_1 = { 189, 0, 0, 0, 116, 0, 0, 0, 165, 0, 0, 0, 3, 1, 0, 0, 95, 0, 0, 0, 144, 0, 0, 0, 122, 0, 0, 0 }
label_2 = { 64, 0, 0, 0, 71, 0, 0, 0, 91, 0, 0, 0, 115, 0, 0, 0, 126, 0, 0, 0, 105, 0, 0, 0, 24, 0, 0, 0 }

It seems that Caffe reads the input labels byte by byte. As a result, 259 is read as 3,1,0,0: on a little-endian machine, 259 is stored in memory as the bytes 3,1,0,0.
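
The standard struct module shows the same byte pattern for a little-endian 32-bit integer:

import struct

# 259 as a little-endian 32-bit int is the byte sequence 3, 1, 0, 0
print [ord(b) for b in struct.pack('<i', 259)]   # prints [3, 1, 0, 0]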

Previously, my labels were stored to LMDB in this way:

int datum_size = sizeof(int)*28;
data_file.read(str_buffer, datum_size);
...
// set_data() stores the raw bytes of the 28 ints
datum.set_data(str_buffer, datum_size);
datum.SerializeToString(&value);
...
mdb_data.mv_data = reinterpret_cast<void*>(&value[0]);
mdb_put(mdb_txn, mdb_dbi, &mdb_key, &mdb_data, 0);

Caffe uses a C++ template: template <typename Dtype>. How can I specify Dtype to be int?
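
One possible workaround (a sketch only, not necessarily the intended fix) is to write the labels into the Datum's float_data field instead of its raw-byte data field, so the data layer reads numeric values and casts them to Dtype. This assumes the Python lmdb and caffe protobuf bindings are available; the label source here is hypothetical.

import lmdb
import numpy as np
from caffe.proto import caffe_pb2

labels = np.loadtxt('labels.txt').reshape(-1, 28)   # hypothetical 28-dim labels

env = lmdb.open('examples/lsp/lsp_train_labels_lmdb', map_size=1 << 30)
with env.begin(write=True) as txn:
    for i, vec in enumerate(labels):
        datum = caffe_pb2.Datum()
        datum.channels, datum.height, datum.width = 28, 1, 1
        datum.float_data.extend(float(v) for v in vec)   # numbers, not raw bytes
        txn.put('%08d' % i, datum.SerializeToString())
env.close()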

mender05 commented 9 years ago

I have corrected the labels, changed the input type to float, and randomized the training samples, but the problem is still there. figure_1 One period == 2400 iterations. A period processes 2400*30 = 72000 images; there are 22000 training images, so one period corresponds to 72000/22000 ≈ 3.3 epochs.

sguada commented 9 years ago

When you shuffled the training data, did you make sure the labels stayed aligned?

Can you increase the batch size? Also try to increase the dropout.


mender05 commented 9 years ago

Thank you @sguada.

  1. The labels and images are shuffled synchronously.
  2. The limited VRAM restricts the batch size. 30 is the maximum size for me.
  3. I'll try to increase the dropout.

sguada commented 9 years ago

@mender05 you could also try https://github.com/shelhamer/caffe/tree/accum-grad, which allows a bigger effective batch size by accumulating gradients over several iterations before updating the weights.

mender05 commented 9 years ago

@sguada I have tried this branch, but what parameters should be set to enable a bigger batch size? After I increased batch_size from 30 to 35, it ran out of memory.

F1126 16:43:02.337970  7332 syncedmem.cpp:51] Check failed: error == cudaSuccess (2 vs. 0)  out of memory

sguada commented 9 years ago

In the solver.prototxt add

iter_size: 2

That means it will run 2 iterations of batch_size: 30 before updating the weights, so effectively you would be using a batch_size of 60.

You can adjust batch_size and iter_size to reach the desired effective batch size.

mender05 commented 9 years ago

It is so strange. As the batch_size increased from 30 to 60, the loss variation pattern changed, but it is still periodic.

figure_1

sguada commented 9 years ago

There must be something weird with your data: the loss decreases very quickly and then oscillates periodically. Could you shuffle your data again?


mender05 commented 9 years ago

I used the snapshot at the 2000th iteration to predict; the outputs are all the same.

array([[ 0.49006659,  0.48892561,  0.49674234,  0.52244973,  0.52458155,
         0.52957731,  0.46845111,  0.47450158,  0.49067837,  0.52837992,
         0.53714836,  0.54056102,  0.52498746,  0.50657398,  0.53844237,
         0.5057267 ,  0.42278934,  0.42133904,  0.50450838,  0.5381543 ,
         0.45289528,  0.42029274,  0.37055418,  0.36709356,  0.41887969,
         0.44862145,  0.32116845,  0.36128747],
       [ 0.49006659,  0.48892561,  0.49674234,  0.52244973,  0.52458155,
         0.52957731,  0.46845111,  0.47450158,  0.49067837,  0.52837992,
         0.53714836,  0.54056102,  0.52498746,  0.50657398,  0.53844237,
         0.5057267 ,  0.42278934,  0.42133904,  0.50450838,  0.5381543 ,
         0.45289528,  0.42029274,  0.37055418,  0.36709356,  0.41887969,
         0.44862145,  0.32116845,  0.36128747],
       [ 0.49006659,  0.48892561,  0.49674234,  0.52244973,  0.52458155,
         0.52957731,  0.46845111,  0.47450158,  0.49067837,  0.52837992,
         0.53714836,  0.54056102,  0.52498746,  0.50657398,  0.53844237,
         0.5057267 ,  0.42278934,  0.42133904,  0.50450838,  0.5381543 ,
         0.45289528,  0.42029274,  0.37055418,  0.36709356,  0.41887969,
         0.44862145,  0.32116845,  0.36128747],

Outputs of the final model at the 36500th iteration:

array([[ 0.482418  ,  0.48542902,  0.49439543,  0.52315784,  0.52507049,
         0.52752018,  0.47199821,  0.47462174,  0.49217641,  0.52927047,
         0.54133612,  0.54410964,  0.52102458,  0.50839245,  0.53855455,
         0.5059635 ,  0.41948465,  0.4194364 ,  0.50593352,  0.53848571,
         0.44772175,  0.41696107,  0.36593205,  0.36593369,  0.41697961,
         0.44766867,  0.31933263,  0.36117038],
       [ 0.482418  ,  0.48542902,  0.49439543,  0.52315784,  0.52507049,
         0.52752018,  0.47199821,  0.47462174,  0.49217641,  0.52927047,
         0.54133612,  0.54410964,  0.52102458,  0.50839245,  0.53855455,
         0.5059635 ,  0.41948465,  0.4194364 ,  0.50593352,  0.53848571,
         0.44772175,  0.41696107,  0.36593205,  0.36593369,  0.41697961,
         0.44766867,  0.31933263,  0.36117038],
       [ 0.482418  ,  0.48542902,  0.49439543,  0.52315784,  0.52507049,
         0.52752018,  0.47199821,  0.47462174,  0.49217641,  0.52927047,
         0.54133612,  0.54410964,  0.52102458,  0.50839245,  0.53855455,
         0.5059635 ,  0.41948465,  0.4194364 ,  0.50593352,  0.53848571,
         0.44772175,  0.41696107,  0.36593205,  0.36593369,  0.41697961,
         0.44766867,  0.31933263,  0.36117038],

StevenLOL commented 9 years ago

Hi @mender05, do you mind showing some code for how you make predictions on the test data and get the array in your last post?

mollahosseini commented 9 years ago

Hi, I have the same problem. I am using regression for video processing, and therefore I used 9 consecutive frames as the input of the network. I changed convert_imageset.cpp to store the data as 9 frames in each blob, reading the data in train_val.prototxt as:

name: "CaffeNet"
layers {
  name: "data"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "examples/project/train_lmdb"
    backend: LMDB   
    batch_size: 256
  }
  transform_param {
    crop_size: 227
    mean_file: "examples/project//train_mean.binaryproto"
    mirror: true
  }
  include: { phase: TRAIN }
}
layers {
  name: "data"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "examples/project//val_lmdb"
    backend: LMDB   
    batch_size: 50
  }
  transform_param {
    crop_size: 227
    mean_file: "examples/project/train_mean.binaryproto"
    mirror: false
  }
  include: { phase: TEST }
}

and changed the accuracy layer to EUCLIDEAN_LOSS in train_val.prototxt:

layers {
  name: "fc8"
  type: INNER_PRODUCT
  bottom: "fc7"
  top: "fc8"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  inner_product_param {
    num_output: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  name: "loss"
  type: EUCLIDEAN_LOSS
  bottom: "fc8"
  bottom: "label"
  top: "loss"
}

for deploying I used:

input: "data"
input_dim: 10
input_dim: 9
input_dim: 227
input_dim: 227
layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
  }
}

<rest the same>
layers {
  name: "fc8"
  type: INNER_PRODUCT
  bottom: "fc7"
  top: "fc8"
  inner_product_param {
    num_output: 1
  }
}

I have

base_lr: 0.001
batch_size: 256 for train
batch_size: 50 for val

The rest is the same as the ImageNet network. I have the same loss behavior as @mender05: it decreased dramatically at first and then fluctuated until the end. I have not shuffled the data, and the labels are integers from 1 to 100. To test, I used the Matlab interface, i.e. I read 9 images, concatenate them together, and use

scores = matcaffe_demo(imgFrames, 1);

As I am cropping the images, the result is a score vector of length 10, all entries having the same value, e.g. 71.4674, regardless of the input images. I also tried different snapshots of the network; the result changed a bit but is still the same for all crops and all images.

@mender05, did you manage to solve your problem? Do you still get the same output for all images?

@sguada, am I doing every step right for the regression? I am going to shuffle my data, but I don't know if the problem is due to shuffling or something else!

sguada commented 9 years ago

A possible explanation is that the model is not learning much; it probably got trapped in a local minimum that is similar to random weights. Try changing the way you initialize the weights: switch from gaussian to xavier for the convolutional layers.

mollahosseini commented 9 years ago

Thanks @sguada,
I changed the weight initialization from gaussian to xavier, but it gives me a NaN loss even with learning rate 0.001. I've read that people decrease the lr to overcome NaN loss; however, I am afraid that if I decrease the lr below 0.001 my network won't learn at all. I will work on shuffling the data and see if it changes anything. BTW, I have about 2700 inputs (each has 9 images); considering 10 crops for each input, the network is trained with only about 27000 inputs. Do you think that could be the reason it gets trapped in a local minimum?

sguada commented 9 years ago

Don't worry about decreasing the learning rate; it is relative to the magnitude of the loss, which in the case of Euclidean loss can be huge. And yes, having only 9 images will cause overfitting problems.

All questions about usage, installation, code, and applications should be searched for and asked on the caffe-users mailing list.

OnlySang commented 9 years ago

@mender05 did you ever solve this problem? I am also running into it. I do not think it's related to hyperparameters.

wizardcsy commented 9 years ago

@OnlySang I met the same problem recently while using AlexNet to train a 2-category classifier. When I used the model to test my images with the Python interface, I always got the same output. Finally, I set the num_output of fc7 to 1000, and it became normal. I don't fully understand why, but I hope it's useful to you!

mollahosseini commented 9 years ago

@OnlySang, I had the same problem. I decreased the learning rate and shuffled the data, and the problem was solved.

OnlySang commented 9 years ago

@wizardcsy Binary classification using AlexNet? I feel you are making things complicated; a smaller model may fit it.

OnlySang commented 9 years ago

@mollahosseini I tried what you tried, but it didn't work. Thanks for your advice.

mender05 commented 9 years ago

@OnlySang I agree with you; I also do not think it's related to hyperparameters.

@wizardcsy @mollahosseini I have not solved the problem, but after I changed the training and test datasets, the problem disappeared. In my experience, it is difficult to directly regress an image to a pose vector. Besides, you may try a simpler network, which is easier to train.

mender05 commented 9 years ago

@StevenLOL This is my prediction code:

###################
NUMBER = 1000
CHANNEL = 3
HEIGHT = 220
WIDTH = 220
###################
# read test image #
###################
...
# test[number,chanel,height,width]
...
#############################
# predict using caffe model #
#############################
# make sure that caffe is on the python path
CAFFE_ROOT = '/home/mender/caffe-master/'
import sys
sys.path.insert(0, CAFFE_ROOT + 'python')
import caffe
import numpy as np

# set path to test model file and trained model
MODEL_FILE = './deeppose_test.prototxt'
TRAINED_MODEL = './caffenet_train_iter_36500.caffemodel'

net = caffe.Net(MODEL_FILE, TRAINED_MODEL)
#net.set_phase_test()
data4D = np.ones([1,CHANNEL,HEIGHT,WIDTH])
data4DL = np.zeros([1,14,1,1])
pred_normal = np.zeros([NUMBER,14])
n = 0
for n in range(0, NUMBER):
  data4D[0] = test[n]
  data4DL[0][0][0][0] = n
  net.set_input_arrays(\
      data4D.astype(np.float32), data4DL.astype(np.float32))
  pred = net.forward()
  for c in range(0,14):
    pred_normal[n][c] = pred['fc8'][0][c][0][0]
np.save('prediction_36500it.npy', pred_normal)

OnlySang commented 9 years ago

@mender05 When I use Theano and Lasagne, which you can find on GitHub, the regression converges. The main architecture of the network is the same, as is the training pipeline. So why do different implementations give different results?

StevenLOL commented 9 years ago

Hi, @mender05 thank you for posting the code.

sjtujulian commented 9 years ago

@mender05 I have the same problem, and I checked the filter weights of a middle layer. It turns out that the filter weights are all 0. Do you know why?

mtrth commented 8 years ago

Hi, I am also getting the same issue: the filter weights are all 0. Did you find a solution? @mender05, can you share your full train.prototxt?

JoeMWatson commented 8 years ago

@mender05 did you ever find a solution? I'm having the same problem with periodicity and constant output...

wusongbeckham commented 8 years ago

Could you share the code that prepares the train data and test data? I also have the same problem. Thanks.

wusongbeckham commented 8 years ago

@mender05 could you share the code that prepares the train data and test data? I also have the same problem. Thanks.

wusongbeckham commented 8 years ago

@mender05 have you successfully implemented DeepPose? Could you share the code for data preparation?

zeakey commented 8 years ago

@sguada what do you mean by "Don't worry about decreasing the learning rate, it is relative to the magnitude of the loss"? I find that a smaller lr leads to better convergence under some conditions, but theoretically a small lr may end up in a local minimum, so why not worry?

ginobilinie commented 8 years ago

@mender05 Have you solved this problem? I'm doing regression with Caffe and suffer from the same problem as you: no matter what the input is, the output is always the same value. The only possibility I can think of is that the weights and biases of the network are 0.

Anyone who has solved this problem, please help.

kshalini commented 8 years ago

@mender05, @sguada

did you manage to solve the problem by modifying the prototxts (train & test)? If yes, can you please share them?

The net seems similar to AlexNet, but there are subtle variations, and I run into the problems mentioned earlier by others as well. Effectively stuck! Any help would be greatly appreciated. Thanks :-)

wqysq commented 8 years ago

I also have this issue when I use the C++ interface for prediction.

JoeMWatson commented 8 years ago

@ginobilinie @kshalini

I've been doing regression from images and fixed the problem by scaling the pixel values down by 255 and subtracting the dataset mean (so the pixel values are now roughly in [-1, 1]), and also by scaling the labels down so they are in [0, 1]. I also set all my new layer weights (I was transfer learning from AlexNet) to be initialized with the 'type: xavier' parameter.
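
A rough numpy sketch of that preprocessing, with assumed shapes (`mean_image` is the dataset mean already divided by 255, and `label_max` is the largest label value):

import numpy as np

def preprocess(image, mean_image, label, label_max):
    # pixels scaled to [0, 1], then mean-subtracted -> roughly [-1, 1]
    x = image.astype(np.float32) / 255.0 - mean_image
    # labels scaled to [0, 1]
    y = np.asarray(label, dtype=np.float32) / label_max
    return x, y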

Hope this helps!

kshalini commented 8 years ago

@JoeMWatson

Thanks for the post, but I didn't quite follow it fully. My specific questions are: a) do you use LMDB or HDF5 for inputs? b) did you use the same train_val.prototxt as mentioned by @mender05? If not, can you please share yours for reference? c) finally, can you also share the few lines of Python code used to interpret the output labels you get from the net?

thanks

ginobilinie commented 8 years ago

@JoeMWatson

Thanks. In fact, I've already scaled the labels to [0,1] and the input image data to [-1,1], but I still find that the predicted output value is the same. Analyzing the trained model and the test data in each blob (doing a forward pass), I found that the bias dominates the output value and the layer before the last layer usually outputs almost 0.
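
For anyone checking the same thing, a pycaffe sketch of how collapsed weights or a dominating bias can be inspected (it assumes `net` is a caffe.Net with the trained weights loaded and that each listed layer has a bias blob):

import numpy as np

net.forward()
for name, params in net.params.items():
    w, b = params[0].data, params[1].data      # weights, bias
    print name, 'max|weight|:', np.abs(w).max(), 'max|bias|:', np.abs(b).max()
for name, blob in net.blobs.items():
    print name, 'max|activation|:', np.abs(blob.data).max()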

ginobilinie commented 8 years ago

Hi, I have solved my problem. In my case, the problem came from the initialization of the network. I changed the weight filler from gaussian to xavier and set the bias filler to constant 0. Then the problem was solved.

Venkatesh-Murthy commented 8 years ago

@ginobilinie Thanks, that did the trick.

hagg30 commented 8 years ago

Another solution here: I recognized there was a lack of non-linearity in my model, so I added some more FC layers with ReLU activations and dropout. Then it performed better.

lood339 commented 8 years ago

@ginobilinie I have the same problem: the learned weights are zero everywhere and the output is constant. I guess the bias dominates the net. How did you solve this problem? (I already changed the weight filler to xavier and set the bias filler to constant 0.) Should I disable bias_term?

ginobilinie commented 8 years ago

@lood339 In my case, when I set the bias to 0, the training is normal... What about yours?

lood339 commented 8 years ago

@ginobilinie When I set the bias to 0, it has the same problem. Then I changed weight_decay to a small number (0.0005), and it became normal. I think that if weight_decay is large (like 0.5), all the weights eventually become zero in my case. I made another modification that helped: I set the learning rate of the convolutional layers to 0 because I transfer the weights from a pre-trained model, so the weights in the convolutional layers don't change during training.

ginobilinie commented 8 years ago

@lood339 Good. I always set weight_decay to a very small value. If you just want to fine-tune some layers, you should set the learning rate of the other layers to 0.

joyousrabbit commented 7 years ago

@ginobilinie But why doesn't std: 0.01 work? And why is the loss periodic?