Could you share all the prototxt files and the log file?
Probably the base_lr is too high. Try smaller values: 0.001, 0.0001, ...
On Sunday, August 3, 2014, Ankur Handa notifications@github.com wrote:
Hi all,
I have just modified caffe to run dense pixel labelling for my semantic scene understanding task. I am testing it on a very simplistic network with only one convolution layer followed by ReLU and then softmax, i.e.
data -> convolution -> ReLU -> softmax
The ground truth labelled image has 23 classes, so I set the number of filters in the convolution layer to 23. The input is a 320x240 depth image and the output of the network is a 23-channel 320x240 image. At each pixel the softmax loss layer picks the predicted probability of the ground-truth label, compares it to the GT, and accumulates the loss, which is summed not just over the images in a batch but also over the pixels in each image. My training prototxt file looks like this:
==================== TRAIN PROTOTXT ==============================
name: "VaFRIC_quick_train"
layers {
  name: "VaFRIC"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "/home/workspace/code/AnnotationGT/"
    batch_size: 100
  }
}
layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 23
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.0001
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "relu1"
  type: RELU
  bottom: "conv1"
  top: "conv1"
}
layers {
  name: "loss"
  type: SOFTMAX_LOSS
  bottom: "conv1"
  bottom: "label"
}
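In other words, the objective this net minimises is the average per-pixel negative log-likelihood over a batch of N images of size W x H (with 23 classes per pixel):

L = -\frac{1}{N \cdot H \cdot W} \sum_{i=1}^{N} \sum_{y=1}^{H} \sum_{x=1}^{W} \log p_i\bigl(\ell_i(x, y) \mid x, y\bigr)

where \ell_i(x, y) is the ground-truth class at pixel (x, y) of image i and p_i(\cdot \mid x, y) is the softmax over the 23 channel values at that pixel.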
However, I have been observing that the loss function only goes upwards; there is no decrease in the loss even when I change the learning rates and the other variables that affect the SGDSolver. I modified cifar10_solver.prototxt to use these new train and test files I created:
# reduce the learning rate after 8 epochs (4000 iters) by a factor of 10
# The training protocol buffer definition
train_net: "VaFRIC_quick_train.prototxt"
# The testing protocol buffer definition
test_net: "VaFRIC_quick_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 1
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.004
# The learning rate policy
lr_policy: "fixed"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 4000
# snapshot intermediate results
snapshot: 4000
snapshot_prefix: "VaFRIC_quick"
# solver mode: CPU or GPU
solver_mode: CPU
I have been using ~1200 labelled images for training, but I cannot quite figure out why the loss function is always increasing. My main questions are:
(a) Is this too simple a network to converge? (b) Are there any other layers I should add to change the output? (c) Do I need more data? (d) How can I verify that the changes I have made to obtain the dense output are indeed correct?
Being very new I'm unclear how to proceed from here. I'd be very grateful if I could get some reviews and feedback on this. Many thanks for your time and patience.
Kind Regards, Ankur.
Sergio
Hi Sergio,
Many thanks for your reply. Here they are:
=====VaFRIC_quick_train.prototxt======
name: "VaFRIC_quick_train"
layers {
  name: "VaFRIC"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "/home/ankur/workspace/code/AnnotationGT/"
    batch_size: 100
  }
}
layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 23
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.0001
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "relu1"
  type: RELU
  bottom: "conv1"
  top: "conv1"
}
layers {
  name: "loss"
  type: SOFTMAX_LOSS
  bottom: "conv1"
  bottom: "label"
}
=====VaFRIC_quick_test.prototxt======
name: "VaFRIC_quick_test"
layers {
  name: "VaFRIC"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "/home/ankur/workspace/code/AnnotationGT/"
    batch_size: 100
  }
}
layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 23
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.0001
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "relu1"
  type: RELU
  bottom: "conv1"
  top: "conv1"
}
layers {
  name: "loss"
  type: SOFTMAX_LOSS
  bottom: "conv1"
  bottom: "label"
  top: "loss"
}
====VaFRIC_quick_solver.prototxt=====
train_net: "VaFRIC_quick_train.prototxt"
test_net: "VaFRIC_quick_test.prototxt"
test_iter: 1
test_interval: 500
base_lr: 0.01
momentum: 0.9
weight_decay: 0.004
lr_policy: "fixed"
gamma: 0.0001
power: 0.75
display: 100
max_iter: 4000
snapshot: 4000
snapshot_prefix: "VaFRIC_quick"
solver_mode: CPU
I have generally stopped the training after a few iterations once I see that the loss function is only increasing. In case it is useful, this is my annotated synthetic data: http://tinyurl.com/l3qpbso. Since I'm only using a softmax layer in this network, I've had to modify only this layer to get the dense outputs. My forward function (I call it Forward_cpu_dense) looks like this:
/// In softmax_layer.cpp
template <typename Dtype>
Dtype SoftmaxLayer<Dtype>::Forward_cpu_dense(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top)
{
/// (setup of bottom_data, top_data, scale_data and of num, dim, width, height omitted)
/// Figure out the max element from each bottom_data image of d dimensions.
for (int i = 0; i < num; i++)
{
for(int xx = 0 ; xx < width ; xx++ )
{
for(int yy = 0; yy < height ; yy++)
{
scale_data[i*width*height + xx + yy*width] = bottom_data[i*width*height*dim + yy*height + xx];
for(int j = 0 ; j < dim ; j++)
{
scale_data[i*width*height + xx + yy*width] = max(scale_data[i*width*height + xx + yy*width],
bottom_data[i*width*height*dim + j*height*width + yy*height + xx]);
}
}
}
}
LOG(INFO) <<"Obtaining the scale_data, max per pixel";
/// https://software.intel.com/sites/products/documentation/hpc/mkl/mklman/GUID-97718E5C-6E0A-44F0-B2B1-A551F0F164B2.htm
/// Subtract the maximum from each element in the channel
for(int batch_img = 0; batch_img < num; batch_img++ )
{
for(int d = 0; d < dim; d++ )
{
caffe_cpu_axpby<Dtype>(width*height,
(Dtype)-1.0,
scale_data + batch_img * width * height,
(Dtype)(1.0),
top_data
+ batch_img*width*height*dim
+ d*width*height);
}
}
LOG(INFO) <<"Subtracted the max per pixel";
/// Perform exponentiation: That possibly cannot go wrong..
caffe_exp<Dtype>(num * dim * width * height, top_data, top_data);
LOG(INFO) <<"Exponentiated the values";
/// sum after exp
for(int batch_img = 0; batch_img < num; batch_img++)
{
Dtype *sum_exp = new Dtype[width*height];
for(int d = 0; d < dim; d++ )
{
caffe_add<Dtype>(width*height,
top_data + d * width * height + batch_img * width * height * dim,
sum_exp,
sum_exp);
}
caffe_copy<Dtype>(width*height,
sum_exp,
scale_data + batch_img * width* height);
delete sum_exp;
}
LOG(INFO) <<"Sum after the exponentiation";
for(int batch_img = 0; batch_img < num; batch_img++ )
{
for(int d = 0; d < dim ; d++ )
{
caffe_div<Dtype>(width*height,
top_data + d * width * height + batch_img * width * height * dim,
scale_data + width * height * batch_img,
top_data + d * width * height + batch_img * width * height * dim);
}
}
LOG(INFO)<<"Division with the scale factor";
return Dtype(0);
}
/// in softmax_loss_layer.cpp
template <typename Dtype>
Dtype SoftmaxWithLossLayer<Dtype>::Forward_cpu_dense(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top)
{
std::cout<<"Foward passing the softmax layer" << std::endl;
/// The forward pass computes the softmax prob values.
softmax_bottom_vec_[0] = bottom[0];
softmax_layer_->Forward(softmax_bottom_vec_, &softmax_top_vec_);
const Dtype* prob_data = prob_.cpu_data();
const Dtype* label = bottom[1]->cpu_data();
int num = prob_.num();
int dim = prob_.channels();
int width = prob_.width();
int height = prob_.height();
Dtype loss = 0;
/// Here I think num is the total number of images in a batch
LOG(INFO)<<"Going to compute the loss";
for (int i = 0; i < num; ++i)
{
/// We might need to change it to something like
for(int xx = 0 ; xx < width; xx++)
{
for(int yy = 0 ; yy < height ; yy++)
{
int idx = i*width*height + width * yy + xx;
loss += -log(max(prob_data[xx+width*yy+width*height*static_cast<int>(label[idx]) + i*width*height*dim],
Dtype(FLT_MIN)));
}
}
}
std::cout<<"Computed the total loss function value at the pass" << std::endl;
if (top->size() >= 1)
{
(*top)[0]->mutable_cpu_data()[0] = loss / (num * width * height );
}
if (top->size() == 2)
{
(*top)[1]->ShareData(prob_);
}
return loss / (num * width * height );
}
///in softmax_loss_layer.cpp
/// http://stats.stackexchange.com/questions/79454/softmax-layer-in-a-neural-network
template <typename Dtype>
void SoftmaxWithLossLayer<Dtype>::Backward_cpu_dense(
const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down,
vector<Blob<Dtype>*>* bottom)
{
if (propagate_down[1])
{
LOG(FATAL) << this->type_name()
<< " Layer cannot backpropagate to label inputs.";
}
LOG(INFO)<<"Doing backward pass for the softmax layer with loss";
if (propagate_down[0])
{
LOG(INFO) <<"propagating down the derivatives";
Dtype* bottom_diff = (*bottom)[0]->mutable_cpu_diff();
const Dtype* prob_data = prob_.cpu_data();
caffe_copy(prob_.count(), prob_data, bottom_diff);
const Dtype* label = (*bottom)[1]->cpu_data();
int num = prob_.num();
int dim = prob_.channels();///prob_.count() / num;
int width = prob_.width();
int height = prob_.height();
LOG(INFO)<<"bottom_diff.num = " << (*bottom)[0]->num();
LOG(INFO)<<"bottom_diff.channels = " << (*bottom)[0]->channels();
LOG(INFO)<<"bottom_diff.width = " << (*bottom)[0]->width();
LOG(INFO)<<"bottom_diff.height = " << (*bottom)[0]->height();
/// Subtract from only that dimension where you computed the loss from..
for (int i = 0; i < num; ++i)
{
for(int xx = 0 ; xx < width ; xx++)
{
for(int yy = 0 ; yy < height ; yy++)
{
int idx = xx + yy * width + i * width * height;
bottom_diff[i * dim * height * width + xx + width*yy + height * width * static_cast<int>(label[idx])] -= 1;
}
}
}
/// Scale down gradient
caffe_scal(prob_.count(), Dtype(1) / (num * width * height ), bottom_diff);
}
LOG(INFO) <<"Finished the backward pass for the layer";
}
I have looked at these functions and I cannot quite spot any bug in them. I suspect it's mostly that, being new to this, I haven't got the hang of the many other parameters that affect the training process.
Kind Regards and many thanks again, Ankur.
As Sergio suggested, tune the learning rate a little bit:
(1) Set the learning rate to 0 and run. The loss should stay about the same; if there is a sudden jump or a NaN, sanity-check your data. (2) Set the learning rate to exponentially increasing values, like 1e-6, 1e-5, 1e-4, ... (or start even lower if 1e-6 still gives you an increasing loss). Select the largest learning rate that does not blow up training.
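For example, a rough sketch of step (2) using the C++ API (the path, the iteration budget, and the list of rates are placeholders; adjust them to your setup):

#include "caffe/caffe.hpp"

// Run a short training job at several base learning rates and keep the largest
// one whose loss does not blow up.
int main() {
  caffe::Caffe::set_mode(caffe::Caffe::CPU);
  caffe::SolverParameter solver_param;
  caffe::ReadProtoFromTextFileOrDie("VaFRIC_quick_solver.prototxt", &solver_param);
  solver_param.set_max_iter(200);  // just enough iterations to see the trend
  const float rates[] = {1e-6f, 1e-5f, 1e-4f, 1e-3f, 1e-2f};
  for (int r = 0; r < 5; ++r) {
    solver_param.set_base_lr(rates[r]);
    caffe::SGDSolver<float> solver(solver_param);
    solver.Solve();  // watch the displayed loss for divergence / NaNs
  }
  return 0;
}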
I have tried two things:
(a) Adding another layer to the network i.e. data->conv->ReLU->conv->ReLU->softmax
(b) changing the learning parameter
This has improved the convergence behaviour (with the learning rate set to 1e-3), but after about 360 iterations the loss starts to shoot up again, reaching a high value and producing NaNs and infs. The iterations vs. loss graph is plotted below:
[image: test] https://cloud.githubusercontent.com/assets/686480/3809305/2c41ac20-1c82-11e4-87ee-4f9b641a98ff.png
It is tricky to figure out whether the changes I've made to get dense labelling are indeed correct - it seems hard to debug such changes. Is there anything in particular you can suggest to help me figure out why the system diverges? By adding another layer I may have just postponed the divergence.
Kind Regards,
Keep trying smaller learning rates until it starts converging.
Sergio
I feel there is something more than just the learning rate affecting the optimisation. I wonder if you have had a chance to try depth maps instead of images with your networks? There might be other tricks I should apply before feeding the data to the network; so far I have been feeding the raw data as-is, without any pre-processing.
The learning rates 1e-4 and 1e-3 give the following plots respectively.
I should also double-check with unit tests first whether the changes I made are indeed correct. I reduced the image size from 320x240 to 80x60 to cut the training time, but I don't think it has changed the loss profiles much. Let me know if you know of any pre-processing tricks for the data, or anything special to take into account when depth data is fed to the network.
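To be concrete, the kind of pre-processing I have in mind is something as simple as the sketch below, where the clamping value max_depth is just a guess on my part:

#include <algorithm>
#include <vector>

// Scale raw depth values (e.g. metres or millimetres) into a small,
// roughly zero-centred range before handing them to the network.
void NormalizeDepth(std::vector<float>* depth, float max_depth) {
  for (size_t i = 0; i < depth->size(); ++i) {
    float d = std::min((*depth)[i], max_depth) / max_depth;  // -> [0, 1]
    (*depth)[i] = d - 0.5f;                                  // -> [-0.5, 0.5]
  }
}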
Regards, Ankur.
Have you written a unit test checking whether your dense softmax works correctly? I have modified test_softmax_layer.cpp so that it tests multiple heights and widths. You may want to test your function before doing the training. :)
// Copyright 2014 BVLC and contributors.
namespace caffe {

extern cudaDeviceProp CAFFE_TEST_CUDA_PROP;

template <typename Dtype>
class SoftmaxLayerTest : public ::testing::Test {
  // ... (fixture body with multi-height/width bottom blobs omitted) ...
};

typedef ::testing::Types<float, double> Dtypes;
TYPED_TEST_CASE(SoftmaxLayerTest, Dtypes);

TYPED_TEST(SoftmaxLayerTest, TestForwardCPU) {
  LayerParameter layer_param;
  Caffe::set_mode(Caffe::CPU);
  SoftmaxLayer<TypeParam> layer(layer_param);
  // ... (forward pass and per-pixel checks omitted) ...
}

TYPED_TEST(SoftmaxLayerTest, TestForwardGPU) {
  LayerParameter layer_param;
  Caffe::set_mode(Caffe::GPU);
  SoftmaxLayer<TypeParam> layer(layer_param);
  // ...
}

TYPED_TEST(SoftmaxLayerTest, TestGradientCPU) {
  LayerParameter layer_param;
  Caffe::set_mode(Caffe::CPU);
  SoftmaxLayer<TypeParam> layer(layer_param);
  // ...
}

TYPED_TEST(SoftmaxLayerTest, TestGradientGPU) {
  LayerParameter layer_param;
  Caffe::set_mode(Caffe::GPU);
  SoftmaxLayer<TypeParam> layer(layer_param);
  // ...
}

} // namespace caffe
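If adapting the fixture is too much work, a standalone check along these lines is another option. It is only a sketch: it assumes your dense softmax is what SoftmaxLayer::Forward ends up calling in your branch (with the unmodified layer the per-pixel sums will not be 1), and the include may need adjusting to your tree.

#include <vector>
#include "caffe/caffe.hpp"

using caffe::Blob;
using caffe::Caffe;
using caffe::FillerParameter;
using caffe::GaussianFiller;
using caffe::LayerParameter;
using caffe::SoftmaxLayer;

int main() {
  Caffe::set_mode(Caffe::CPU);

  // 2 images, 23 classes, 4x5 pixels of Gaussian noise as fake activations.
  Blob<float> bottom(2, 23, 4, 5);
  Blob<float> top;
  FillerParameter filler_param;
  GaussianFiller<float> filler(filler_param);
  filler.Fill(&bottom);

  std::vector<Blob<float>*> bottom_vec(1, &bottom);
  std::vector<Blob<float>*> top_vec(1, &top);

  LayerParameter layer_param;
  SoftmaxLayer<float> layer(layer_param);
  layer.SetUp(bottom_vec, &top_vec);
  layer.Forward(bottom_vec, &top_vec);

  // With a per-pixel (dense) softmax, the 23 channel values at every pixel
  // must sum to 1; the stock layer normalises over the whole c*h*w volume
  // instead, so this check is exactly what distinguishes the two.
  const float* top_data = top.cpu_data();
  for (int n = 0; n < top.num(); ++n) {
    for (int h = 0; h < top.height(); ++h) {
      for (int w = 0; w < top.width(); ++w) {
        float sum = 0;
        for (int c = 0; c < top.channels(); ++c) {
          sum += top_data[top.offset(n, c, h, w)];
        }
        CHECK_GT(sum, 0.999);
        CHECK_LT(sum, 1.001);
      }
    }
  }
  LOG(INFO) << "per-pixel softmax sums look fine";
  return 0;
}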
Hi Yi,
Thanks for this. I went back and checked the unit test directory, which I had completely forgotten about, eventually tried your code, and figured out that there was a tiny bug in my changes.
I had declared Dtype* sum = new Dtype[width*height] and naively assumed that the initial values would be zeros, which turned out not to be the case. I also discovered that if I increase the number of classes to about 50, the unit test with float fails due to float overflow and only the test with double passes.
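For the record, the fix amounts to zero-initialising that buffer before accumulating into it, e.g. with a small helper like the sketch below (new Dtype[n]() would work just as well):

#include <algorithm>

// Allocate a zero-initialised accumulator; `new Dtype[n]` alone gives
// uninitialised memory, which was the bug.
template <typename Dtype>
Dtype* NewZeroedBuffer(const int n) {
  Dtype* buf = new Dtype[n];
  std::fill(buf, buf + n, Dtype(0));
  return buf;  // caller releases it with delete [] buf
}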
Thank you very much for your patience and kindness in helping me out on that. I have since re-run my code on the data and now hope to see convergence.
Best Regards, Ankur.
Glad it helps! Good luck with your experiments, :)
Another quick question: I have saved snapshots of my training, and I'd now like to see what kind of filters it has learned. I have seen this page on filter visualisation: http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/filter_visualization.ipynb. However, given that my output is a dense labelled image, my program crashes when I call net = caffe.Classifier(..).
I wonder if there is an easy way to store the filters using C++ code; one could then view them in Python later on. Please let me know if you have any suggestions on that.
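To illustrate what I mean, something along these lines is what I would try from C++ (a rough sketch; the file names are placeholders, and a deploy-style prototxt without the DATA layer would avoid opening the training database):

#include <fstream>
#include <vector>
#include <boost/shared_ptr.hpp>
#include "caffe/caffe.hpp"

int main() {
  caffe::Caffe::set_mode(caffe::Caffe::CPU);
  caffe::Net<float> net("VaFRIC_quick_train.prototxt");
  net.CopyTrainedLayersFrom("VaFRIC_quick_iter_4000.caffemodel");

  // params()[0] is conv1's weight blob (23 x 1 x 5 x 5 here), params()[1] its bias.
  const std::vector<boost::shared_ptr<caffe::Blob<float> > >& params = net.params();
  const caffe::Blob<float>& weights = *params[0];

  // Dump the raw floats; they can be reshaped and viewed from Python later.
  std::ofstream out("conv1_filters.raw", std::ios::binary);
  out.write(reinterpret_cast<const char*>(weights.cpu_data()),
            weights.count() * sizeof(float));
  return 0;
}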
Can you give the error message for the crash? The Python interface can load fully-convolutional / dense models just fine at least with the caffe.Net class.
Evan Shelhamer
It is a segmentation fault. I quickly went back to test_net.cpp, ran it, and it reproduced the segfault. I was checking the accuracy of the CIFAR10 example with the following code:
double test_accuracy = 0;
for (int i = 0; i < total_iter; ++i)
{
  const vector<Blob<float>*>& result = caffe_test_net.ForwardPrefilled();
  test_accuracy += result[0]->cpu_data()[0];
}
The program segfaults, and I have found that it returns fine from ForwardPrefilled: if I add a LOG(ERROR) print statement there it prints, but it crashes on the next line, where test_accuracy is summed up.
It turns out that result has size 0.
Hey @ankurhanda, you said you feed the depth map directly into the network? What is the range of the depth values in your data? When those values go above 255, are they still treated as images by the network? And do you know how to feed them in as lmdb format?
@Yangqing Hi there, I'm trying to do pixel-wise regression, basically to learn depth for each rgb pixel, so for me the label is a depth image. I learnt that we can use two data layers, one for the data (rgb image) and one for the label (depth map). If those two lmdbs have the same size and batch size, they will be synchronized, right? Now my question is: since the depth values will go above the value limit of a grey image, should I still feed the depth map in as an image, or do I need to do some pre-processing? Also, when preparing the two lmdbs, should I just leave the label field of both as black? Thanks a lot!
Continued in #1019.