alexgkendall / caffe-segnet

Implementation of SegNet: A Deep Convolutional Encoder-Decoder Architecture for Semantic Pixel-Wise Labelling
http://mi.eng.cam.ac.uk/projects/segnet/

Changing the number of classes in SegNet produces error: cudaSuccess (700 vs. 0) an illegal memory access was encountered. #152

Closed slimway closed 4 years ago

slimway commented 4 years ago

After installing caffe-segnet and correcting the paths as described in the SegNet tutorial, I was able to train SegNet-Basic on the CamVid dataset without any problem. Then I tried to apply SegNet to my own dataset, which contains only three classes (soil (background): 0, wanted plants: 1, unwanted plants: 2). This is what I did:

1) I converted the color codes in the ground truth so that each pixel is labeled 0, 1, or 2.
2) I changed num_output to 3 and kept only three class_weighting values. The ignore label is 3.
3) I changed the batch size to 1 to fit the memory of my RTX 2060 (6 GB).
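A quick sanity check for step 1 is to confirm that every ground-truth mask contains only the expected values. The following is a minimal sketch, assuming the masks can be loaded as NumPy integer arrays; `find_invalid_labels` is a hypothetical helper, not part of caffe-segnet, and the thread does not confirm that stray label values caused this particular crash.

```python
import numpy as np

def find_invalid_labels(label, num_classes, ignore_label=None):
    """Return the sorted label values that are neither a class index
    in [0, num_classes) nor the ignore label."""
    values = np.unique(np.asarray(label))
    allowed = set(range(num_classes))
    if ignore_label is not None:
        allowed.add(ignore_label)
    return [int(v) for v in values if int(v) not in allowed]

# Toy 3x3 mask: 0, 1, 2 are classes, 3 is the ignore label,
# and 255 is a stray value left over from a color-coded image.
mask = np.array([[0, 1, 2],
                 [3, 1, 0],
                 [2, 255, 0]], dtype=np.uint8)

print(find_invalid_labels(mask, num_classes=3, ignore_label=3))  # [255]
```

Running this over every mask before training should surface any value that the three-class setup does not account for.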

1st case: when I run the model with my own computed class weights, as below (last layers):

    ....
    layer {
      name: "conv_classifier"
      type: "Convolution"
      bottom: "conv_decode1"
      top: "conv_classifier"
      param { lr_mult: 1 decay_mult: 1 }
      param { lr_mult: 2 decay_mult: 0 }
      convolution_param {
        num_output: 3
        kernel_size: 1
        weight_filler { type: "msra" }
        bias_filler { type: "constant" }
      }
    }
    layer {
      name: "loss"
      type: "SoftmaxWithLoss"
      bottom: "conv_classifier"
      bottom: "label"
      top: "loss"
      softmax_param { engine: CAFFE }
      loss_param: {
        weight_by_label_freqs: true
        class_weighting: 0.3453
        class_weighting: 16.273
        class_weighting: 23.35
      }
    }
    layer {
      name: "accuracy"
      type: "Accuracy"
      bottom: "conv_classifier"
      bottom: "label"
      top: "accuracy"
      top: "per_class_accuracy"
    }

It gives me an error like this:

    I0614 16:26:17.828141 7780 solver.cpp:251] Learning Rate Policy: step
    I0614 16:26:18.048254 7780 solver.cpp:214] Iteration 0, loss = 0.549489
    I0614 16:26:18.048275 7780 solver.cpp:229] Train net output #0: accuracy = 0.340816
    I0614 16:26:18.048282 7780 solver.cpp:229] Train net output #1: loss = 0.549489 (* 1 = 0.549489 loss)
    I0614 16:26:18.048286 7780 solver.cpp:229] Train net output #2: per_class_accuracy = 0.340816
    I0614 16:26:18.048290 7780 solver.cpp:229] Train net output #3: per_class_accuracy = 1
    I0614 16:26:18.048305 7780 solver.cpp:229] Train net output #4: per_class_accuracy = 1
    I0614 16:26:18.048336 7780 solver.cpp:486] Iteration 0, lr = 0.1
    F0614 16:26:18.559595 7780 math_functions.cpp:91] Check failed: error == cudaSuccess (700 vs. 0) an illegal memory access was encountered
    Check failure stack trace:
    @ 0x7fae4f606e3d google::LogMessage::Fail()
    @ 0x7fae4f608bc0 google::LogMessage::SendToLog()
    @ 0x7fae4f606a23 google::LogMessage::Flush()
    @ 0x7fae4f60958e google::LogMessageFatal::~LogMessageFatal()
    @ 0x7fae4fa42892 caffe::caffe_copy<>()
    @ 0x7fae4fa86bb7 caffe::BasePrefetchingDataLayer<>::Forward_gpu()
    @ 0x7fae4fa60952 caffe::Net<>::ForwardFromTo()
    @ 0x7fae4fa60b47 caffe::Net<>::ForwardPrefilled()
    @ 0x7fae4fa1b495 caffe::Solver<>::Step()
    @ 0x7fae4fa1bf54 caffe::Solver<>::Solve()
    @ 0x4094e9 train()
    @ 0x406e68 main
    @ 0x7fae4ecb9830 (unknown)
    @ 0x407409 _start
    Aborted (core dumped)

2nd case: The weird part is that when I change the class weight of the first label to 0, as shown below, the model trains without error, but at test time it doesn't predict the background (0) at all. It only predicts classes 1 and 2.

    ....
    layer {
      name: "conv_classifier"
      type: "Convolution"
      bottom: "conv_decode1"
      top: "conv_classifier"
      param { lr_mult: 1 decay_mult: 1 }
      param { lr_mult: 2 decay_mult: 0 }
      convolution_param {
        num_output: 3
        kernel_size: 1
        weight_filler { type: "msra" }
        bias_filler { type: "constant" }
      }
    }
    layer {
      name: "loss"
      type: "SoftmaxWithLoss"
      bottom: "conv_classifier"
      bottom: "label"
      top: "loss"
      softmax_param { engine: CAFFE }
      loss_param: {
        weight_by_label_freqs: true
        class_weighting: 0
        class_weighting: 16.273
        class_weighting: 23.35
      }
    }
    layer {
      name: "accuracy"
      type: "Accuracy"
      bottom: "conv_classifier"
      bottom: "label"
      top: "accuracy"
      top: "per_class_accuracy"
    }

Question: what am I missing? Why does it only work when the background class weight is zero?
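The post does not say how the class_weighting values were computed. One common scheme for SegNet-style training is median frequency balancing, where each class weight is the median class frequency divided by that class's frequency. The sketch below assumes the masks are NumPy integer arrays with class indices in [0, num_classes); `median_frequency_weights` is a hypothetical helper, and the numbers it produces are not meant to reproduce the weights quoted above.

```python
import numpy as np

def median_frequency_weights(masks, num_classes):
    """Median frequency balancing: weight_c = median(freq) / freq_c, where
    freq_c = (pixels of class c) / (pixels in images that contain class c)."""
    pixel_count = np.zeros(num_classes)   # pixels labeled c, over all masks
    image_pixels = np.zeros(num_classes)  # total pixels of masks containing c
    for m in masks:
        m = np.asarray(m)
        # Values >= num_classes (e.g. an ignore label) are dropped by the slice.
        counts = np.bincount(m.ravel(), minlength=num_classes)[:num_classes]
        pixel_count += counts
        image_pixels += (counts > 0) * m.size
    freq = np.divide(pixel_count, image_pixels,
                     out=np.zeros(num_classes), where=image_pixels > 0)
    median = np.median(freq[freq > 0])
    # Classes that never occur keep a weight of 0.
    return np.divide(median, freq, out=np.zeros(num_classes), where=freq > 0)

# Two toy 2x2 masks with three classes.
masks = [np.array([[0, 0], [0, 1]]), np.array([[0, 0], [2, 2]])]
weights = median_frequency_weights(masks, num_classes=3)
```

A dominant class such as background ends up with a weight below 1 and rare classes with weights above 1, which is the same general shape as the weights in the first case.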

slimway commented 4 years ago

Update: how I solved it. For anyone facing this problem: the reason I got that weird error when I gave label 0 a weight was that the original (RGB) images had some regions with pixel values equal to zero. So I treated those pixels as unlabeled (they contain no object in the original images anyway) and kept label 0 to refer to those regions in the ground truth. Giving those regions a zero weight now makes sense, because they don't contain any object. The output now has 4 classes: [0: unlabeled pixels, 1: class1, 2: class2, 3: class3].
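The relabeling described in this update could look roughly like the sketch below. It assumes that the original masks use 0, 1, 2 for the three real classes, that "pixels equal zero" means all three RGB channels are 0, and that the real classes are shifted up by one to free label 0. The update does not spell out those details, and `add_unlabeled_class` is a hypothetical helper.

```python
import numpy as np

def add_unlabeled_class(rgb, label):
    """Shift class labels up by one and mark all-black RGB pixels as 0.

    rgb:   H x W x 3 image array
    label: H x W mask with class indices 0..K-1
    Returns an H x W uint8 mask with 0 = unlabeled and 1..K = classes.
    """
    rgb = np.asarray(rgb)
    label = np.asarray(label)
    empty = np.all(rgb == 0, axis=-1)        # True where R = G = B = 0
    remapped = label.astype(np.uint8) + 1    # classes 0..K-1 become 1..K
    remapped[empty] = 0                      # empty regions become unlabeled
    return remapped

# Toy 2x2 example: only the bottom-right pixel is completely black.
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[0, 0] = [10, 20, 30]
rgb[0, 1] = [5, 5, 5]
rgb[1, 0] = [1, 0, 0]
label = np.array([[0, 1], [2, 0]], dtype=np.uint8)

new_label = add_unlabeled_class(rgb, label)
```

With masks remapped this way, the network has four outputs and the first class weight is 0, matching the four-class setup described above.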

Best regards