alexgkendall / SegNet-Tutorial

Files for a tutorial to train SegNet for road scenes using the CamVid dataset
http://mi.eng.cam.ac.uk/projects/segnet/tutorial.html

I want to train 6 classes, so I changed the number of outputs. BUT I got an error!!! Please help me ~ #67

Open Dasona opened 7 years ago

Dasona commented 7 years ago

Hi~ I want to train six classes, so I changed the annotations (0~6) and changed only num_output: 6 and ignore_label: 6.
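A quick way to sanity-check the setup described above (annotation values 0~6, num_output: 6, ignore_label: 6) is to count which label values actually occur in the annotation images. The sketch below is only an illustration: `summarize_labels` is a hypothetical helper, not part of SegNet or Caffe, and the annotation here is a tiny synthetic array rather than a real CamVid-style label image.

```python
import numpy as np

def summarize_labels(label_maps, num_output, ignore_label=None):
    """Count label values and list those that no output channel can represent.

    Hypothetical helper for illustration only (not part of SegNet or Caffe).
    """
    counts = {}
    for label_map in label_maps:
        values, freqs = np.unique(np.asarray(label_map), return_counts=True)
        for v, f in zip(values.tolist(), freqs.tolist()):
            counts[v] = counts.get(v, 0) + f
    # Valid class indices for a layer with num_output channels are 0..num_output-1.
    out_of_range = sorted(v for v in counts if not 0 <= v < num_output)
    # Out-of-range values are only safe where a layer is told to ignore them.
    unexpected = [v for v in out_of_range if v != ignore_label]
    return counts, out_of_range, unexpected

# Synthetic 2x4 annotation using the values 0..6, as in the post.
demo = np.array([[0, 1, 2, 3], [4, 5, 6, 6]], dtype=np.uint8)
counts, out_of_range, unexpected = summarize_labels([demo], num_output=6, ignore_label=6)
print("pixel counts per label:", counts)
print("labels outside 0..5:", out_of_range)
print("outside range and not ignored:", unexpected)
```

With seven distinct values and six outputs, label 6 is the only value outside 0..5, so every layer that reads the labels has to skip it in some way.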

This is the end of my segnet_train.prototxt. I only changed num_output and ignore_label.

layer {
  bottom: "conv1_2_D"
  top: "conv1_1_D"
  name: "conv1_1_D"
  type: "Convolution"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    weight_filler { type: "msra" }
    bias_filler { type: "constant" }
    num_output: 6
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "conv1_1_D"
  bottom: "label"
  top: "loss"
  softmax_param {engine: CAFFE}
  loss_param: {
    weight_by_label_freqs: true
    ignore_label: 6
    class_weighting: 0.9886
    class_weighting: 0.6415
    class_weighting: 16.0338
    class_weighting: 0.4978
    class_weighting: 1.0000
    class_weighting: 1.3441
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "conv1_1_D"
  bottom: "label"
  top: "accuracy"
  top: "per_class_accuracy"
}
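The post does not say how the six class_weighting values were obtained. One common way to derive per-class weights for segmentation is median frequency balancing, sketched below under that assumption. `median_frequency_weights` is a hypothetical helper, the two label maps are synthetic, and details such as whether ignored pixels count toward the image size vary between implementations.

```python
import numpy as np

def median_frequency_weights(label_maps, num_classes):
    """Median frequency balancing: weight(c) = median(freq) / freq(c).

    freq(c) = pixels of class c / total pixels of the images that contain c.
    Hypothetical helper; assumes labels 0..num_classes-1 are the real classes.
    """
    class_pixels = np.zeros(num_classes, dtype=np.float64)
    image_pixels = np.zeros(num_classes, dtype=np.float64)
    for label_map in label_maps:
        label_map = np.asarray(label_map)
        for c in range(num_classes):
            n = int((label_map == c).sum())
            if n > 0:
                class_pixels[c] += n
                image_pixels[c] += label_map.size
    # Classes that never occur keep a NaN frequency and a NaN weight.
    freq = np.full(num_classes, np.nan)
    present = image_pixels > 0
    freq[present] = class_pixels[present] / image_pixels[present]
    return np.nanmedian(freq) / freq

# Two synthetic 2x2 label maps with two classes.
maps = [np.array([[0, 0], [0, 1]]), np.array([[0, 0], [1, 1]])]
weights = median_frequency_weights(maps, num_classes=2)
print(weights)
```

Whatever method is used, the number of class_weighting entries should match num_output, which it does in the file above (six entries for six outputs).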

I1026 13:19:37.835263 2388 solver.cpp:266] Learning Rate Policy: step
*** Error in `./caffe-segnet-multi-gpu/build/tools/caffe': malloc(): memory corruption (fast): 0x0000000008213fa0 ***
*** Aborted at 1477455578 (unix time) try "date -d @1477455578" if you are using GNU date ***
PC: @ 0x7fdc1b426c37 (unknown)
*** SIGABRT (@0x3e800000954) received by PID 2388 (TID 0x7fdc1d584780) from PID 2388; stack trace: ***
    @ 0x7fdc1b426cb0 (unknown)
    @ 0x7fdc1b426c37 (unknown)
    @ 0x7fdc1b42a028 (unknown)
    @ 0x7fdc1b4632a4 (unknown)
    @ 0x7fdc1b46dff7 (unknown)
    @ 0x7fdc1b470cf4 (unknown)
    @ 0x7fdc1b4726c0 (unknown)
    @ 0x7fdc1c059dad (unknown)
    @ 0x7fdc1ce006fd std::vector<>::_M_insert_aux()
    @ 0x7fdc1ce028ac caffe::AccuracyLayer<>::Forward_cpu()
    @ 0x7fdc1cd46a51 caffe::Net<>::ForwardFromTo()
    @ 0x7fdc1cd46dc7 caffe::Net<>::ForwardPrefilled()
    @ 0x7fdc1cd6bf19 caffe::Solver<>::Step()
    @ 0x7fdc1cd6c743 caffe::Solver<>::Solve()
    @ 0x408ebb train()
    @ 0x4069b1 main
    @ 0x7fdc1b411f45 (unknown)
    @ 0x40710c (unknown)
    @ 0x0 (unknown)

I got this error and couldn't find what causes it. I used the caffe-segnet-multi-gpu version, but I got the same error with the original caffe-segnet. Please help me train 6 classes.

And this is the log from debug mode.

I1026 13:33:03.905791 9536 solver.cpp:265] Solving VGG_ILSVRC_16_layer
I1026 13:33:03.905797 9536 solver.cpp:266] Learning Rate Policy: step
I1026 13:33:03.972322 9539 dense_image_data_layer.cpp:234] Prefetch batch: 62 ms.
I1026 13:33:03.972370 9539 dense_image_data_layer.cpp:235] Read time: 48.002 ms.
I1026 13:33:03.972384 9539 dense_image_data_layer.cpp:236] Transform time: 14.173 ms.
F1026 13:33:04.347821 9536 accuracy_layer.cpp:72] Check failed: label_value < num_labels (6 vs. 6)
*** Check failure stack trace: ***
    @ 0x7f6d79d30daa (unknown)
    @ 0x7f6d79d30ce4 (unknown)
    @ 0x7f6d79d306e6 (unknown)
    @ 0x7f6d79d33687 (unknown)
    @ 0x7f6d7a594bcd caffe::AccuracyLayer<>::Forward_cpu()
    @ 0x7f6d7a55a90d caffe::Layer<>::Forward_gpu()
    @ 0x41a686 caffe::Layer<>::Forward()
    @ 0x7f6d7a4be75d caffe::Net<>::ForwardFromTo()
    @ 0x7f6d7a4be525 caffe::Net<>::ForwardPrefilled()
    @ 0x7f6d7a4be8f0 caffe::Net<>::Forward()
    @ 0x7f6d7a4bf2d5 caffe::Net<>::ForwardBackward()
    @ 0x7f6d7a4eaa2b caffe::Solver<>::Step()
    @ 0x7f6d7a4ea4b7 caffe::Solver<>::Solve()
    @ 0x4154cb train()
    @ 0x4175fa main
    @ 0x7f6d78e32f45 (unknown)
    @ 0x414369 (unknown)
    @ (nil) (unknown)
Aborted (core dumped)
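The failed check `label_value < num_labels (6 vs. 6)` in accuracy_layer.cpp means a label of 6 reached the Accuracy layer, which has only 6 score channels (valid indices 0 to 5). The loss layer in the posted file sets ignore_label: 6, but the Accuracy layer shown has no such setting. The NumPy sketch below mimics that behaviour to show why the same label is fine in one place and fatal in the other. It is an illustration, not the Caffe code: `pixel_accuracy` and its error message are hypothetical. Upstream BVLC Caffe does expose an ignore_label field in accuracy_param; whether the SegNet fork supports it is worth checking in its caffe.proto.

```python
import numpy as np

def pixel_accuracy(scores, labels, ignore_label=None):
    """Pixel accuracy for scores of shape (C, H, W) and labels of shape (H, W).

    Hypothetical stand-in for an accuracy layer, for illustration only.
    """
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    num_labels = scores.shape[0]
    keep = np.ones(labels.shape, dtype=bool)
    if ignore_label is not None:
        keep = labels != ignore_label  # skip ignored pixels entirely
    kept = labels[keep]
    bad = kept[(kept < 0) | (kept >= num_labels)]
    if bad.size:
        # Analogous to the failed check in the log: label must be < num_labels.
        raise ValueError(f"label {int(bad[0])} is not below num_labels {num_labels}")
    if kept.size == 0:
        return float("nan")
    pred = scores.argmax(axis=0)
    return float((pred[keep] == kept).mean())

# Six score channels on a 2x2 image; predictions are [[0, 1], [2, 3]].
scores = np.zeros((6, 2, 2))
scores[0, 0, 0] = scores[1, 0, 1] = scores[2, 1, 0] = scores[3, 1, 1] = 1.0
labels = np.array([[0, 1], [5, 6]])

acc = pixel_accuracy(scores, labels, ignore_label=6)
print("accuracy with ignore_label=6:", acc)
try:
    pixel_accuracy(scores, labels)  # label 6 is not ignored here
except ValueError as err:
    print("without ignore_label:", err)
```

If the accuracy computation cannot be told to ignore label 6, the alternatives are to remove that layer or to keep label 6 out of the annotations it sees.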

beejisbrigit commented 7 years ago

I also get the same error periodically (only sometimes; other times training starts as normal)! If you look at the stack trace, there is (or might be) a problem with the Accuracy layer at the end of the model file. I commented that layer out and this particular problem went away.

The strange thing is that I also get a CUBLAS error periodically as well:

F1104 13:25:18.790753 8281 math_functions.cu:123] Check failed: status == CUBLAS_STATUS_SUCCESS (11 vs. 0) CUBLAS_STATUS_MAPPING_ERROR
*** Check failure stack trace: ***
    @ 0x7fe05da2fdaa (unknown)
    @ 0x7fe05da2fce4 (unknown)
    @ 0x7fe05da2f6e6 (unknown)
    @ 0x7fe05da32687 (unknown)
    @ 0x7fe05de93e7b caffe::caffe_gpu_asum<>()
    @ 0x7fe05de90e5f caffe::SoftmaxWithLossLayer<>::Backward_gpu()
    @ 0x7fe05dd4002c caffe::Net<>::BackwardFromTo()
    @ 0x7fe05dd40271 caffe::Net<>::Backward()
    @ 0x7fe05de49e5d caffe::Solver<>::Step()
    @ 0x7fe05de4a77f caffe::Solver<>::Solve()
    @ 0x4086c8 train()
    @ 0x406c61 main
    @ 0x7fe05cf41ec5 (unknown)
    @ 0x40720d (unknown)
    @ (nil) (unknown)

Other times I am able to start training without any problems.

I am training with 9 classes on an Ubuntu 14.04 machine with a Titan X GPU.

AmlanAlok commented 7 years ago

@beejisbrigit Hi

I am facing the same error. Can you please tell me exactly what you did to clear it?

ArunJ1 commented 5 years ago

I am also facing the same issue @alexgkendall