ZheC / Realtime_Multi-Person_Pose_Estimation

Code repo for realtime multi-person pose estimation in CVPR'17 (Oral)

Issue on training (bash train_pose.sh) #212

Open changkk opened 5 years ago

changkk commented 5 years ago

Hi, I am following the training instructions exactly. I installed caffe_train, downloaded the 189 GB LMDB file, and launched the train_pose.sh generated by set_layer.py. However, I got this error:
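For reference, the script essentially just calls the caffe binary from caffe_train with the generated solver and the VGG-19 weights, something like this (paths trimmed from my setup, exact flags may differ):

```sh
#!/usr/bin/env sh
# Rough shape of the generated train_pose.sh (flags and paths illustrative)
~/caffe_train/build/tools/caffe train \
    --solver=pose_solver.prototxt \
    --weights=VGG_ILSVRC_19_layers.caffemodel \
    --gpu=0 2>&1 | tee output.txt
```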


```
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 574671192
I1230 22:03:59.697592  6184 upgrade_proto.cpp:52] Attempting to upgrade input file specified using deprecated V1LayerParameter: /home/changkoo/Realtime_Multi-Person_Pose_Estimation_ck/training/dataset/COCO/train/VGG_ILSVRC_19_layers.caffemodel
I1230 22:04:00.144183  6184 upgrade_proto.cpp:60] Successfully upgraded file specified using deprecated V1LayerParameter
I1230 22:04:00.156769  6184 upgrade_proto.cpp:66] Attempting to upgrade input file specified using deprecated input fields: /home/changkoo/Realtime_Multi-Person_Pose_Estimation_ck/training/dataset/COCO/train/VGG_ILSVRC_19_layers.caffemodel
I1230 22:04:00.156787  6184 upgrade_proto.cpp:69] Successfully upgraded file specified using deprecated input fields.
W1230 22:04:00.156791  6184 upgrade_proto.cpp:71] Note that future Caffe releases will only support input layers and not input fields.
I1230 22:04:00.156970  6184 net.cpp:761] Ignoring source layer pool1
I1230 22:04:00.157120  6184 net.cpp:761] Ignoring source layer pool2
I1230 22:04:00.158727  6184 net.cpp:761] Ignoring source layer pool3
I1230 22:04:00.161192  6184 net.cpp:761] Ignoring source layer conv4_3
I1230 22:04:00.161202  6184 net.cpp:761] Ignoring source layer relu4_3
I1230 22:04:00.161206  6184 net.cpp:761] Ignoring source layer conv4_4
I1230 22:04:00.161207  6184 net.cpp:761] Ignoring source layer relu4_4
I1230 22:04:00.161208  6184 net.cpp:761] Ignoring source layer pool4
I1230 22:04:00.161211  6184 net.cpp:761] Ignoring source layer conv5_1
I1230 22:04:00.161214  6184 net.cpp:761] Ignoring source layer relu5_1
I1230 22:04:00.161216  6184 net.cpp:761] Ignoring source layer conv5_2
I1230 22:04:00.161219  6184 net.cpp:761] Ignoring source layer relu5_2
I1230 22:04:00.161222  6184 net.cpp:761] Ignoring source layer conv5_3
I1230 22:04:00.161226  6184 net.cpp:761] Ignoring source layer relu5_3
I1230 22:04:00.161227  6184 net.cpp:761] Ignoring source layer conv5_4
I1230 22:04:00.161231  6184 net.cpp:761] Ignoring source layer relu5_4
I1230 22:04:00.161233  6184 net.cpp:761] Ignoring source layer pool5
I1230 22:04:00.161237  6184 net.cpp:761] Ignoring source layer fc6
I1230 22:04:00.161238  6184 net.cpp:761] Ignoring source layer relu6
I1230 22:04:00.161240  6184 net.cpp:761] Ignoring source layer drop6
I1230 22:04:00.161244  6184 net.cpp:761] Ignoring source layer fc7
I1230 22:04:00.161247  6184 net.cpp:761] Ignoring source layer relu7
I1230 22:04:00.161249  6184 net.cpp:761] Ignoring source layer drop7
I1230 22:04:00.161252  6184 net.cpp:761] Ignoring source layer fc8
I1230 22:04:00.161254  6184 net.cpp:761] Ignoring source layer prob
I1230 22:04:00.198676  6184 caffe.cpp:251] Starting Optimization
I1230 22:04:00.198695  6184 solver.cpp:279] Solving
I1230 22:04:00.198698  6184 solver.cpp:280] Learning Rate Policy: step
1adfadsf 0xaf9f4e0first 0xaf9ca60second 0x20second 0x21
1adfadsf 0xafa3d40first 0xaf9cc60second 0x20second 0x21
1adfadsf 0x1aa62190first 0xaf9f660second 0x20second 0x21
F1230 22:04:00.317441  6184 eltwise_layer.cpp:35] Check failed: bottom[i]->shape() == bottom[0]->shape()
*** Check failure stack trace: ***
    @     0x7f79d032b5cd  google::LogMessage::Fail()
    @     0x7f79d032d433  google::LogMessage::SendToLog()
    @     0x7f79d032b15b  google::LogMessage::Flush()
    @     0x7f79d032de1e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f79d075d404  caffe::EltwiseLayer<>::Reshape()
    @     0x7f79d082e708  caffe::Net<>::ForwardFromTo()
    @     0x7f79d082eab7  caffe::Net<>::Forward()
    @     0x7f79d0852690  caffe::Solver<>::Step()
    @     0x7f79d08532d9  caffe::Solver<>::Solve()
    @     0x40cccf  train()
    @     0x4086c0  main
    @     0x7f79cf22d830  __libc_start_main
    @     0x408ed9  _start
    @              (nil)  (unknown)
```

The error seems to come from eltwise_layer.cpp, so I dissected that file: the check fires because two bottom blobs of one of the eltwise layers have different shapes. I am not sure why this is happening, since I haven't changed anything in the prototxt files or set_layer.py other than the source path. Has anyone else hit this error using the repo?
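For reference, here is the check that fires, excerpted (and trimmed) from upstream Caffe's src/caffe/layers/eltwise_layer.cpp; caffe_train may differ slightly in line numbers, but the logic is the same: every bottom blob of an Eltwise layer must have exactly the same shape.

```cpp
// src/caffe/layers/eltwise_layer.cpp (upstream Caffe, trimmed excerpt)
template <typename Dtype>
void EltwiseLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  // All inputs to an Eltwise layer must share one shape; this is the
  // CHECK at eltwise_layer.cpp:35 that aborts training in the log above.
  for (int i = 1; i < bottom.size(); ++i) {
    CHECK(bottom[i]->shape() == bottom[0]->shape());
  }
  top[0]->ReshapeLike(*bottom[0]);
}
```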

I tried to unbold the error text but couldn't figure out how. Sorry about that!

Thanks!

changkk commented 5 years ago

I just found a similar issue among the closed ones. Could this be related to the batch size? Should I use a batch size greater than 1?
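If it is the batch size, it should be set in the data layer of the generated train prototxt, something like this (from memory; the exact layer type and parameter names in the file set_layer.py writes may differ):

```
layer {
  name: "data"
  type: "CPMData"               # custom data layer from caffe_train (name assumed)
  top: "data"
  top: "label"
  data_param {
    source: "lmdb_trainVal"     # placeholder path to the downloaded LMDB
    batch_size: 10              # the value in question
    backend: LMDB
  }
}
```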