chuckcho / video-caffe

Video-friendly caffe -- comes with the most recent version of Caffe (as of Jan 2019), a video reader, 3D(ND) pooling layer, and an example training script for C3D network and UCF-101 data

Issue of training C3D network from the pre-trained model #21

Closed GuangmingZhu closed 7 years ago

GuangmingZhu commented 8 years ago

When I try to fine-tune the C3D network from the pre-trained model conv3d_deepnetA_sport1m_iter_1900000 or c3d_ucf101_finetune_whole_iter_20000, the loss always gets stuck at 87.3365, as shown below:

I0623 16:00:16.808712  1945 solver.cpp:280] Learning Rate Policy: step
I0623 16:00:16.810631  1945 solver.cpp:337] Iteration 0, Testing net (#0)
I0623 16:00:17.330435  1945 blocking_queue.cpp:50] Data layer prefetch queue empty
I0623 16:01:42.667268  1945 solver.cpp:404]     Test net output #0: accuracy/top-1 = 0.0787535
I0623 16:01:42.667341  1945 solver.cpp:404]     Test net output #1: loss = 54.616 (* 1 = 54.616 loss)
I0623 16:01:43.675875  1945 solver.cpp:228] Iteration 0, loss = 58.5096
I0623 16:01:43.675940  1945 solver.cpp:244]     Train net output #0: loss = 58.5096 (* 1 = 58.5096 loss)
I0623 16:01:43.675977  1945 sgd_solver.cpp:106] Iteration 0, lr = 0.0001
I0623 16:03:03.220101  1945 solver.cpp:228] Iteration 20, loss = 87.3365
I0623 16:03:03.220196  1945 solver.cpp:244]     Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I0623 16:03:03.220211  1945 sgd_solver.cpp:106] Iteration 20, lr = 0.0001

But if I train from scratch, it works fine. I even changed BlobProto as below,

message BlobProto {
  optional BlobShape shape = 8;
  repeated float data = 6 [packed = true];
  repeated float diff = 7 [packed = true];
  repeated double double_data = 9 [packed = true];
  repeated double double_diff = 10 [packed = true];
  optional int32 num = 1 [default = 0];
  optional int32 channels = 2 [default = 0];
  optional int32 length = 3 [default = 0];
  optional int32 height = 4 [default = 0];
  optional int32 width = 5 [default = 0];
}

but it did not work. Can you help me solve this problem? It would be very helpful if I could train from the pre-trained models for my own applications. My network definition follows:

----- 1st group -----
layer { name: "conv1a" type: "NdConvolution" bottom: "data" top: "conv1a" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 64 kernel_shape { dim: 3 dim: 3 dim: 3 } stride_shape { dim: 1 dim: 1 dim: 1 } pad_shape { dim: 1 dim: 1 dim: 1 } weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } }
layer { name: "relu1a" type: "ReLU" bottom: "conv1a" top: "conv1a" }
layer { name: "pool1" type: "NdPooling" bottom: "conv1a" top: "pool1" pooling_param { pool: MAX kernel_shape { dim: 1 dim: 2 dim: 2 } stride_shape { dim: 1 dim: 2 dim: 2 } } }

----- 2nd group -----
layer { name: "conv2a" type: "NdConvolution" bottom: "pool1" top: "conv2a" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 128 kernel_shape { dim: 3 dim: 3 dim: 3 } stride_shape { dim: 1 dim: 1 dim: 1 } pad_shape { dim: 1 dim: 1 dim: 1 } weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 1 } } }
layer { name: "relu2a" type: "ReLU" bottom: "conv2a" top: "conv2a" }
layer { name: "pool2" type: "NdPooling" bottom: "conv2a" top: "pool2" pooling_param { pool: MAX kernel_shape { dim: 2 dim: 2 dim: 2 } stride_shape { dim: 2 dim: 2 dim: 2 } } }

----- 3rd group -----
layer { name: "conv3a" type: "NdConvolution" bottom: "pool2" top: "conv3a" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 256 kernel_shape { dim: 3 dim: 3 dim: 3 } stride_shape { dim: 1 dim: 1 dim: 1 } pad_shape { dim: 1 dim: 1 dim: 1 } weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 1 } } }
layer { name: "relu3a" type: "ReLU" bottom: "conv3a" top: "conv3a" }
layer { name: "conv3b" type: "NdConvolution" bottom: "conv3a" top: "conv3b" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 256 kernel_shape { dim: 3 dim: 3 dim: 3 } stride_shape { dim: 1 dim: 1 dim: 1 } pad_shape { dim: 1 dim: 1 dim: 1 } weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 1 } } }
layer { name: "relu3b" type: "ReLU" bottom: "conv3b" top: "conv3b" }
layer { name: "pool3" type: "NdPooling" bottom: "conv3b" top: "pool3" pooling_param { pool: MAX kernel_shape { dim: 2 dim: 2 dim: 2 } stride_shape { dim: 2 dim: 2 dim: 2 } } }

----- 4th group -----
layer { name: "conv4a" type: "NdConvolution" bottom: "pool3" top: "conv4a" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 512 kernel_shape { dim: 3 dim: 3 dim: 3 } stride_shape { dim: 1 dim: 1 dim: 1 } pad_shape { dim: 1 dim: 1 dim: 1 } weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 1 } } }
layer { name: "relu4a" type: "ReLU" bottom: "conv4a" top: "conv4a" }
layer { name: "conv4b" type: "NdConvolution" bottom: "conv4a" top: "conv4b" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 512 kernel_shape { dim: 3 dim: 3 dim: 3 } stride_shape { dim: 1 dim: 1 dim: 1 } pad_shape { dim: 1 dim: 1 dim: 1 } weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 1 } } }
layer { name: "relu4b" type: "ReLU" bottom: "conv4b" top: "conv4b" }
layer { name: "pool4" type: "NdPooling" bottom: "conv4b" top: "pool4" pooling_param { pool: MAX kernel_shape { dim: 2 dim: 2 dim: 2 } stride_shape { dim: 2 dim: 2 dim: 2 } } }

----- 5th group -----
layer { name: "conv5a" type: "NdConvolution" bottom: "pool4" top: "conv5a" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 512 kernel_shape { dim: 3 dim: 3 dim: 3 } stride_shape { dim: 1 dim: 1 dim: 1 } pad_shape { dim: 1 dim: 1 dim: 1 } weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 1 } } }
layer { name: "relu5a" type: "ReLU" bottom: "conv5a" top: "conv5a" }
layer { name: "conv5b" type: "NdConvolution" bottom: "conv5a" top: "conv5b" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 512 kernel_shape { dim: 3 dim: 3 dim: 3 } stride_shape { dim: 1 dim: 1 dim: 1 } pad_shape { dim: 1 dim: 1 dim: 1 } weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 1 } } }
layer { name: "relu5b" type: "ReLU" bottom: "conv5b" top: "conv5b" }
layer { name: "pool5" type: "NdPooling" bottom: "conv5b" top: "pool5" pooling_param { pool: MAX kernel_shape { dim: 2 dim: 2 dim: 2 } stride_shape { dim: 2 dim: 2 dim: 2 } } }

----- 1st fc group -----
layer { name: "fc6" type: "InnerProduct" bottom: "pool5" top: "fc6" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 4096 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 1 } } }
layer { name: "relu6" type: "ReLU" bottom: "fc6" top: "fc6" }
layer { name: "drop6" type: "Dropout" bottom: "fc6" top: "fc6" dropout_param { dropout_ratio: 0.5 } }

----- 2nd fc group -----
layer { name: "fc7" type: "InnerProduct" bottom: "fc6" top: "fc7" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 4096 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 1 } } }
layer { name: "relu7" type: "ReLU" bottom: "fc7" top: "fc7" }
layer { name: "drop7" type: "Dropout" bottom: "fc7" top: "fc7" dropout_param { dropout_ratio: 0.5 } }

----- 3rd fc group -----
layer { name: "fc8-msr" type: "InnerProduct" bottom: "fc7" top: "fc8" param { lr_mult: 10 decay_mult: 1 } param { lr_mult: 20 decay_mult: 0 } inner_product_param { num_output: 101 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } }
layer { name: "prob" type: "Softmax" bottom: "fc8" top: "prob" include { phase: TEST } }
layer { name: "accuracy" type: "Accuracy" bottom: "prob" bottom: "label" top: "accuracy/top-1" include { phase: TEST } }
layer { name: "loss" type: "SoftmaxWithLoss" bottom: "fc8" bottom: "label" top: "loss" }
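
For reference, a fine-tuning run like the one described above is normally launched with Caffe's standard command-line tool; this is only a sketch, and the solver filename is a placeholder for whatever solver accompanies this net:

./build/tools/caffe train \
    --solver=c3d_ucf101_solver.prototxt \
    --weights=conv3d_deepnetA_sport1m_iter_1900000.model \
    --gpu=0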

chuckcho commented 8 years ago

Where did you get these models (conv3d_deepnetA_sport1m_iter_1900000 or c3d_ucf101_finetune_whole_iter_20000)? I transplanted the facebook/C3D-format pretrained model (trained on Sports-1M) into a video-caffe-compatible model and made it available at this link, along with the accompanying model architecture file. Hope this helps.

mzolfaghari commented 8 years ago

@chuckcho Thanks for providing the pretrained model. I created my own pretrained model (by converting a C3D model to the new Caffe format) and fine-tuned it on the UCF-101 dataset, but the network did not converge. Did you use this model to fine-tune on UCF-101, and if so, what accuracy did you get? Could you share your results?

Best,

GuangmingZhu commented 8 years ago

@chuckcho Thanks. The pre-trained models I used are facebook/C3D-format models downloaded from the C3D project website. I also tried to fine-tune on UCF-101 from the video-caffe-compatible model you shared, but I still got the problem shown below. If I train the model from scratch, everything is fine except for the low accuracy. Can you tell me what is wrong with my fine-tuning?

I0625 16:21:54.600855 24931 net.cpp:752] Ignoring source layer fc8-dextro-1845
I0625 16:21:54.600858 24931 net.cpp:752] Ignoring source layer loss
I0625 16:21:54.602797 24931 caffe.cpp:219] Starting Optimization
I0625 16:21:54.602813 24931 solver.cpp:279] Solving c3d_ucf101
I0625 16:21:54.602816 24931 solver.cpp:280] Learning Rate Policy: step
I0625 16:21:54.604764 24931 solver.cpp:337] Iteration 0, Testing net (#0)
I0625 16:21:55.132699 24931 blocking_queue.cpp:50] Data layer prefetch queue empty
I0625 16:22:39.113358 24931 solver.cpp:404]     Test net output #0: accuracy/top-1 = 0.017
I0625 16:22:39.113430 24931 solver.cpp:404]     Test net output #1: loss = 6.82415 (* 1 = 6.82415 loss)
I0625 16:22:42.453061 24931 solver.cpp:228] Iteration 0, loss = 11.0795
I0625 16:22:42.453121 24931 solver.cpp:244]     Train net output #0: loss = 11.0795 (* 1 = 11.0795 loss)
I0625 16:22:42.453156 24931 sgd_solver.cpp:106] Iteration 0, lr = 0.0001
I0625 16:23:55.820638 24931 solver.cpp:228] Iteration 20, loss = 87.3365
I0625 16:23:55.820755 24931 solver.cpp:244]     Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I0625 16:23:55.820765 24931 sgd_solver.cpp:106] Iteration 20, lr = 0.0001
I0625 16:25:09.443394 24931 solver.cpp:228] Iteration 40, loss = 87.3365
I0625 16:25:09.443495 24931 solver.cpp:244]     Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I0625 16:25:09.443506 24931 sgd_solver.cpp:106] Iteration 40, lr = 0.0001
I0625 16:26:22.738557 24931 solver.cpp:228] Iteration 60, loss = 87.3365
I0625 16:26:22.738657 24931 solver.cpp:244]     Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I0625 16:26:22.738668 24931 sgd_solver.cpp:106] Iteration 60, lr = 0.0001
I0625 16:27:35.669477 24931 solver.cpp:228] Iteration 80, loss = 87.3365
I0625 16:27:35.669544 24931 solver.cpp:244]     Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I0625 16:27:35.669555 24931 sgd_solver.cpp:106] Iteration 80, lr = 0.0001
I0625 16:28:48.383669 24931 solver.cpp:228] Iteration 100, loss = 87.3365
I0625 16:28:48.383783 24931 solver.cpp:244]     Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I0625 16:28:48.383795 24931 sgd_solver.cpp:106] Iteration 100, lr = 0.0001

jiaxue-ai commented 8 years ago

@GuangmingZhu I think C3D and video-caffe are based on different Caffe versions, so you cannot use the facebook/C3D-format models downloaded from the C3D project website directly.

GuangmingZhu commented 8 years ago

@mrxue1993 Thank you! I spent several days trying to convert facebook/C3D-format models into video-caffe-format models. I found that the convolution algorithm used in video-caffe differs from the one in C3D for layers conv4a~conv5b, so the parameters of those layers cannot be used directly. In video-caffe, cuDNN chooses the convolution algorithm automatically based on the input and output; I tried setting the algorithm manually, but it did not work.

LuisTbx commented 8 years ago

@GuangmingZhu Did you find a way to solve this issue? I am running into the same problem: after a few hundred iterations the loss explodes to the same value, 87.3365. Any idea how to fix it?

GuangmingZhu commented 8 years ago

@LuisTbx No. I am not trying to solve this issue any more.

junmuzi commented 7 years ago

Solved by decreasing base_lr from 0.003 to 0.0001. In short, when you hit loss = 87.3365: 1) decrease your base_lr; 2) check that your class labels start from 0 and are consecutive (and make sure the num_output value matches the number of labels).
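
For illustration, a minimal solver along these lines might look as follows. This is only a sketch: the net path is a placeholder, and all values are illustrative, chosen to match the step policy and lr = 0.0001 seen in the logs above.

net: "c3d_ucf101_train_test.prototxt"   # placeholder path to your train/test net definition
test_iter: 100
test_interval: 1000
base_lr: 0.0001        # reduced from 0.003, as suggested above
lr_policy: "step"
gamma: 0.1
stepsize: 20000
momentum: 0.9
weight_decay: 0.0005
max_iter: 100000
snapshot: 5000
snapshot_prefix: "c3d_ucf101"
solver_mode: GPU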

alidiba67 commented 7 years ago

@junmuzi Could you please share your prototxt file for fine-tuning UCF-101 from the pretrained model? I ran the experiments with the same base_lr you mentioned, but it doesn't work and the loss is stuck at 87.

vivoutlaw commented 7 years ago

@junmuzi I tried your suggestion and I don't see any change; the loss remains stuck at 87 even after setting base_lr: 0.0001. Do you mind sharing your prototxt file? @chuckcho: didn't you have the same issue? I would really appreciate it if you could have a look. Thanks! :)

chuckcho commented 7 years ago

@vivoutlaw I didn't have the loss blowing up. Can you keep reducing base_lr until the training loss looks reasonable?

vivoutlaw commented 7 years ago

@chuckcho I did reduce the base learning rate, but it happens again. Could you please provide the script and the prototxt file used to fine-tune the pre-trained model on UCF-101? Thanks! :)

LuisTbx commented 7 years ago

Problem solved. Here are my recommendations:

1) Scale your data by adding a scale parameter of 1/255 to the prototxt.
2) Pre-shuffle your data, since shuffle=on only shuffles within a batch.
3) Make sure your labels start at zero and that there are no missing values (i.e. no jumping from label 1 to 4).
4) Add test-on-test and test-on-train parameters to your solver.
5) Use all of the information: at this time, the video reader only takes the 16 frames after the starting-frame parameter. So, if your video is longer, you should write the triplet video name, start frame, class in your .txt files.
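
To make points 1) and 5) concrete, here is a rough sketch of a video data layer with the 1/255 scaling and a matching list file. The exact parameter names may differ between video-caffe versions, so treat this only as an illustration and check the bundled UCF-101 example prototxt; the source path is a placeholder.

layer {
  name: "data"
  type: "VideoData"                    # video-caffe's video reader layer
  top: "data"
  top: "label"
  transform_param {
    scale: 0.00390625                  # 1/255, recommendation 1)
  }
  video_data_param {
    source: "ucf101_train_list.txt"    # placeholder; one "<video> <start frame> <label>" triplet per line
    batch_size: 30
    new_length: 16                     # the reader takes 16 frames starting at the start frame
    new_height: 128
    new_width: 171
    shuffle: true
  }
}

Each line of the list file would then look something like (a made-up example path): /data/ucf101/v_ApplyEyeMakeup_g08_c01.avi 1 0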

Hope it helps.

pelun commented 7 years ago

When I run train_ucf101.sh, I get this error:

  File "train_ucf101.sh", line 3
    ./build/tools/caffe \
    ^
SyntaxError: invalid syntax

Can anyone help me? Thanks.

chuckcho commented 7 years ago

shell-related issue. closing.
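
For anyone hitting the same message: a SyntaxError like that usually means the shell script was passed to the Python interpreter. Running it through a shell avoids it, for example:

# run the training script with a shell, not with python
bash train_ucf101.sh
# or make it executable and run it directly
chmod +x train_ucf101.sh && ./train_ucf101.sh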

haohao900618 commented 7 years ago

@chuckcho Hi, I used the pre-trained model "conv3d_deepnetA_sport1m_iter_1900000.model", which I downloaded from the link you posted above. I fine-tuned the model on UCF-101, but it does not converge and even diverges with loss = 87.3365. Have you used this pre-trained model? I am not sure whether it works. Has anyone else encountered the same issue?

chuckcho commented 7 years ago

@haohao900618 Per the updated README (https://github.com/chuckcho/video-caffe#pretrained-model), can you download the pretrained model from there and try it?

chuckcho commented 7 years ago

@haohao900618 This was the original thread: https://github.com/chuckcho/video-caffe/issues/46#issuecomment-251170719

haohao900618 commented 7 years ago

@chuckcho I have tried the pre-trained model "c3d_ucf101_iter_38000.caffemodel", but it is sub-optimal and does not satisfy our needs. Do you have the caffemodel trained on Sports-1M with 8 3D convolutional layers (the C3D model)? Have you tried that pre-trained model, and is it compatible with video-caffe? Thank you!

haohao900618 commented 7 years ago

@LuisTbx @GuangmingZhu @vivoutlaw @alidiba67 Did you solve the problem? Could you please share your caffemodel pre-trained on Sports-1M? Thank you!

LuisTbx commented 7 years ago

@haohao900618 I did not fine-tune on Sports-1M; I used my own data, so I cannot provide the model. However, I can confirm that the training works. Where are you having issues? Best,

GuangmingZhu commented 7 years ago

@haohao900618 As I replied on 16 July, that is the final result I got.

haohao900618 commented 7 years ago

@LuisTbx Thanks for your attention. I reported my problems in my last two comments on this page:

Hi, I used the pre-trained model "conv3d_deepnetA_sport1m_iter_1900000.model", which I downloaded from the link you posted above. I fine-tuned the model on UCF-101, but it does not converge and even diverges with loss = 87.3365. Have you used this pre-trained model? I am not sure whether it works. Has anyone else encountered the same issue?

I have also tried the pre-trained model "c3d_ucf101_iter_38000.caffemodel", but it is sub-optimal and does not satisfy our needs. Do you have the caffemodel trained on Sports-1M with 8 3D convolutional layers (the C3D model)? Have you tried that pre-trained model, and is it compatible with video-caffe? Thank you!

LuisTbx commented 7 years ago

Hi @haohao900618, I see now; I once faced the same issues and then solved them.

What I would suggest is the following: make sure your data is scaled and zero-centred (it makes optimization easier), and pay attention to your solver's learning rate, step, and momentum. Be sure your data is correctly shuffled before loading (yes, before the online shuffle in the Caffe video_data layer, because that only works at the batch level; if a batch contains only one label, you will run into problems).
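
As a minimal example of the pre-shuffling step (assuming GNU coreutils is available; file names are placeholders):

# shuffle the training list once, offline, before pointing the data layer at it
shuf ucf101_train_list.txt -o ucf101_train_list_shuffled.txt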

If none of that works, I would recommend training your own model on Sports-1M, adding some BN layers to speed up training and make it less sensitive to learning-rate changes. Then, once you have the model, fine-tune it on your own data.
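
For context, a BN block in Caffe is usually a BatchNorm layer followed by a Scale layer. Below is a rough sketch of what could be inserted after a convolution; the layer names are placeholders, and this is an addition to, not part of, the reference C3D architecture.

layer {
  name: "conv1a_bn"
  type: "BatchNorm"
  bottom: "conv1a"
  top: "conv1a"
  batch_norm_param { use_global_stats: false }   # false while training, true at test time
}
layer {
  name: "conv1a_scale"
  type: "Scale"
  bottom: "conv1a"
  top: "conv1a"
  scale_param { bias_term: true }                # learnable scale and shift after normalization
}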

All the best.

haohao900618 commented 7 years ago

@LuisTbx Thanks for your suggestions. I will try to train my own model on Sports-1M.

bitwangdan commented 7 years ago

@haohao900618 Hi, do you have a model pre-trained on the Sports-1M dataset? The UCF-101 model does not satisfy my needs, so could you provide one trained on Sports-1M?

zhuolinumd commented 7 years ago

Has anyone solved this issue? I rescaled and shuffled the data as suggested by @LuisTbx, but the loss is still around 4.7075 at iteration 3000. It seems the training does not converge. Any suggestions?

haohao900618 commented 7 years ago

@jiang2764 Which caffemodel did you use for fine-tuning?

zhuolinumd commented 7 years ago

@haohao900618 I am using the model provided by @chuckcho (https://dl.dropboxusercontent.com/u/306922/video-caffe-model/conv3d_deepnetA_sport1m_iter_1900000.model). Have you successfully fine-tuned it on UCF-101?