VisionLearningGroup / R-C3D

code for R-C3D
MIT License

about some details #20

Open dlyldxwl opened 6 years ago

dlyldxwl commented 6 years ago

Thanks for your work! I want to fine-tune the 135000-iteration caffemodel on my own dataset, but because of network restrictions I can't download the ActivityNet videos, so I don't know the expected dataset format. I really hope you can help me. Thanks again!

huijuan88 commented 6 years ago

You could look at the other two examples, on the THUMOS14 and Charades datasets; their formats are quite similar.

The videos in these two datasets can be downloaded.

dlyldxwl commented 6 years ago

Thank you for your reply. I am fine-tuning the 135000-iteration caffemodel and the network now runs, but I don't understand the terminal output:

Accuracy: 0.984375
I0413 09:36:22.818658 4436 accuracy_layer.cpp:101] Class 0 accuracy : 1
I0413 09:36:22.818661 4436 accuracy_layer.cpp:101] Class 1 accuracy : 0.666667
TRAIN
I0413 09:36:22.826498 4436 solver.cpp:228] Iteration 63, loss = 0.973929
I0413 09:36:22.826509 4436 solver.cpp:244]     Train net output #0: accuarcy = 0.714286
I0413 09:36:22.826514 4436 solver.cpp:244]     Train net output #1: loss_cls = 0.531705 (* 1 = 0.531705 loss)
I0413 09:36:22.826519 4436 solver.cpp:244]     Train net output #2: loss_twin = 0.38329 (* 1 = 0.38329 loss)
I0413 09:36:22.826520 4436 solver.cpp:244]     Train net output #3: rpn_accuarcy = 0.984375
I0413 09:36:22.826522 4436 solver.cpp:244]     Train net output #4: rpn_accuarcy_class = 1
I0413 09:36:22.826524 4436 solver.cpp:244]     Train net output #5: rpn_accuarcy_class = 0.666667
I0413 09:36:22.826527 4436 solver.cpp:244]     Train net output #6: rpn_cls_loss = 0.0271421 (* 1 = 0.0271421 loss)
I0413 09:36:22.826530 4436 solver.cpp:244]     Train net output #7: rpn_loss_twin = 0.0317921 (* 1 = 0.0317921 loss)

1. I believe the "Accuracy" line is rpn_accuarcy. Do the Class 0 and Class 1 accuracies indicate the RPN's background and foreground classification accuracy? And does "Train net output #0: accuarcy = 0.714286" indicate the accuracy of the R-C3D classification network?
2. The learning rates of conv1a, conv2a, ... are 0. Do you think these layers need backward computation when fine-tuning?
3. I can't find a demo.py file. Does that mean I need to write one myself if I want to run the caffemodel on a video to check its results?

I hope you can answer my questions. Wish you a happy day, thanks again!

dlyldxwl commented 6 years ago

I want to use the caffemodel to run detection on a video, but I can't find a demo file. If the repository has one, could you tell me its location? If not, I'll need to write it myself. In test_net.py the network input comes from a .pkl file, but I want to feed a video directly; what should I do?

huijuan88 commented 6 years ago

"Class 0 accuracy” indicates the background classification accuracy in one batch. "Class 1 accuracy” is for foreground classification accuracy in one batch.

Unfreezing the lower layers (giving them a non-zero learning rate) has only a minor effect, but it slows down training.

A demo file can be written by following the test prediction code.
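
Below is a minimal sketch of what such a demo script might look like, loosely following the test prediction pattern. The prototxt/caffemodel paths, the 'data' blob name, and the 171x128 frame size are assumptions for illustration, not verified against test_net.py.

```python
# Minimal sketch of a demo script, modeled loosely on the test prediction code.
# Paths, blob names, and frame size below are placeholders.
import glob
import caffe
import cv2
import numpy as np

net = caffe.Net('test.prototxt',                       # hypothetical deploy prototxt
                'activitynet_iter_135000.caffemodel',  # fine-tuned weights
                caffe.TEST)

# Frames are assumed to be pre-extracted from the video as JPEG images.
frame_paths = sorted(glob.glob('frames/video_0001/*.jpg'))
frames = [cv2.resize(cv2.imread(p), (171, 128)) for p in frame_paths]

# Build a 1 x 3 x L x H x W blob, the layout C3D-style networks expect.
# (Mean subtraction is omitted here for brevity.)
clip = np.asarray(frames, dtype=np.float32)    # L x H x W x 3
clip = clip.transpose(3, 0, 1, 2)[np.newaxis]  # 1 x 3 x L x H x W

net.blobs['data'].reshape(*clip.shape)
net.blobs['data'].data[...] = clip
out = net.forward()        # outputs include proposals and classification scores
print(out.keys())
```

The real test code also handles mean subtraction, cropping, and mapping proposals back to video time, so treat this only as a starting point.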

dlyldxwl commented 6 years ago

Thank you for your reply~ So "Train net output #0: accuarcy = 0.714286" means the R-C3D action classification accuracy is 0.71? And LENGTH: [768] means the network input is 768 frames? My dataset is small with 5 classes; can you give me some advice on parameter settings, or should I use the network's original parameters directly? I just started learning action recognition, thanks!

dlyldxwl commented 6 years ago

@huijuan88 I have written the demo file. My dataset is small and the videos are about 10 seconds long; can you give me some advice on parameter settings, or should I use the network's original parameters directly?

huijuan88 commented 6 years ago

If you are just testing, you need to use the original training parameters.

Since 10-second videos are short, you could try a higher fps when converting the videos to frames, so that each video yields more frames.
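
As an illustration, here is a small frame-extraction sketch using OpenCV; the paths and the target fps value are placeholders, and an equivalent ffmpeg command (e.g. with -r 25) would work just as well.

```python
import os
import cv2

def extract_frames(video_path, out_dir, target_fps=25):
    """Save roughly target_fps frames per second of video as JPEG images."""
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    cap = cv2.VideoCapture(video_path)
    src_fps = cap.get(cv2.CAP_PROP_FPS) or target_fps
    step = max(int(round(src_fps / target_fps)), 1)  # keep every step-th frame
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            cv2.imwrite(os.path.join(out_dir, 'image_%05d.jpg' % saved), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved
```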

dlyldxwl commented 6 years ago

@huijuan88 My videos are 120 fps, so I set FPS = 100 and retrained the network. "Train net output #0: accuarcy = 0.963636" means the network classification accuracy is 0.96? But when I test on some videos the results aren't good... can you give me some advice? Also, I found a bug in activitynet_log_analysis.py, line 113: after calling the get_segments function we should reset predict_data = [], otherwise only the detection results of the first video are written to result.json~
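
For readers hitting the same issue, here is a toy illustration of the reset described above; the get_segments stub and the video grouping below are stand-ins, not the actual activitynet_log_analysis.py code.

```python
def get_segments(preds):
    # stand-in for the real get_segments: pretend each prediction is already
    # a (start, end, score) tuple
    return list(preds)

# toy per-video predictions, standing in for the parsed test log
log = {'video_a': [(0.0, 2.5, 0.9)], 'video_b': [(1.0, 4.0, 0.8)]}

results = {}
predict_data = []
for video, preds in log.items():
    predict_data.extend(preds)
    results[video] = get_segments(predict_data)
    predict_data = []  # reset after each video; without this line, later
                       # videos would also carry the earlier videos' detections
print(results)
```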

huijuan88 commented 6 years ago

100 fps is too high; usually 25 fps is the maximum. Also, if your dataset is very different from ActivityNet, the model might not work well without training on it.

I have updated activitynet_log_analysis.py.

dlyldxwl commented 6 years ago

@huijuan88 Yes. I am now training the network by fine-tuning the 135000-iteration caffemodel. My dataset has 192 videos and 3 classes, the videos are about 10 s long, but their fps is 120. Can you give me some advice on parameter settings, for example FPS, batch size, and so on? Thanks!

sijun-zhou commented 6 years ago

Hi @dlyldxwl

I am new to action detection and very interested in this field. I use the test script R-C3D/experiments/activitynet/test_net.py to run the test. I am using a 1080Ti card with 11 GB of memory, but 2.5 GB is occupied by other students, so only 8.5 GB of GPU memory is left. When I run test_net.py on ActivityNet it loads only one video's frames (768 images), but it runs out of memory at the step blobs_out = net.forward(**forward_kwargs):

F0713 15:08:15.452706 22317 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
Aborted (core dumped)

I reduced the 768 images to 160 images, and that works fine with the 8.5 GB I have left. But 768 images is nearly 5 times larger, so I guess I would need 40-50 GB of GPU memory, and it is difficult to run pycaffe with multiple GPUs. Could you please help me? I am new to action detection. Really appreciated!

So could you please tell me your GPU type and how many GPUs you used when testing and training this code? Thanks in advance!

dlyldxwl commented 6 years ago

@sijun-zhou Hi, I am also new to action detection. I trained this model for a competition and haven't worked on it since. My machine has 12 GB of GPU memory with a TITAN X (Pascal). If your memory isn't enough, I reckon you could use a smaller batch size or resolution; fixing (freezing) more layers also works, but it can lead to worse results. These are just my suggestions, I can't guarantee they will work well. Finally, I wish you success in your experiments~
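
As a rough, untested sketch of the "smaller input" idea: the long frame buffer can be run through the network in shorter temporal windows. The 'data' blob name and the 1 x 3 x L x H x W layout are assumptions, and proposals from each window would still need their temporal offsets shifted by the window start.

```python
def forward_in_windows(net, frames_blob, window=160):
    """Run net.forward on temporal chunks of `window` frames to limit GPU memory."""
    num_frames = frames_blob.shape[2]              # blob assumed 1 x 3 x L x H x W
    outputs = []
    for start in range(0, num_frames, window):
        chunk = frames_blob[:, :, start:start + window, :, :]
        net.blobs['data'].reshape(*chunk.shape)    # resize the input blob
        net.blobs['data'].data[...] = chunk
        out = net.forward()                        # smaller window -> less memory
        outputs.append({k: v.copy() for k, v in out.items()})
    return outputs
```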

sijun-zhou commented 6 years ago

@dlyldxwl Thanks very much! Really appreciate your reply. I'll look into the details of my problem! :) BTW, did you only use 1 TITAN X card?

Thanks, Sijun

dlyldxwl commented 6 years ago

@sijun-zhou Yes, I have just one TITAN X card.

sijun-zhou commented 6 years ago

@dlyldxwl Thank you very much. I'll look into the details of my problem!

YanYan0716 commented 6 years ago

@dlyldxwl I am new to deep learning. Could you tell me how to use the caffemodel to run detection on a video, as you described? Thanks a lot, best wishes!

viswalal commented 5 years ago

@dlyldxwl, could you please share the demo file you have written?