about the evaluation of action localization

Qinying-Liu commented 6 years ago

Hi @huijuan88, thank you for sharing your code! The paper "R-C3D: Region Convolutional 3D Network for Temporal Activity Detection" says that each video is divided into several segments of 768 frames, and that R-C3D only produces detection results for the segment fed into the network. Is it necessary to integrate the detection results of each segment to obtain detection results for the complete video in order to calculate the mAP? If so, how do you do this? Any help will be appreciated!

huijuan88 commented 6 years ago

In the post-processing code, predictions from each segment will be merged.
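To make the question concrete, here is a minimal sketch of how per-segment predictions could be pooled into per-video predictions. It is not the repository's code: the function name `pool_segment_detections`, the input layout (`video_id`, `start_frame`, `detections`), and the `fps` argument are all assumptions made for illustration. It only shifts segment-local frame indices by the segment's start frame and converts them to seconds.

```python
from collections import defaultdict


def pool_segment_detections(segment_results, fps):
    """Pool segment-level detections into video-level detections.

    Hypothetical input layout (an assumption, not the R-C3D format):
      segment_results: iterable of dicts with keys
        "video_id"    -> str
        "start_frame" -> index of the segment's first frame in the video
        "detections"  -> list of (start, end, label, score), with start/end
                         given in frames relative to the segment
      fps: frame rate used when the frames were extracted
    """
    per_video = defaultdict(list)
    for seg in segment_results:
        offset = seg["start_frame"]
        for start, end, label, score in seg["detections"]:
            per_video[seg["video_id"]].append({
                # Shift to video-level frames, then convert to seconds.
                "segment": [(start + offset) / fps, (end + offset) / fps],
                "label": label,
                "score": score,
            })
    return dict(per_video)
```

With two 768-frame segments of the same video, a detection at frames 0 to 32 of the second segment ends up at frames 768 to 800 of the video. Overlapping segments would produce near-duplicate detections, which a later suppression step would have to handle.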

Qinying-Liu commented 6 years ago

@huijuan88, sorry to bother you again. When you say "merged", do you mean that you simply keep all the predictions from the segments that belong to the same video and treat them as the predictions for that video? Or do you mean that the post-processing code merges the predictions using some strategy? If so, where can I find this code? I've glanced over the code under the "R-C3D/experiments/activitynet/test/" folder, but have not found the code for "merging".

huijuan88 commented 6 years ago

Yes. See the file "activitynet_log_analysis.py".
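The thread does not show what that file does, so the following is only a sketch of one common merging strategy for temporal detection: pool all detections of a video and apply temporal non-maximum suppression per class. The function names and the `iou_threshold` value of 0.4 are assumptions chosen for illustration, not values taken from the repository.

```python
def temporal_iou(a, b):
    """Intersection over union of two 1-D intervals (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(detections, iou_threshold=0.4):
    """Greedy temporal NMS for one video and one class.

    detections: list of (start, end, score) in video-level coordinates.
    iou_threshold: hypothetical value; tune it for the dataset at hand.
    """
    kept = []
    # Visit detections from the highest score to the lowest.
    for det in sorted(detections, key=lambda d: d[2], reverse=True):
        # Keep a detection only if it does not overlap a kept one too much.
        if all(temporal_iou(det[:2], k[:2]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept
```

For example, `temporal_nms([(0, 10, 0.9), (1, 11, 0.8), (20, 30, 0.7)])` drops the second detection, because its overlap with the first is above the threshold, and keeps the other two.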

sijun-zhou commented 6 years ago

Hi @canbaoburen, I am new to action detection and very interested in this field. I use the test script R-C3D/experiments/activitynet/test_net.py to run the test. I am using a single 1080Ti card with 11G of memory, but 2.5G is used by other students, so I am left with only 8.5G of GPU memory. When I run the test_net.py script on ActivityNet, it loads the frames of only one video (768 images) but runs out of memory at the step: blobs_out = net.forward(forward_kwargs) """ F0713 15:08:15.452706 22317 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory ** Check failure stack trace: Aborted (core dumped) """

I reduced the 768 images to 160 images, and that works fine with the 8.5G of memory I have left. But 768 images is nearly 5 times larger, so I guess I would need 40G to 50G of GPU memory. And it is difficult to run pycaffe on multiple GPUs. Could you please help me? I am new to action detection. I would really appreciate it!

So could you please tell me what type of GPU you have and how many GPUs you used when testing and training this code? Thanks in advance!
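The 40G to 50G guess above comes from scaling memory linearly with the number of frames. A small sketch of that arithmetic follows; the function name is hypothetical, and the linear-scaling model is an assumption rather than a measurement. Since 8.5G is the memory that was free, not the memory actually used at 160 frames, the result is best read as a rough upper bound under that assumption.

```python
def linear_memory_estimate(mem_gb_at_ref, ref_frames, target_frames):
    """Extrapolate GPU memory assuming it grows linearly with frame count.

    This ignores costs that do not depend on the input length (for example
    the network weights), so it is only a back-of-the-envelope figure.
    """
    return mem_gb_at_ref * target_frames / ref_frames


# Figures quoted in the comment: 160 frames fit into 8.5 GB of free memory.
estimate_gb = linear_memory_estimate(8.5, ref_frames=160, target_frames=768)
```

With these numbers the ratio is 768 / 160 = 4.8, which gives an estimate of about 40.8 GB.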

VisionLearningGroup / R-C3D

about the evaluation of action localization #32