ray342649093 closed this issue 7 years ago.
Hi, thank you for your interest.
I used FPS=25 for all test videos, although this should not affect performance much.
As for frame extraction, I used ffmpeg instead of cv2 to extract frames in PNG format. You can refer to my previous SCNN demo code for more details.
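For anyone reproducing this, here is a minimal sketch of resampling a video to 25 fps and writing PNG frames with ffmpeg from Python. It is not the SCNN demo code. The output naming pattern `%06d.png` and the function names are hypothetical, and the exact ffmpeg flags in the original pipeline may differ.

```python
import subprocess
from pathlib import Path

def ffmpeg_extract_cmd(video, out_dir, fps=25):
    """Build an ffmpeg command that resamples `video` to `fps` and writes PNG frames."""
    # Hypothetical naming scheme: 000001.png, 000002.png, ...
    pattern = str(Path(out_dir) / "%06d.png")
    # The fps filter drops/duplicates frames so every video ends up at the same rate.
    return ["ffmpeg", "-i", str(video), "-vf", f"fps={fps}", pattern]

def extract_frames(video, out_dir, fps=25):
    """Create `out_dir` and run ffmpeg; raises CalledProcessError if ffmpeg fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(ffmpeg_extract_cmd(video, out_dir, fps), check=True)
```

Usage would look like `extract_frames("video_test_0000004.mp4", "frames/video_test_0000004")` (paths are placeholders). Resampling to a fixed rate matters because the frame count, and therefore the number of input windows, scales with fps.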
I tried to reproduce your experiment on THUMOS14. I downloaded the THUMOS14 test set and used part of the code from the C3D project to extract frames from the 213 test videos. This gave 1351825 frames in total, which differs from the number of frames you extracted (around 1157824, according to your postprocessing code). Running your Python code then generated 42347 bin files, whereas yours generated 36182. So I changed the number of mini-batches to 10567 and output 42347 features.
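The reference figure of 1157824 frames is exactly 36182 × 32, which suggests each bin file covers one 32-frame window. Assuming that (and assuming non-overlapping windows), a quick way to locate where the two pipelines diverge is to count windows per video and compare totals. Whether the trailing partial window is padded or dropped is also an assumption, so both options are exposed. One possible cause of the higher frame count is that the C3D extraction kept each video's native frame rate instead of resampling to 25 fps; that is worth checking per video.

```python
import math

WINDOW = 32  # assumption: one bin file = one non-overlapping 32-frame window

def windows_per_video(num_frames, window=WINDOW, keep_partial=True):
    """Number of input windows (bin files) produced for one video."""
    if keep_partial:
        # Trailing partial window is kept (e.g. padded to full length).
        return math.ceil(num_frames / window)
    # Trailing partial window is discarded.
    return num_frames // window

def total_windows(frame_counts, window=WINDOW, keep_partial=True):
    """Sum of windows over all videos, given each video's frame count."""
    return sum(windows_per_video(n, window, keep_partial) for n in frame_counts)

# Sanity check on the reference numbers quoted in this thread.
assert 36182 * WINDOW == 1157824
```

Running `total_windows` on both sets of per-video frame counts and diffing video by video should show whether the gap comes from a few videos (frame rate mismatch) or from every video (windowing convention).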
I generated my own per-frame ground truth labels and ran your postprocessing code, and finally got 0.1426 mAP. The probabilities look off: most frames have a high background probability, and the remaining frames do not have a high enough probability for any action, even when their background probability is low. Could you see what the problem might be? The code I used to extract frames is attached below.
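If the per-frame ground truth is built with a different convention from the author's (frame rate, frame-to-timestamp mapping, background index), both the mAP and the apparent background dominance can shift. Below is a minimal sketch, assuming 0-indexed frames sampled at 25 fps, frame `i` at time `i / fps`, class 0 as background, and segments given as `(class_index, start_sec, end_sec)` (for example parsed from the THUMOS14 temporal annotation files, assumed to hold one `video_name start end` line per segment). The function names are hypothetical. Comparing the background fraction of the labels with the fraction of frames whose predicted argmax is background gives a quick sanity check.

```python
import math

def frame_labels(num_frames, segments, fps=25.0, background=0):
    """Per-frame labels from (class_index, start_sec, end_sec) segments.

    Frame i is labeled with a class if start <= i / fps <= end.
    Overlapping segments: the later segment in the list overwrites earlier ones.
    """
    labels = [background] * num_frames
    for cls, start, end in segments:
        first = max(0, math.ceil(start * fps))
        last = min(num_frames - 1, math.floor(end * fps))
        for i in range(first, last + 1):
            labels[i] = cls
    return labels

def background_fraction(labels, background=0):
    """Fraction of ground-truth frames labeled as background."""
    return sum(1 for l in labels if l == background) / len(labels)

def argmax_background_fraction(probs, background=0):
    """Fraction of frames whose highest predicted probability is background.

    `probs` is a list of per-frame probability rows (one entry per class).
    """
    hits = sum(1 for row in probs
               if max(range(len(row)), key=row.__getitem__) == background)
    return hits / len(probs)
```

If `argmax_background_fraction` is far above `background_fraction` on the same video, the model (or the features fed to it) is collapsing toward background, which points back at frame extraction or windowing rather than at the postprocessing.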