Hi, thanks for your great work on MoViNet.
I ran into a problem when testing on HMDB51 videos.
For example, the inference result for a "brush hair" video looks wrong: on some frames the model predicts "brush hair", while on others it predicts "kick ball". Have you seen this problem before?
In the code, 16 frames are divided into 2 clips of 8 frames each. During the test phase, the first clip's prediction differs from the second clip's, and the final prediction uses only the second. Is this the intended behavior, or is something wrong here?
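To make the question concrete, here is a minimal sketch of what I mean. It is not the repository's code: the helper names (`split_into_clips`, `last_clip_prediction`, `averaged_prediction`), the two-class setup, and the logits are all made up for illustration. It only shows how taking the last clip's prediction can disagree with averaging over both clips.

```python
import numpy as np

def split_into_clips(frames, clip_len):
    """Split a (T, ...) frame array into consecutive, non-overlapping clips."""
    n_clips = len(frames) // clip_len
    return [frames[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]

def softmax(logits):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

def last_clip_prediction(clip_logits):
    """Use only the final clip's logits (what the test code appears to do)."""
    return int(np.argmax(clip_logits[-1]))

def averaged_prediction(clip_logits):
    """Average per-clip probabilities before taking the argmax."""
    probs = np.stack([softmax(l) for l in clip_logits])
    return int(np.argmax(probs.mean(axis=0)))

# Hypothetical input: 16 dummy frames split into 2 clips of 8 frames.
frames = np.zeros((16, 4, 4, 3))
clips = split_into_clips(frames, clip_len=8)

# Made-up per-clip logits; class 0 = "brush hair", class 1 = "kick ball".
clip_logits = [np.array([3.0, 0.5]), np.array([1.0, 1.2])]

print(len(clips), clips[0].shape[0])
print("last clip:", last_clip_prediction(clip_logits))
print("averaged:", averaged_prediction(clip_logits))
```

With these made-up numbers the last clip alone picks "kick ball", while averaging both clips picks "brush hair". I am not sure which of the two the code is meant to do.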