Open mmiakashs opened 7 months ago
Hello. Each clip is 5 seconds long at 25 frames per second, for a total of 125 frames. The point of contact typically occurs at the 75th frame. We trimmed the clips with --start_frame 63, --end_frame 87, and --fps 17 so that only the frames around the foul are kept, but feel free to adjust these values to your needs.
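To make the arithmetic concrete, here is a minimal sketch of how those three values could map to frame indices. It assumes the window is resampled uniformly from 25 fps down to the target fps by rounding to the nearest source frame; the function name `sample_frame_indices` and this rounding scheme are illustrative assumptions, not necessarily what the repository's data loader does.

```python
def sample_frame_indices(start_frame, end_frame, target_fps, source_fps=25):
    """Hypothetical helper: pick source-frame indices so that the window
    [start_frame, end_frame) is resampled from source_fps to target_fps.

    Assumption: uniform resampling with nearest-frame rounding.
    """
    # Duration of the trimmed window in seconds, e.g. (87 - 63) / 25 = 0.96 s.
    duration = (end_frame - start_frame) / source_fps
    # Number of frames that duration holds at the target frame rate.
    num_frames = round(duration * target_fps)
    # Distance between sampled frames, measured in source frames.
    step = source_fps / target_fps
    return [start_frame + round(i * step) for i in range(num_frames)]


# Values quoted in the reply above.
indices = sample_frame_indices(start_frame=63, end_frame=87, target_fps=17)
before_contact = [i for i in indices if i < 75]
from_contact = [i for i in indices if i >= 75]
print(len(indices), len(before_contact), len(from_contact))
```

Under these assumptions the window covers 0.96 seconds and yields 16 sampled frames, 8 before frame 75 and 8 from frame 75 onward, which is consistent with the 16-frame clips described in the paper.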
The paper states: "For both classification tasks, we leverage clips of 16 frames, spanning temporally for 1 second, with a spatial dimension of 224×398 pixels. Specifically, the clips contain 8 frames before the foul and 8 frames after the foul." However, the annotations do not include timestamps indicating when the foul occurred. Could you please share how the 1-second clip was extracted?