In lines 116-119 of inference.py, the frames are uniformly sampled from the video. Suppose I have a 20-second, 30 fps video (600 frames) and I set `frames_upbound` to 64. It will then sample 64 frame indices spread across the whole video, stored in `frame_idx`.

After that, in lines 137-145 of inference.py, when the frames are preprocessed, only the frames whose enumerated `idx` is contained in `frame_idx` are appended to `video_processed`:

```python
for idx, frame in enumerate(video):
    image_processor.do_resize = False
    image_processor.do_center_crop = False
    frame = process_anyres_video_genli(frame, image_processor)
    # idx is the position of the frame within `video` (0, 1, 2, ...)
    if frame_idx is not None and idx in frame_idx:
        video_processed.append(frame.unsqueeze(0))
    elif frame_idx is None:
        video_processed.append(frame.unsqueeze(0))
```
The enumerated `idx` values will be 0, 1, ..., 63 (one per sampled frame).
Therefore, only the values of `idx` that also appear in `frame_idx`, i.e. the sampled frame numbers below 64, will be chosen.
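To make the effect concrete, here is a small sketch that reproduces the arithmetic. It assumes (because lines 116-119 are not quoted here) that the sampling uses `np.linspace` over the whole clip with integer truncation; the real code may differ slightly, which would change the exact indices but not the overall pattern.

```python
import numpy as np

# ASSUMPTION: lines 116-119 of inference.py sample roughly like this
# (np.linspace over the whole clip); the exact code there may differ.
fps = 30
total_frame_num = 20 * fps  # 600 frames in a 20 s clip
frames_upbound = 64

frame_idx = np.linspace(0, total_frame_num - 1, frames_upbound, dtype=int).tolist()

# `video` holds only the 64 sampled frames, so enumerate() yields positions 0..63.
enumerated_idx = list(range(len(frame_idx)))

# The check in lines 137-145 keeps a frame only if its *position*
# also happens to be an original frame number listed in frame_idx.
kept = [idx for idx in enumerated_idx if idx in frame_idx]

print("sampled frame numbers:", frame_idx)
print("kept positions:", kept)
print("last kept frame at %.2f s" % (max(kept) / fps))
```

Under this assumption only 7 of the 64 sampled frames survive (positions 0, 9, 19, 28, 38, 47 and 57), the last of them about 1.9 s into the clip.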
This means only the frames sampled from the first two seconds are chosen, since every frame number below 64 lies within the first 64 / 30 ≈ 2.1 s of the video. This seems unreasonable. Is this a mistake, or is it designed this way?
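The membership test would only make sense if `video` held every decoded frame. If `video` already contains just the sampled frames, one possible fix is to keep every frame (or pair each frame with its original number via `zip(frame_idx, video)`). The sketch below is hypothetical: `preprocess` stands in for `process_anyres_video_genli`, the frames are dummy NumPy arrays, and `[None]` plays the role of `unsqueeze(0)`.

```python
import numpy as np

# HYPOTHETICAL stand-in for process_anyres_video_genli(frame, image_processor).
def preprocess(frame):
    return frame

# Same assumed sampling as before; `video` holds only the already-sampled frames.
frame_idx = np.linspace(0, 599, 64, dtype=int).tolist()
video = np.zeros((len(frame_idx), 4, 4, 3), dtype=np.uint8)

video_processed = []
# Each row of `video` corresponds to one entry of frame_idx, so pairing them
# keeps every sampled frame while still exposing its original frame number.
for orig_frame_no, frame in zip(frame_idx, video):
    video_processed.append(preprocess(frame)[None])  # [None] adds a leading axis

print(len(video_processed), video_processed[0].shape)
```

With this variant all 64 sampled frames reach `video_processed`, spread over the full 20 s.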