Open pilibb0712 opened 2 months ago
Hi, thank you for your awesome work! I have a question about the final prediction results for lvu_cls. In your code, the evaluation is based on per-image predictions, each identified by the 'image_id' key in the result file. When a video has multiple image predictions, how can I aggregate them to obtain a prediction for the whole video?
Hi. Actually, the image_id is the clip_id used during evaluation. At test time, we report the average accuracy over video clips, which are extracted from the original videos with a fixed sampling stride of 20, rather than accuracy at the whole-video level.
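If a video-level number is still needed, one option is to group the clip predictions by source video and take a majority vote. The snippet below is only a minimal sketch of that idea, not the repository's evaluation code: the `pred` and `target` field names and the `clip_to_video` mapping are assumptions made for illustration (only `image_id` is known from the result file), and majority voting is just one possible aggregation rule.

```python
from collections import Counter, defaultdict


def clip_accuracy(results):
    """Clip-level accuracy: fraction of clips predicted correctly."""
    return sum(r["pred"] == r["target"] for r in results) / len(results)


def video_accuracy(results, clip_to_video):
    """Video-level accuracy via majority vote over each video's clips.

    `clip_to_video` is a hypothetical dict mapping image_id (clip_id)
    to the id of the video the clip was sampled from.
    """
    preds = defaultdict(list)
    labels = {}
    for r in results:
        vid = clip_to_video[r["image_id"]]
        preds[vid].append(r["pred"])
        labels[vid] = r["target"]  # assumes all clips of a video share one label
    correct = sum(
        Counter(p).most_common(1)[0][0] == labels[vid]
        for vid, p in preds.items()
    )
    return correct / len(preds)


# Toy data: two videos with three clips each (field names are assumed).
results = [
    {"image_id": 0, "pred": 1, "target": 1},
    {"image_id": 1, "pred": 1, "target": 1},
    {"image_id": 2, "pred": 0, "target": 1},
    {"image_id": 3, "pred": 2, "target": 2},
    {"image_id": 4, "pred": 2, "target": 2},
    {"image_id": 5, "pred": 2, "target": 2},
]
clip_to_video = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

clip_acc = clip_accuracy(results)
video_acc = video_accuracy(results, clip_to_video)
```

If the model's per-class scores are available, averaging them per video before taking the argmax is a common alternative to voting. Either way, the resulting number is not directly comparable to the clip-level accuracy described in this thread.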