OpenGVLab / UniFormerV2

[ICCV2023] UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
https://arxiv.org/abs/2211.09552
Apache License 2.0

Evaluation with model trained on mit gives incorrect results #24

Closed — anjugopinath closed this issue 1 year ago

anjugopinath commented 1 year ago

Hi,

I tried testing the model trained on mit on a different dataset. But I noticed that the video labels in the output .pkl file are the same as the labels given in the input file (data_list/mit/test.csv).

Input file (data_list/mit/test.csv):

[screenshot: contents of test.csv]

Output file (.pkl file):

[screenshot: contents of the output .pkl file]

As you can see, the labels for the last 11 videos in the input file are "1", and the output video labels for those videos are also "1". The remaining videos get "0", again the same as the input label.

I don't know what I am doing wrong.

This is the test.sh:

[screenshot: test.sh script]
Andy1621 commented 1 year ago

Thanks for your question. What you have done is right. The video labels are the ground truth, not the predictions. To obtain the predictions, use torch.from_numpy(res['video_preds']).softmax(-1). I store the labels in the pkl file to calculate the accuracy.
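A minimal sketch of what that answer implies: load the evaluation .pkl, apply softmax over the stored logits to get class probabilities, and compare the argmax predictions against the stored ground-truth labels. The key names `video_preds` and `video_labels` come from the answer above; the file path and the tiny stand-in result dict are hypothetical, used only so the snippet runs on its own.

```python
import pickle

import numpy as np
import torch

# Hypothetical path — in practice this is whatever file test.sh wrote out.
pkl_path = "output.pkl"

# Build a tiny stand-in result dict so the snippet is self-contained;
# normally the evaluation script produces this file for you.
fake_res = {
    "video_preds": np.array([[2.0, 0.5], [0.1, 3.0]], dtype=np.float32),  # raw logits
    "video_labels": np.array([0, 1]),  # ground truth, as stored in the pkl
}
with open(pkl_path, "wb") as f:
    pickle.dump(fake_res, f)

# Load the evaluation results.
with open(pkl_path, "rb") as f:
    res = pickle.load(f)

# 'video_labels' holds the ground truth; the model outputs live in
# 'video_preds'. Softmax over the last axis turns logits into probabilities.
probs = torch.from_numpy(res["video_preds"]).softmax(-1)
pred_classes = probs.argmax(-1)

# Top-1 accuracy: fraction of videos whose predicted class matches the label.
accuracy = (pred_classes.numpy() == res["video_labels"]).mean()
print(pred_classes.tolist(), accuracy)  # → [0, 1] 1.0
```

So the "labels" in the output file were never meant to be predictions; they are carried along precisely so a comparison like the one above can be computed.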