MCG-NJU / VideoMAE

[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
https://arxiv.org/abs/2203.12602

Accuracy calculation in test has redundant instances #46

Closed: theartpiece closed this issue 2 years ago

theartpiece commented 2 years ago

Under the Finetune section, it's written that at test time you use multiple segments and multiple crops:

    --test_num_segment 2 \
    --test_num_crop 3 \
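With these settings each test video is evaluated as `test_num_segment × test_num_crop` = 2 × 3 = 6 views, and in this kind of multi-view testing every view normally gets its own forward pass and its own row of logits. A minimal sketch of that view grid, assuming segments are temporal clips and crops are spatial crops (the helper `enumerate_views` is hypothetical, not part of VideoMAE):

```python
from itertools import product


def enumerate_views(num_segment: int, num_crop: int) -> list[tuple[int, int]]:
    """Return one (segment_idx, crop_idx) pair per test view of a single video.

    Hypothetical helper: each pair stands for one temporal clip combined
    with one spatial crop, i.e. one separate prediction at test time.
    """
    return list(product(range(num_segment), range(num_crop)))


views = enumerate_views(num_segment=2, num_crop=3)
print(len(views))  # 6
print(views)       # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```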

But when calculating accuracy, the scores are not aggregated over all these segments/crops (see https://github.com/MCG-NJU/VideoMAE/blob/main/engine_for_finetuning.py#L180 and https://github.com/rwightman/pytorch-image-models/blob/master/timm/utils/metrics.py#L25):

    acc1, acc5 = accuracy(output, target, topk=(1, 5))

Instead, while calculating accuracy, we should have done something like this:

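    import numpy as np
    import pandas as pd
    scores = pd.DataFrame({"id": ids, "outputs": list(outputs), "labels": labels})
    scores = scores.groupby(["id", "labels"]).aggregate({"outputs": lambda x: np.stack(x.tolist()).max(axis=0)})  # class-wise max over each video's views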
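To see why the two metrics can disagree, here is a minimal pure-Python sketch, assuming each row of logits is one view and views are grouped by a video id. `per_view_top1` mimics row-by-row top-1 scoring like the `accuracy` call above (it is not timm's code), and `per_video_top1_max` applies the max-over-views aggregation proposed here; both function names and the toy logits are hypothetical:

```python
def argmax(scores):
    """Index of the largest score."""
    return max(range(len(scores)), key=lambda c: scores[c])


def per_view_top1(logits, labels):
    """Top-1 accuracy (in %) with every view counted as its own sample."""
    correct = sum(int(argmax(s) == y) for s, y in zip(logits, labels))
    return 100.0 * correct / len(labels)


def per_video_top1_max(ids, logits, labels):
    """Top-1 accuracy (in %) after max-pooling class scores over each video's views."""
    pooled = {}
    label_of = {}
    for vid, scores, y in zip(ids, logits, labels):
        if vid in pooled:
            # Element-wise max across views, class by class.
            pooled[vid] = [max(a, b) for a, b in zip(pooled[vid], scores)]
        else:
            pooled[vid] = list(scores)
        label_of[vid] = y
    correct = sum(int(argmax(p) == label_of[v]) for v, p in pooled.items())
    return 100.0 * correct / len(pooled)


# Two views of one video with true class 0; only the first view is right on its own.
ids = ["v0", "v0"]
logits = [[2.0, 1.0, 0.5], [0.9, 1.2, 0.1]]
labels = [0, 0]
print(per_view_top1(logits, labels))            # 50.0
print(per_video_top1_max(ids, logits, labels))  # 100.0
```

Counting each view separately weights videos by their number of views and can score a video as half right; pooling first yields exactly one prediction per video.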
yztongzhan commented 2 years ago

Hi @theartpiece, thanks for your comments. We use multi-clip testing, and the scores are merged with the following function: https://github.com/MCG-NJU/VideoMAE/blob/418d3d695365fcb658d003998b6a32a51a1f0d86/engine_for_finetuning.py#L233
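A merge step like this turns per-view scores into one prediction per video before accuracy is computed. A minimal pure-Python sketch of the idea, assuming each view's logits are converted with softmax and then averaged per video (a common multi-view convention; the exact rule, e.g. mean versus max, and how duplicate views are handled should be checked against the linked `engine_for_finetuning.py`). The names `merge_views` and `video_level_accuracy` and the toy logits are hypothetical:

```python
import math
from collections import defaultdict


def softmax(scores):
    """Numerically stable softmax over one view's logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def merge_views(video_ids, logits, labels):
    """Average softmaxed view scores per video.

    Returns {video_id: (mean_probs, label)}; assumes every view of a video
    carries the same label.
    """
    probs_by_video = defaultdict(list)
    label_by_video = {}
    for vid, scores, label in zip(video_ids, logits, labels):
        probs_by_video[vid].append(softmax(scores))
        label_by_video[vid] = label
    merged = {}
    for vid, probs in probs_by_video.items():
        n = len(probs)
        mean_probs = [sum(p[c] for p in probs) / n for c in range(len(probs[0]))]
        merged[vid] = (mean_probs, label_by_video[vid])
    return merged


def video_level_accuracy(merged, k=1):
    """Top-k accuracy (in %) with one sample per video."""
    correct = 0
    for mean_probs, label in merged.values():
        top = sorted(range(len(mean_probs)), key=lambda c: mean_probs[c], reverse=True)[:k]
        correct += int(label in top)
    return 100.0 * correct / len(merged)


# Two views of one video "v0" (true class 0); the second view alone would be misclassified.
merged = merge_views(["v0", "v0"], [[2.0, 1.0, 0.5], [0.9, 1.2, 0.1]], [0, 0])
print(video_level_accuracy(merged, k=1))  # 100.0
```

Averaging probabilities lets a confident correct view outweigh a weaker wrong one, so the video is counted once and scored correctly, whereas scoring the two views separately would report 50%.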