theartpiece closed this issue 2 years ago.
Hi @theartpiece, thanks for your comments. We use multi-clip testing, and the scores are merged with the following function: https://github.com/MCG-NJU/VideoMAE/blob/418d3d695365fcb658d003998b6a32a51a1f0d86/engine_for_finetuning.py#L233
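A minimal plain-Python sketch of what merging multi-clip scores usually means: average each video's per-view softmax scores, then take top-k on the averaged vector. This is not the repository's merge function linked above. The names `merge_views`, `video_topk`, `view_top1` and the `(video_id, logits, label)` record layout are hypothetical, and averaging after softmax is an assumption (averaging raw logits is another common choice). The toy data also shows why per-view top-1, where every (segment, crop) view is counted as its own sample, can differ from video-level top-1.

```python
import math
from collections import defaultdict


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(xs):
    """Index of the largest element (first one on ties)."""
    return max(range(len(xs)), key=lambda i: xs[i])


def merge_views(records):
    """Average per-view softmax scores into one score vector per video.

    records: iterable of (video_id, logits, label), one entry per
    (segment, crop) view. Returns {video_id: (mean_probs, label)}.
    """
    sums, counts, labels = {}, defaultdict(int), {}
    for video_id, logits, label in records:
        probs = softmax(logits)
        acc = sums.setdefault(video_id, [0.0] * len(probs))
        for c, p in enumerate(probs):
            acc[c] += p
        counts[video_id] += 1
        labels[video_id] = label
    return {v: ([s / counts[v] for s in sums[v]], labels[v]) for v in sums}


def video_topk(merged, k):
    """Fraction of videos whose label is among the k highest mean scores."""
    hits = 0
    for probs, label in merged.values():
        ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
        hits += label in ranked[:k]
    return hits / len(merged)


def view_top1(records):
    """Top-1 accuracy that treats each view as an independent sample."""
    return sum(argmax(logits) == label for _, logits, label in records) / len(records)


# Toy example: two videos, three views each, two classes (made-up numbers).
records = [
    ("v1", [2.0, 0.0], 0), ("v1", [0.0, 0.5], 0), ("v1", [0.0, 0.5], 0),
    ("v2", [0.0, 1.0], 1), ("v2", [0.0, 2.0], 1), ("v2", [0.5, 0.0], 1),
]
merged = merge_views(records)
print(view_top1(records))     # 0.5
print(video_topk(merged, 1))  # 1.0
```

Here only half of the individual views are classified correctly, yet both videos are correct once their views are averaged, so the two numbers answer different questions.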
The Finetune section states that at test time you use multiple segments and multiple crops per video.
However, when calculating accuracy, we don't aggregate the scores over all these segments/crops: https://github.com/MCG-NJU/VideoMAE/blob/main/engine_for_finetuning.py#L180 https://github.com/rwightman/pytorch-image-models/blob/master/timm/utils/metrics.py#L25
acc1, acc5 = accuracy(output, target, topk=(1, 5))
Instead, when calculating accuracy, we should have done something like this: