Reproducing full resolution Swin-T baseline from FastVQA paper

VQAssessment / FAST-VQA-and-FasterVQA

[ECCV2022, TPAMI2023] FAST-VQA, and its extended version FasterVQA.

https://www.ecva.net/papers/eccv_2022/papers_ECCV/html/1225_ECCV_2022_paper.php


Reproducing full resolution Swin-T baseline from FastVQA paper #42

Open sh-r opened 7 months ago

sh-r commented 7 months ago

Hello. Thank you for your great work! I have a question about the full-resolution Swin-T baseline given in the FastVQA paper. The paper mentions that fixed recognition features were regressed to get the baseline. Does this mean that all frames of the video were used (no temporal sampling), with no fragmentation or resizing? Or was a temporally sampled video the input to the Swin-T model for generating the fixed features?
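To make the two readings concrete, here is a minimal sketch of what I mean. It only shows which frames would be fed to the backbone in each case. The helper names, `clip_len`, and `stride` are hypothetical values I picked for illustration, not settings from the paper or this repo, and the spatial size is left untouched in both cases (no fragments, no resizing).

```python
import numpy as np

def full_resolution_indices(num_frames):
    """Reading 1: every frame is used, no temporal sampling."""
    return np.arange(num_frames)

def sampled_indices(num_frames, clip_len=32, stride=2):
    """Reading 2: a fixed-length clip sampled at a regular stride.

    clip_len and stride are illustrative assumptions, not the paper's settings.
    """
    span = clip_len * stride
    start = max((num_frames - span) // 2, 0)  # center the clip in the video
    idx = start + np.arange(clip_len) * stride
    return np.clip(idx, 0, num_frames - 1)  # clamp indices for short videos

# Dummy (T, H, W, C) video; the spatial size is kept tiny only to save memory.
video = np.zeros((240, 8, 8, 3), dtype=np.uint8)
all_frames = video[full_resolution_indices(len(video))]
clip = video[sampled_indices(len(video))]
print(all_frames.shape, clip.shape)
```

In the first case the backbone would see all 240 frames, in the second only 32 of them, so the extracted fixed features (and probably the regressed scores) would differ.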