In the current implementation, FAST-VQA does not perform temporal GMS sampling, so at inference it uses 4 clips (32 frames each) to cover the whole video (usually 5-10 seconds). FasterVQA, by contrast, uses only 1 clip, formed by concatenating 4 temporal segments of 8 frames each (32 frames in total), so it covers the whole video without needing four clips to get accurate results.
The main efficiency gain is therefore at inference time, where FasterVQA processes 32 frames instead of 4 × 32 = 128.
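To make the frame counts concrete, here is a rough sketch of the two sampling schemes described above. It is not the repository's actual sampler: the function names, the evenly spaced clip starts, the centered segment windows, and the 240-frame example video are all assumptions for illustration, and frame strides are ignored.

```python
def multi_clip_indices(num_frames, num_clips=4, clip_len=32):
    """Hypothetical FAST-VQA-style sampling: several contiguous clips
    whose start points are spread evenly over the video."""
    if num_frames < clip_len:
        raise ValueError("video is shorter than one clip")
    clips = []
    for c in range(num_clips):
        # Evenly spaced start offsets from the first to the last valid start.
        start = (num_frames - clip_len) * c // max(num_clips - 1, 1)
        clips.append(list(range(start, start + clip_len)))
    return clips


def segmented_clip_indices(num_frames, num_segments=4, frames_per_segment=8):
    """Hypothetical FasterVQA-style sampling: split the video into temporal
    segments, take a short run of frames from each, and concatenate them
    into a single clip."""
    seg_len = num_frames // num_segments
    if seg_len < frames_per_segment:
        raise ValueError("segments are shorter than the frames requested")
    indices = []
    for s in range(num_segments):
        # Assumption: take the run from the middle of each segment.
        start = s * seg_len + (seg_len - frames_per_segment) // 2
        indices.extend(range(start, start + frames_per_segment))
    return indices


# Example: an 8-second video at 30 fps (an assumed length).
num_frames = 240
multi = multi_clip_indices(num_frames)
single = segmented_clip_indices(num_frames)

frames_multi = sum(len(clip) for clip in multi)   # 4 clips x 32 frames
frames_single = len(single)                       # 4 segments x 8 frames
print(frames_multi, frames_single)
```

Both schemes span the whole video, but the single segmented clip passes a quarter as many frames through the network, which is where the inference-time saving comes from.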
Hope this can help.
It seems that the difference between FasterVQA and FAST-VQA is the application of St-GMS. However, as I understand it, St-GMS samples more regions in the temporal domain, which would not improve efficiency. On the contrary, the implementation might become less efficient because the loop over t becomes larger. Is that right? I'm not sure where my understanding goes wrong.
Looking forward to your reply.