Hello @sxpro,
Thank you for bringing these points to our attention. I will try to answer each in turn:
The final submitted model will be evaluated on a desktop computer configured with an Intel® Xeon 8280 CPU @ 2.70GHz × 56, 128GB RAM, and an NVIDIA RTX 6000 Ada graphics card with 48GB of VRAM.
We use 2K-resolution frames as input when computing inference time. The reported time is averaged over the 2K-resolution videos in the test set (around 10 videos).
We have included torch.cuda.synchronize() in our timing code to ensure accurate measurements. Thanks for mentioning this.
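For reference, here is a minimal timing sketch along the lines of what we mean. It is illustrative only: `model`, `videos`, and the per-frame tensor shape are placeholders, not the actual evaluation script, and the exact 2K dimensions should be taken from the challenge data.

```python
import time
import torch

@torch.no_grad()
def time_inference(model, videos, device="cuda"):
    """Average per-frame inference time over a list of videos.

    `videos` is assumed to be an iterable of tensors shaped (N, 3, H, W),
    one tensor per 2K-resolution video.
    """
    model.eval().to(device)
    per_video_times = []
    for frames in videos:
        frames = frames.to(device)
        torch.cuda.synchronize()               # ensure prior GPU work has finished
        start = time.time()
        for frame in frames:
            model(frame.unsqueeze(0))          # one forward pass per frame
        torch.cuda.synchronize()               # wait for all kernels before stopping the clock
        per_video_times.append((time.time() - start) / len(frames))
    return sum(per_video_times) / len(per_video_times)
```

The two synchronize calls are the important part: without them, `time.time()` can return before the queued CUDA kernels have actually completed, which underestimates the runtime.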