Closed yingchengyang closed 2 months ago
Hi! Thanks for your interest. The results are calculated from the generated videos and their ground truths. In our case, we give one video, i.e., the ground truth, as a condition to the model, which then generates one corresponding video. For the metric calculation, please kindly refer to this repo for more details :)
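Not from the authors, but as a rough illustration of the idea (compare each generated video against its ground truth and average a metric over the set), here is a minimal sketch using per-frame PSNR. The function name `video_psnr` and the `(T, H, W, C)` layout are assumptions for illustration; the actual metrics used in the paper (e.g., FVD) are computed by the linked repo, not by this snippet.

```python
import numpy as np

def video_psnr(generated, ground_truth, max_val=255.0):
    """Average per-frame PSNR between two videos of shape (T, H, W, C).

    NOTE: illustrative only -- the paper's metrics come from the linked repo.
    """
    gen = generated.astype(np.float64)
    gt = ground_truth.astype(np.float64)
    # One MSE value per frame, averaged over pixels and channels.
    mse = np.mean((gen - gt) ** 2, axis=(1, 2, 3))
    mse = np.maximum(mse, 1e-10)  # avoid log(0) for identical frames
    psnr_per_frame = 20 * np.log10(max_val) - 10 * np.log10(mse)
    return float(np.mean(psnr_per_frame))

# Toy example: a lightly perturbed copy scores lower than an exact copy.
rng = np.random.default_rng(0)
gt = rng.integers(0, 256, size=(16, 64, 64, 3)).astype(np.uint8)
noise = rng.integers(-5, 6, size=gt.shape)
noisy = np.clip(gt.astype(np.int16) + noise, 0, 255).astype(np.uint8)
print(video_psnr(gt, gt))     # very high (clamped by the epsilon)
print(video_psnr(noisy, gt))  # lower PSNR for the perturbed video
```

In practice you would loop this (or an FVD-style distributional metric) over every (generated, ground-truth) pair in the evaluation set and report the mean.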
Thanks a lot for such wonderful work! I'm curious about the quantitative comparison of different video models (like VideoGPT and VQ-Diffusion) in Table 1 of the paper. How can I evaluate the performance of the video models? Thanks a lot!