G-U-N / AnimateLCM

[SIGGRAPH ASIA 2024 TCS] AnimateLCM: Computation-Efficient Personalized Style Video Generation without Personalized Video Data
https://animatelcm.github.io
MIT License

UCF101 evaluation details #15

Closed yhZhai closed 8 months ago

yhZhai commented 8 months ago

Dear authors,

Thank you for the great work!

I would like to seek some clarification on the evaluation details described in Section 5.1 of your paper, particularly the resolution of the snippets generated for the UCF101 analysis. The section states that the snippets are generated at a resolution of 512x512; however, the original UCF101 videos are 320x240, and the I3D classifier is trained on 224x224 inputs.

Could you kindly provide further insight into the rationale behind selecting a 512x512 resolution for the snippets in this context?

Thank you in advance!

Regards, Yuanhao

G-U-N commented 8 months ago

Hi @yhZhai,

Thank you for your interest. We generate the animations at 512x512 because that is the resolution Stable Diffusion v1-5 was trained at. We then downscale the 512x512 animations to 224x224.

For the UCF101 videos, we first scale the short side to 224 and then apply a center crop to obtain 224x224 clips.
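For concreteness, here is a minimal sketch of the two preprocessing paths, written with torchvision transforms. It is not the exact evaluation script, and the variable names are illustrative only:

```python
from PIL import Image
import torchvision.transforms as T

# Generated animations: 512x512 frames (SD v1-5 training resolution)
# are simply downscaled to 224x224 for the I3D classifier.
generated_transform = T.Resize((224, 224))

# UCF101 reference videos (320x240): resize the short side to 224,
# then center-crop to a 224x224 square.
ucf101_transform = T.Compose([
    T.Resize(224),        # short side -> 224 (320x240 becomes roughly 299x224)
    T.CenterCrop(224),    # keep the central 224x224 region
])

# Example usage on single frames (stand-in images, not real data):
gen_frame = Image.new("RGB", (512, 512))
ucf_frame = Image.new("RGB", (320, 240))
print(generated_transform(gen_frame).size)  # (224, 224)
print(ucf101_transform(ucf_frame).size)     # (224, 224)
```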

Hope this clarifies any confusion.