Open jinxixiang opened 1 year ago
I also evaluated depth-controlled generation against two comparable video generation methods. VideoComposer scores best on both FID and FVD (lower is better).
Resolution: 256×256
VideoComposer: {"fid50k_full": 32.85220698351932} {"fvd2048_16f": 292.60702360087384}
ControlVideo (https://github.com/YBYBZhang/ControlVideo): {"fid50k_full": 53.00012164453821} {"fvd2048_16f": 716.0677660664312}
Control-A-Video (https://github.com/Weifeng-Chen/control-a-video): {"fid50k_full": 43.23543254253663} {"fvd2048_16f": 339.1774903700448}
Thank you for your interest in our method. We are currently working on V2, and a watermark-free version of V1 is already available (https://www.modelscope.cn/models/damo/VideoComposer/files). An interactive demo UI is also available, and we welcome you to try it out: https://www.modelscope.cn/studios/damo/VideoComposer-Demo/summary
Thank you for presenting such an exciting work. Congratulations!
I have a question regarding Table A3. Could you please provide more details on how the FVD is calculated? Since this metric can be very sensitive to the evaluation settings, I would like to know the resolution (256?), the number of frames, and how you processed the captions. I also noticed that multiple captions correspond to the same video. Could you please explain how you handled this?
Thank you!
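For context on why these settings matter: FID and FVD are both Fréchet distances between Gaussians fitted to feature embeddings of real and generated samples (Inception features of images for FID, typically I3D features of video clips for FVD). Resolution, clip length, and the number of samples all change those features, and therefore the score. Below is a minimal sketch of the distance itself, not the evaluation code used in the paper or in the numbers above. The function name `frechet_distance` and the random arrays standing in for extracted features are assumptions for illustration only.

```python
import numpy as np


def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_real, feats_gen: arrays of shape (num_samples, feature_dim).
    Feature extraction (e.g. with I3D for FVD) is assumed to happen elsewhere.
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_g = np.cov(feats_gen, rowvar=False)

    # Tr(sqrt(sigma_r @ sigma_g)) equals the sum of the square roots of the
    # eigenvalues of the product. They are real and non-negative in exact
    # arithmetic, so drop imaginary parts and clip tiny negative noise.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_sqrt)


# Hypothetical stand-in features: 256 samples with 8 dimensions each.
rng = np.random.default_rng(0)
real = rng.normal(size=(256, 8))
same = frechet_distance(real, real)           # identical sets: close to 0
shifted = frechet_distance(real, real + 1.0)  # mean shift of 1 in each of 8 dims
```

Because the distance depends only on the means and covariances of the features, two evaluations are comparable only when the feature extractor, resolution, frame count, and sample count match.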