fvd performance on MSR-VTT

ali-vilab / videocomposer

Official repo for VideoComposer: Compositional Video Synthesis with Motion Controllability

https://videocomposer.github.io

MIT License


fvd performance on MSR-VTT #16

Open jinxixiang opened 1 year ago

jinxixiang commented 1 year ago

Thank you for presenting such an exciting work. Congratulations!

I have a question regarding Table A3. Could you please provide more details on how the FVD is calculated? Because this metric can be very sensitive to evaluation settings, I would like to know the resolution (256?), the number of frames, and how you processed the captions. I also noticed that multiple captions correspond to the same video. Could you please explain how you handled this?

Thank you!

jinxixiang commented 1 year ago

I have also evaluated depth control against two comparable video generation methods. VideoComposer performs quite well.

Resolution: 256×256

videocomposer: {"fid50k_full": 32.85220698351932}, {"fvd2048_16f": 292.60702360087384}

controlvideo (https://github.com/YBYBZhang/ControlVideo): {"fid50k_full": 53.00012164453821}, {"fvd2048_16f": 716.0677660664312}

control-a-video (https://github.com/Weifeng-Chen/control-a-video): {"fid50k_full": 43.23543254253663}, {"fvd2048_16f": 339.1774903700448}
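For context on what these numbers measure: FID and FVD are both Fréchet distances between Gaussians fitted to feature sets from real and generated samples, and lower is better. The sketch below shows only that final distance step. It assumes the features have already been extracted (typically Inception features for FID and I3D features for FVD); the function name `frechet_distance` is hypothetical, and this is not the evaluation code that produced the numbers above. The key names suggest 2048 videos of 16 frames each for FVD, but the thread does not confirm the exact protocol.

```python
import numpy as np


def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each input has shape (num_samples, feature_dim). Feature extraction
    is assumed to have happened elsewhere and is not shown here.
    """
    feats_real = np.asarray(feats_real, dtype=np.float64)
    feats_gen = np.asarray(feats_gen, dtype=np.float64)

    # Fit a Gaussian (mean and covariance) to each feature set.
    mu_r = feats_real.mean(axis=0)
    mu_g = feats_gen.mean(axis=0)
    sigma_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    sigma_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))

    # tr(sqrt(sigma_r @ sigma_g)) is the sum of the square roots of the
    # eigenvalues of the product. For covariance matrices these are real
    # and non-negative in exact arithmetic, so small negative values and
    # imaginary parts from round-off are discarded.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(
        diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_sqrt
    )
```

Because the distance depends on the fitted means and covariances, changes in resolution, frame count, or the number of samples shift the features and therefore the score, which is why results are only comparable under identical settings.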

Steven-SWZhang commented 1 year ago

Thank you for your interest in our method. We are currently optimizing the V2 version, and the watermark-free version of V1 is already available (https://www.modelscope.cn/models/damo/VideoComposer/files). An interactive UI demo has also been developed, and we welcome you to try it out: https://www.modelscope.cn/studios/damo/VideoComposer-Demo/summary