RERV / VDT

[ICLR2024] The official implementation of paper "VDT: General-purpose Video Diffusion Transformers via Mask Modeling", by Haoyu Lu, Guoxing Yang, Nanyi Fei, Yuqi Huo, Zhiwu Lu, Ping Luo, Mingyu Ding.

The mask mechanism described in the paper does not match the code #10

Open Nutingnon opened 4 months ago

Nutingnon commented 4 months ago

Hello, I noticed that the masking approach described in the paper has no corresponding method in the code.

quantumiracle commented 3 months ago

Agreed, the code only contains first-frame conditioning via concatenation. Could the author provide the corresponding code for masking?

RERV commented 2 months ago

Hi everyone, my apologies for the late reply. I was quite busy earlier and couldn't get to it. I've now updated the repository with the mask modeling code, so you can find what you need there.