abcdvzz opened 2 weeks ago
In train.py, the code doesn't use sample['ref_pixel_values'] or sample['clip_pixel_values']. Does this mean that image-to-video training uses a random frame instead of the first frame?
I don't quite follow the question. Image-to-video generation is implemented through inpainting: the frames that need to be reconstructed are masked, while the reference frame is kept as conditioning.
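For illustration only, here is a minimal sketch of what inpaint-style image-to-video conditioning typically looks like. The function and variable names (`build_inpaint_inputs`, `pixel_values`, `masked_pixel_values`) are assumptions for this example and are not the actual identifiers used in train.py.

```python
import torch

def build_inpaint_inputs(pixel_values: torch.Tensor):
    """pixel_values: (B, F, C, H, W) video clip sampled for training.

    Hypothetical sketch, not the repository's actual code.
    """
    b, f, c, h, w = pixel_values.shape

    # Mask is 1 for frames the model must reconstruct, 0 for frames kept as
    # conditioning. Keeping only the first frame visible corresponds to
    # classic image-to-video training.
    mask = torch.ones(b, f, 1, h, w, dtype=pixel_values.dtype)
    mask[:, 0] = 0.0

    # Masked video: the conditioning frame stays visible, the rest is zeroed.
    masked_pixel_values = pixel_values * (1.0 - mask)

    # The denoising network would then receive the noisy latents together
    # with the (encoded) masked video and the mask, and learn to fill in the
    # masked frames, i.e. to generate the video from the reference image.
    return masked_pixel_values, mask
```

Under this setup, a separate reference/CLIP image input is not strictly required, since the conditioning frame is already carried inside the masked video itself.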