Vchitect / SEINE

[ICLR 2024] SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
https://vchitect.github.io/SEINE-project/
Apache License 2.0

Prediction mode missing? #4

Open quixot1c opened 11 months ago

quixot1c commented 11 months ago

Hi, thanks for your great work!

The GIF in the README shows an animation of Iron Man with the label "Prediction" under it. However, when looking through the code I cannot find a way to enable this prediction mode. Is this code not available, or am I missing something?

ExponentialML commented 11 months ago

This may be what you're looking for. Change `args.mask_type` at the line below to `'all'`. If you don't want to hardcode it, you can add a custom argument like `args.mask_before` and set it in your YAML config. The function shows the available choices.

https://github.com/Vchitect/SEINE/blob/3795b24729beaafa3a5fa98dfe7e72ef245f0892/sample_scripts/with_mask_sample.py#L228
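A minimal sketch of the second suggestion, i.e. exposing the mask type as an option instead of hardcoding it. The flag name `--mask-type` and the choice list are assumptions for illustration, not part of the released script:

```python
import argparse

# Hypothetical CLI flag; in SEINE this value would come from the YAML
# config rather than the command line.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--mask-type",
    default="first1",
    choices=["first1", "all"],  # choices assumed from the thread
    help="'all' masks every frame, which enables prediction-style sampling",
)

args = parser.parse_args(["--mask-type", "all"])  # simulate passing the flag
print(args.mask_type)
```

This keeps the sampling script untouched while letting each run pick its own masking behavior.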

quixot1c commented 11 months ago

@ExponentialML Thanks for your response. Using `'all'` made the video look only very vaguely like the input image, a bit like img2img at a high denoising strength. Is that to be expected?

ExponentialML commented 11 months ago

@quixot1c No problem. Yes, that's to be expected as it's essentially generating frames "on the go".

If you want to generate it in an autoregressive manner, that's not implemented script-wise yet. A makeshift, albeit slower, way would be to generate a video from one init frame, take the last frame of that generated video as the new first frame, rinse and repeat, then join all the videos together.
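The chaining loop described above can be sketched as follows. `generate_clip` is a hypothetical stand-in for SEINE's image-conditioned sampling (here stubbed with integer "frames" so the chaining logic itself is runnable); only the loop structure reflects the suggestion:

```python
def generate_clip(first_frame, num_frames=16):
    # Stub: the real code would run the diffusion sampler conditioned on
    # `first_frame`. Here each "frame" is just an int offset.
    return [first_frame + i for i in range(num_frames)]

def chain_clips(init_frame, num_clips=3, num_frames=16):
    """Autoregressively generate clips, seeding each with the last frame
    of the previous clip, then join them into one long video."""
    clips = []
    frame = init_frame
    for _ in range(num_clips):
        clip = generate_clip(frame, num_frames)
        # Drop the first frame of later clips so the shared boundary
        # frame isn't duplicated when joining.
        clips.append(clip if not clips else clip[1:])
        frame = clip[-1]  # last frame seeds the next clip
    return [f for clip in clips for f in clip]

video = chain_clips(0, num_clips=3, num_frames=4)
```

With real frames you would swap the stub for an actual call into `with_mask_sample.py`'s pipeline; since each clip only sees one conditioning frame, expect some drift to accumulate across clips.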

I have a custom script that works like this locally, but it's not quite ready to release at the moment as it's still in testing.