Some questions about your work

Hi @Tobi-r9, thanks a lot for your interesting work! I have a few questions about it.

For the pre-trained models you released, what is the default output image size? In your readme, the image size is set to 64 for all models during training. Can your pre-trained models be used to generate videos at a resolution of 256×256?
Does your model allow class-conditioned generation? Your code seems to accept extra class labels as input. Have you tried video generation conditioned on both the given images and class labels?
Training/testing split: could you share the training/testing split you used for each dataset?
The implementation of resampling may be incorrect. As described in your paper and in RePaint, a resampling step adds one step of noise and then denoises again. Your function forward_diffusion is designed to add Gaussian noise of timestep i to x_start. In your resampling implementation, however, you use this forward function to add Gaussian noise of timestep i to img, i.e., $x_{t-1}$. If I read it correctly, this treats $x_{t-1}$ as if it were a clean sample and applies the cumulative noise of all i steps rather than a single step, which may produce strange results. Could you double-check whether my understanding is correct?
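To make the last point concrete, here is a minimal NumPy sketch of the difference I mean. It is not taken from your repository: the function names, the linear beta schedule, and the number of steps are all assumptions chosen for illustration. `q_sample_from_x0` stands in for a forward function that noises a clean sample directly to timestep t, and `one_step_renoise` is the single-step transition from $x_{t-1}$ to $x_t$ that I would expect in a resampling step.

```python
import numpy as np


def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumption; the real schedule may differ)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas_cumprod = np.cumprod(1.0 - betas)
    return betas, alphas_cumprod


def q_sample_from_x0(x_start, t, alphas_cumprod, noise):
    """Noise a CLEAN sample x_0 directly to timestep t (cumulative noise)."""
    return (np.sqrt(alphas_cumprod[t]) * x_start
            + np.sqrt(1.0 - alphas_cumprod[t]) * noise)


def one_step_renoise(x_prev, t, betas, noise):
    """Single forward step x_{t-1} -> x_t, as used for resampling."""
    return np.sqrt(1.0 - betas[t]) * x_prev + np.sqrt(betas[t]) * noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    betas, alphas_cumprod = make_schedule()
    t = 500
    x0 = np.ones((4, 4))

    # A sample at timestep t-1, obtained from the clean sample.
    x_prev = q_sample_from_x0(x0, t - 1, alphas_cumprod,
                              rng.standard_normal(x0.shape))

    # Expected resampling step: add only the noise of step t.
    x_t_one_step = one_step_renoise(x_prev, t, betas,
                                    rng.standard_normal(x0.shape))

    # What I suspect happens: x_{t-1} is passed where x_0 is expected,
    # so the signal is scaled down by sqrt(alphas_cumprod[t]) a second time.
    x_t_cumulative = q_sample_from_x0(x_prev, t, alphas_cumprod,
                                      rng.standard_normal(x0.shape))

    print("signal scale, one-step  :", np.sqrt(1.0 - betas[t]))
    print("signal scale, cumulative:", np.sqrt(alphas_cumprod[t]))
```

With zero noise, the one-step version applied to a sample at t-1 reproduces the signal scale $\sqrt{\bar{\alpha}_t}$ of the true marginal at timestep t, while the cumulative version shrinks the signal much further. That is why I suspect the resampled frames end up noisier than the model expects at that timestep.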

Tobi-r9 / RaMViD