Tobi-r9 / RaMViD

MIT License

Some questions about your work #11

Open nihaomiao opened 1 year ago

nihaomiao commented 1 year ago

Hi, @Tobi-r9, thanks a lot for your interesting work! I have a few questions about it.

  1. For the pre-trained models you released, what is the default output image size? In your readme, you set the image size to 64 for all the models during training. I am wondering whether your pre-trained models can be used to generate videos at a resolution of 256x256.
  2. Does your model allow class-conditioned generation? Your code seems to accept extra class labels as input. I am wondering whether you have tried video generation conditioned on both given images and class labels.
  3. The training/testing split. Could you share the training/testing split you used for each dataset?
  4. The implementation of resampling may be incorrect. As described in your paper and in RePaint, a resampling step adds one step of noise and then denoises. Your function forward_diffusion is designed to add Gaussian noise of timestep i to x_start, i.e., $x_0$. In your resampling implementation, however, you use forward_diffusion to add Gaussian noise of timestep i to img, i.e., $x_{t-1}$, which treats the partially denoised sample as if it were a clean input and may produce strange results. Could you double-check whether my understanding is correct?
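
To make the concern in point 4 concrete, here is a minimal sketch of the two noising kernels. It assumes a DDPM-style linear noise schedule and that forward_diffusion follows the usual closed form for $q(x_t \mid x_0)$; the schedule values and the helper names (`noise_from_start`, `noise_one_step`) are hypothetical and not taken from the repository.

```python
import math
import random


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Assumed DDPM-style linear schedule (an assumption, not read from the repo)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def noise_from_start(x, t, alpha_bars, rng):
    """Sample q(x_t | x_0). Only meaningful when `x` really is the clean x_0."""
    a = alpha_bars[t]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x]


def noise_one_step(x_prev, t, betas, rng):
    """Sample q(x_t | x_{t-1}): the single noising step a resampling loop needs."""
    b = betas[t]
    return [math.sqrt(1.0 - b) * v + math.sqrt(b) * rng.gauss(0.0, 1.0) for v in x_prev]


betas = linear_betas()
alpha_bars = cumulative_alphas(betas)
t = 500

# Fraction of the input signal each kernel keeps at timestep t.
signal_kept_one_step = math.sqrt(1.0 - betas[t])
signal_kept_from_start = math.sqrt(alpha_bars[t])
print("one step:", signal_kept_one_step)
print("from x_0 formula:", signal_kept_from_start)
```

Under this assumed schedule, the one-step kernel keeps almost all of the signal at t = 500, while the closed-form kernel keeps well under half of it. Applying the closed-form kernel to $x_{t-1}$ would therefore add far more noise than a single resampling step should.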

Tobi-r9 commented 1 year ago

Hi, thanks for your interest in our work.

  1. The model itself can only generate 64x64 frames. However, you could use a super-resolution model to increase the resolution frame by frame.
  2. The original code from OpenAI does allow class-conditional generation, but we have not experimented with it. Some minor fixes might be necessary to make it work properly, but I expect it would not be much work.
  3. The train and test splits are pre-defined for each dataset. See the official splits for Kinetics, BAIR, and UCF-101.
  4. Thank you for letting me know. I will check and get back to you.
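
As a rough illustration of the frame-by-frame idea in point 1, here is a minimal sketch that applies an upscaler independently to each 64x64 frame. The nearest-neighbor function is only a placeholder standing in for a real super-resolution model, and both function names are hypothetical.

```python
def nearest_neighbor_upscale(frame, factor):
    """Placeholder upscaler: repeat every pixel `factor` times along both axes.

    A real super-resolution model would replace this function.
    """
    out = []
    for row in frame:
        wide = [px for px in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def upscale_video(frames, upscale_frame):
    """Apply a per-frame upscaler independently to every frame of a clip."""
    return [upscale_frame(f) for f in frames]


# Dummy clip: 4 frames of 64x64 scalar "pixels".
clip = [[[0.0] * 64 for _ in range(64)] for _ in range(4)]
big = upscale_video(clip, lambda f: nearest_neighbor_upscale(f, 4))
print(len(big), len(big[0]), len(big[0][0]))  # 4 256 256
```

Since each frame is processed on its own, nothing here enforces temporal consistency, so some flicker between frames is possible with a learned upscaler.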

I hope this helps :)