Closed fradino closed 1 year ago
When decoding with the VAE, or when run_isolated is set to true, S = M.
When using ddim_inversion_long or gen_long, S = stride and M = clip_length.
I also just found a typo there: the total number of frames should be S*(N-1)+M. Thanks for raising this issue; I will correct it when I upload a newer version.
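For anyone checking the corrected formula, here is a minimal sketch of the arithmetic. It assumes N clips of M frames each, with consecutive clips starting S frames apart; the function names are hypothetical and are not taken from the repository.

```python
def clip_starts(num_clips: int, stride: int) -> list[int]:
    """Start frame of each clip when consecutive clips begin `stride` frames apart."""
    return [i * stride for i in range(num_clips)]


def total_frames(num_clips: int, stride: int, clip_length: int) -> int:
    """Frames spanned by N clips of length M placed S apart: S*(N-1)+M."""
    if num_clips < 1:
        raise ValueError("need at least one clip")
    # The last clip starts at S*(N-1) and covers M frames from there.
    return clip_starts(num_clips, stride)[-1] + clip_length


# Overlapping clips: N=4, S=4, M=8 gives 4*3 + 8 frames.
overlapping = total_frames(num_clips=4, stride=4, clip_length=8)

# Non-overlapping clips (S == M): N=3, S=8, M=8 gives 8*2 + 8 frames.
back_to_back = total_frames(num_clips=3, stride=8, clip_length=8)
```

With S == M the result reduces to N*M, which is the non-overlapping case described above.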
Hello, I have a question about the pipeline of the One-shot tuning Text-to-Video algorithm. I am confused about how the algorithm is reflected in the One-shot tuning code. The paper says 'The total number of frames of the video is S ∗ N + M', but in the code the for loop is 'for i in range(0,video_length-clip_length+1,clip_length):'. Does this mean S == M in this code? Thank you very much!
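The loop quoted in the question can be checked on its own. The sketch below reuses the same range expression with example values chosen purely for illustration (video_length=24, clip_length=8); the helper name is hypothetical. Because the step argument of the range is clip_length, each clip starts exactly one clip length after the previous one, so the stride equals the clip length in this loop.

```python
def loop_starts(video_length: int, clip_length: int) -> list[int]:
    """Start indices produced by the for loop quoted in the question."""
    return list(range(0, video_length - clip_length + 1, clip_length))


# Illustrative values only, not taken from the repository's configs.
video_length, clip_length = 24, 8
starts = loop_starts(video_length, clip_length)

# Distance between consecutive clip starts, i.e. the effective stride S.
strides = {b - a for a, b in zip(starts, starts[1:])}

num_clips = len(starts)
# With S == M, the frame count S*(N-1)+M equals video_length when it divides evenly.
covered = clip_length * (num_clips - 1) + clip_length
```

Here starts is [0, 8, 16], the only stride observed is 8, and the covered frame count is 24.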