Dear authors, thanks for your awesome work. Could you please provide inference code for the video tokenizer reconstruction? I wrote my own, but the reconstruction quality is poor after the first 4 frames. What could be causing this? Below are the results for the 1st, 5th, and 9th frames.