Abhinay1997 / FIFO-CogVideoX

FIFO applied to CogVideoX models

Pipeline produces noise in output #1

Open Abhinay1997 opened 2 months ago

Abhinay1997 commented 2 months ago

The output from the model is just plain noise. Pretty close to: image_10

Currently debugging why that's the case. I need to check whether something went wrong when changing the logic from applying the same timestep to all frames to giving each frame its own timestep embedding in the tensor. One simple test is to reimplement the original pipeline call from diffusers using the modified transformer and scheduler.
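
A minimal sketch of the kind of regression check this implies, assuming a generic sinusoidal timestep embedding (the helper `sinusoidal_embedding` and all shapes here are hypothetical and are not the CogVideoX or diffusers implementation). If every frame is given the same timestep, the per-frame embedding should reduce to the shared one; if it does not, the bug is in the per-frame path rather than in the FIFO logic.

```python
import numpy as np

def sinusoidal_embedding(timesteps, dim):
    """Generic sinusoidal embedding (hypothetical helper, not the diffusers one).

    Accepts timesteps of any shape and appends a trailing axis of size `dim`.
    """
    timesteps = np.asarray(timesteps, dtype=np.float64)
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    args = timesteps[..., None] * freqs
    return np.concatenate([np.sin(args), np.cos(args)], axis=-1)

batch, frames, dim = 2, 4, 8

# Original behaviour: one timestep per sample -> shape (batch, dim).
t_shared = np.array([500, 500])
emb_shared = sinusoidal_embedding(t_shared, dim)

# Modified behaviour: one timestep per frame -> shape (batch, frames, dim).
t_per_frame = np.full((batch, frames), 500)
emb_per_frame = sinusoidal_embedding(t_per_frame, dim)

# With identical timesteps, every frame's embedding must match the shared one.
same = np.allclose(emb_per_frame, emb_shared[:, None, :])
print(emb_shared.shape, emb_per_frame.shape, same)
```

The same idea carries over to the full pipeline: feed identical per-frame timesteps through the modified transformer and scheduler and compare against the unmodified path.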

Abhinay1997 commented 2 months ago

There were a couple of issues:

  1. The alphas and betas broadcasting in the sampler was incorrect.
  2. The shift_latents implementation was incorrect.

Both stem from the latent shapes differing between Latte T2V and CogVideoX: dim 1 holds channels in Latte but frames in CogVideoX.
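
To illustrate the two fixes, here is a sketch in NumPy, assuming a channels-first layout `(B, C, F, H, W)` for Latte and a frames-first layout `(B, F, C, H, W)` for CogVideoX. The helpers `per_frame_view` and `shift_latents` are written for illustration only and are not the code in this repo; the shift simply drops the first frame and appends fresh noise at the end of the frame axis.

```python
import numpy as np

def per_frame_view(coeff, ndim, frame_dim):
    """Reshape a 1-D per-frame coefficient so it broadcasts along `frame_dim` only."""
    shape = [1] * ndim
    shape[frame_dim] = coeff.shape[0]
    return coeff.reshape(shape)

def shift_latents(latents, frame_dim, rng):
    """Move every frame one slot towards the front and refill the last slot with noise."""
    shifted = np.roll(latents, shift=-1, axis=frame_dim)  # returns a new array
    last = [slice(None)] * latents.ndim
    last[frame_dim] = -1
    last = tuple(last)
    shifted[last] = rng.standard_normal(shifted[last].shape)
    return shifted

rng = np.random.default_rng(0)
B, F, C, H, W = 1, 4, 3, 2, 2
alphas = np.array([0.9, 0.7, 0.5, 0.3])  # toy values, one per frame

cog = np.ones((B, F, C, H, W))    # frames on dim 1 (assumed CogVideoX layout)
latte = np.ones((B, C, F, H, W))  # frames on dim 2 (assumed Latte layout)

cog_scaled = cog * per_frame_view(alphas, cog.ndim, frame_dim=1)
latte_scaled = latte * per_frame_view(alphas, latte.ndim, frame_dim=2)

# Mark each frame with its own index so the shift is easy to inspect.
cog_frames = np.arange(F, dtype=np.float64).reshape(1, F, 1, 1, 1) * np.ones((B, F, C, H, W))
cog_shifted = shift_latents(cog_frames, frame_dim=1, rng=rng)
```

Passing the frame axis explicitly avoids hard-coding `[:, :, 1:]` style slices that are only valid for one of the two layouts. Note that when the channel and frame counts happen to be equal, broadcasting along the wrong axis raises no error and silently scales the wrong dimension.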

There's still a bug, but the images now look like this: image

Abhinay1997 commented 2 months ago

Things to try:

  1. Instead of broadcasting across all frames in the scheduler step, use a loop as in the original implementation and see if it makes a difference. Not really sure whether the scheduler auto-increments the timestep when called in a loop; this is just to remove that uncertainty.

  2. Run it with the simplest configuration, i.e. one partition and no lookahead denoising, and match every step against the algorithm in the paper.
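
For the first item, a rough sketch of how the loop and broadcast variants could be compared. It uses a toy deterministic DDIM update with epsilon prediction and a made-up `alphas_cumprod` schedule; the function `ddim_step`, the schedule, and the shapes are all assumptions for illustration, not the scheduler used in this repo. The toy step is stateless by construction, so it says nothing about whether a real scheduler keeps an internal step counter; it only shows what "loop and broadcast agree" should look like.

```python
import numpy as np

def ddim_step(x_t, eps, alpha_t, alpha_prev):
    """Toy deterministic DDIM update (eta = 0, epsilon prediction).

    alpha_t and alpha_prev are cumulative alpha products; scalars or broadcastable arrays.
    """
    x0 = (x_t - np.sqrt(1.0 - alpha_t) * eps) / np.sqrt(alpha_t)
    return np.sqrt(alpha_prev) * x0 + np.sqrt(1.0 - alpha_prev) * eps

rng = np.random.default_rng(0)
B, F, C, H, W = 1, 4, 3, 2, 2
latents = rng.standard_normal((B, F, C, H, W))  # frames on dim 1
eps = rng.standard_normal((B, F, C, H, W))      # stand-in for the model output

alphas_cumprod = np.linspace(0.999, 0.01, 10)   # toy schedule indexed by timestep
t = np.array([2, 4, 6, 8])                      # one timestep per frame
t_prev = t - 2

# Broadcast variant: reshape per-frame coefficients to (1, F, 1, 1, 1).
a_t = alphas_cumprod[t].reshape(1, F, 1, 1, 1)
a_prev = alphas_cumprod[t_prev].reshape(1, F, 1, 1, 1)
out_broadcast = ddim_step(latents, eps, a_t, a_prev)

# Loop variant: step each frame separately with its own scalar coefficients.
out_loop = np.empty_like(latents)
for f in range(F):
    out_loop[:, f] = ddim_step(
        latents[:, f], eps[:, f], alphas_cumprod[t[f]], alphas_cumprod[t_prev[f]]
    )

max_diff = np.abs(out_broadcast - out_loop).max()
print(max_diff)
```

If the same comparison against the real scheduler shows a large difference, either the broadcasting shape is wrong or the scheduler carries state between calls.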