Open luyvlei opened 4 days ago
This looks like more than a decline in quality: the generated results are outright broken. I haven't encountered a case where the generated results collapse like this.
Can I see your training parameters?
Thank you for your response. I conducted the training with a batch size of 8 and a learning rate of 1e-5 for 12,000 steps. Additionally, I made modifications to the code and fine-tuned the model based on Cogvideo5B-I2V.
You'd better truncate the "conv in" module of the I2V model to its first 16 channels, because the subsequent channels contain I2V-specific information.
The first 16 channels are the ones used for T2V.
Excuse me, does "truncate" here mean truncating the gradients? The default channel count for I2V is 32, which I expanded to 48 to add control latents, following your code.
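For readers following along, here is a minimal sketch of one way to read the suggestion above: treat "truncate" as a weight-initialization step rather than a gradient operation. It builds a 48-channel input conv, copies only the first 16 pretrained input channels, and zero-initializes the rest. It assumes the "conv in" module is a plain `nn.Conv2d`; the helper name `expand_conv_in`, the 64 output channels, and the kernel size are hypothetical stand-ins, not taken from the actual repository.

```python
import torch
import torch.nn as nn


def expand_conv_in(old: nn.Conv2d, new_in: int, keep: int) -> nn.Conv2d:
    """Build a conv with `new_in` input channels that reuses the first `keep`
    pretrained input channels of `old` and zero-initializes all the others."""
    assert keep <= old.in_channels and keep <= new_in
    new = nn.Conv2d(
        new_in,
        old.out_channels,
        kernel_size=old.kernel_size,
        stride=old.stride,
        padding=old.padding,
        bias=old.bias is not None,
    )
    with torch.no_grad():
        new.weight.zero_()                            # new channels start at zero
        new.weight[:, :keep] = old.weight[:, :keep]   # keep the T2V channels
        if old.bias is not None:
            new.bias.copy_(old.bias)
    return new


# Hypothetical stand-in for the pretrained I2V input layer (32 input channels).
old = nn.Conv2d(32, 64, kernel_size=2, stride=2)
# 48 = 32 (I2V) + 16 (control latents); only the first 16 weights are reused.
new = expand_conv_in(old, new_in=48, keep=16)
```

With `keep=32` the same helper would instead preserve all pretrained I2V channels and zero-initialize only the 16 control channels, which corresponds to the 32-to-48 expansion described in the question. Which of the two behaves better in training is not settled by this thread.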
Hello, I have also been running pose control experiments based on CogVideoX recently. My approach is similar to yours: I add extra input channels and concatenate the VAE-compressed pose image along the channel dimension. However, in my experiments, I've found that after training for 8000 steps (with a batch size of 8), the image quality deteriorates significantly. Have you encountered similar issues during training? Could this be due to insufficient training? @bubbliiiing
https://github.com/user-attachments/assets/f5cf47c3-83a9-4049-94e3-6ebebf6012c1