Open yjhong89 opened 2 weeks ago
Noised image concat: Performance is similar whichever way the images are concatenated.
Learnable positional embedding: We observed that removing the learnable positional embedding has no significant impact on the results, so you can remove it directly and initialize a new one.
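One way to "remove it directly" is to drop the positional-embedding entry from the checkpoint before loading, so that the model keeps its freshly initialized copy. The sketch below shows only the key filtering, using a plain dict in place of a real checkpoint. The key names and the helper `drop_matching_keys` are made up for illustration and are not taken from the sat code, so check the actual parameter names in your checkpoint first.

```python
def drop_matching_keys(state_dict, substrings):
    """Return (kept, dropped): a filtered copy of state_dict and the removed key names."""
    kept = {}
    dropped = []
    for name, value in state_dict.items():
        if any(s in name for s in substrings):
            dropped.append(name)  # this parameter will not be loaded
        else:
            kept[name] = value
    return kept, dropped


# Toy stand-in for a checkpoint; real values would be tensors.
# Both key names are hypothetical.
checkpoint = {
    "mixins.pos_embed.pos_embedding": "old learnable table",
    "transformer.layers.0.attention.dense.weight": "pretrained weight",
}

filtered, dropped = drop_matching_keys(checkpoint, ["pos_embedding"])
print("dropped:", dropped)
```

With a PyTorch model, passing the filtered dict to `model.load_state_dict(filtered, strict=False)` skips the missing key, so that parameter stays at whatever the module's constructor initialized it to.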
Hi! I am currently fine-tuning the released I2V model using the sat folder and would like some advice on the following points:
Noised image concat
Learnable positional embedding of RoPE
Learning rate and schedule
Thanks!