Closed bonlime closed 6 months ago

In the example inference config you use "linear" for inference, while SD1.5 uses "scaled_linear", and this seems to make the quality much worse. Is this an intentional choice or an error?
Intentional choice. The model is trained with a linear scheduler.
Here's an intuitive explanation: generally speaking, we need a noisier schedule for images at larger resolutions (see the paper). Videos can be viewed as temporal stacks of images, so their effective resolution is large. The linear schedule is noisier than scaled linear, which is why I trained the model with linear.
Hope this clarifies any confusion.
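To make the "noisier" claim concrete, here is a minimal NumPy sketch (an editorial illustration, assuming the standard SD1.5 beta range of 0.00085–0.012 over 1000 steps and the diffusers definitions of the two schedules):

```python
import numpy as np

T = 1000
beta_start, beta_end = 0.00085, 0.012  # standard SD1.5 scheduler config

# "linear": interpolate the betas directly between the endpoints
betas_linear = np.linspace(beta_start, beta_end, T)

# "scaled_linear": interpolate in sqrt space, then square (SD1.5's schedule)
betas_scaled = np.linspace(beta_start**0.5, beta_end**0.5, T) ** 2

def alpha_bar(betas):
    # cumulative product of (1 - beta_t): fraction of signal kept at step t
    return np.cumprod(1.0 - betas)

for t in (250, 500, 750):
    print(t, alpha_bar(betas_linear)[t], alpha_bar(betas_scaled)[t])

# Because x**2 is convex, the chord ("linear") lies above the curve
# ("scaled_linear") at every interior step, so the linear betas are larger,
# alpha_bar decays faster, and the forward process is noisier.
```

Running this shows alpha_bar under "linear" dropping well below "scaled_linear" at the same timestep, i.e. more noise is added at every step.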
The explanation makes sense, but from personal experience, using "linear" results in very over-saturated images, which is why you have the hack of applying the LoRA at 0.8 weight. If you instead switch to "scaled_linear", the images are less saturated but also tend to have more detail, and in combination with a higher CFG this produces much better videos overall.
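For reference, a hedged sketch of that switch in a diffusers-style pipeline ("path/to/checkpoint", the prompt, and the guidance value are placeholders, not values from this repo):

```python
import torch
from diffusers import DiffusionPipeline, DDIMScheduler

# "path/to/checkpoint" is a placeholder for whatever pipeline you are using.
pipe = DiffusionPipeline.from_pretrained(
    "path/to/checkpoint", torch_dtype=torch.float16
).to("cuda")

# Rebuild the scheduler from its own config, overriding only beta_schedule.
pipe.scheduler = DDIMScheduler.from_config(
    pipe.scheduler.config, beta_schedule="scaled_linear"
)

# Sample with a higher CFG, as described above.
frames = pipe("a corgi running on the beach", guidance_scale=9.0).frames
```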
Not really sure about that. Did you run any experiments on it? I think that after training, the model will adapt and perform well as long as the scheduler lies within a reasonable range.
If you use real images/videos as supervision, the model should be able to reflect the properties of your training data regardless of the scheduler.
That is different from simply changing "scaled_linear" to "linear" without any finetuning.
I think the reason it still works better with "scaled_linear" is that the base SD1.5 used that schedule, and your fine-tuning can't change the model enough to adapt it to the "linear" schedule. You can experiment with it a bit yourself; to me the difference is very clear (using SD1.5 + T2V).
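A minimal sketch of that side-by-side experiment (editorial; assumes `pipe` is the same diffusers-style pipeline as above and that its output exposes `.frames`):

```python
import torch
from diffusers import DDIMScheduler
from diffusers.utils import export_to_video

# Render the same prompt and seed under both schedules for a fair comparison.
for schedule in ("linear", "scaled_linear"):
    pipe.scheduler = DDIMScheduler.from_config(
        pipe.scheduler.config, beta_schedule=schedule
    )
    frames = pipe(
        "a corgi running on the beach",
        guidance_scale=9.0,
        generator=torch.Generator("cuda").manual_seed(0),  # fixed seed
    ).frames[0]
    export_to_video(frames, f"sample_{schedule}.mp4")
```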
Any examples?