Paper: https://aclanthology.org/2022.acl-short.1/
Summary (my words):
As a model trainer, it would be nice if this direct preference optimization (DPO) trainer could be used to train only the biases of the U-net while keeping the weights frozen.
Initial testing shows that this approach lets us carefully steer the model toward better details and aesthetics while preserving most of its core structure.
Where full weight-and-bias tuning almost completely destroys SD 2.1-v with just 8 finetuning images, bias-only tuning allows pushing past 400 epochs on the same dataset.
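A minimal sketch of the bias-only setup in PyTorch, in the spirit of BitFit from the linked paper. The toy module below is a hypothetical stand-in; a real run would load the SD 2.1-v U-net (e.g. via diffusers) and pass it through the same helper:

```python
import torch
from torch import nn

def freeze_all_but_bias(model: nn.Module) -> list:
    """Freeze every parameter except bias terms; return the trainable biases."""
    trainable = []
    for name, param in model.named_parameters():
        if name.endswith(".bias") or name == "bias":
            param.requires_grad = True
            trainable.append(param)
        else:
            param.requires_grad = False
    return trainable

# Toy stand-in for a U-net block (hypothetical; substitute the real
# SD 2.1-v UNet here for an actual finetune).
model = nn.Sequential(
    nn.Conv2d(4, 8, 3, padding=1),
    nn.GroupNorm(2, 8),
    nn.SiLU(),
    nn.Conv2d(8, 4, 3, padding=1),
)
biases = freeze_all_but_bias(model)

# Only the biases reach the optimizer, so weight tensors never move.
optimizer = torch.optim.AdamW(biases, lr=1e-4)
```

Passing only the bias parameters to the optimizer (rather than filtering inside the training loop) also keeps optimizer state tiny, since biases are a small fraction of the total parameter count.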
Example:
The starting point ^
After just 810 steps ^
This is without any DPO, just finetuning with an MSE loss on the velocity (v-prediction) objective.
For comparison, the mode collapse of SD 2.1-v when tuning both weights and biases, which occurs in fewer steps:
This uses the same hyperparameters, e.g. learning rate, scheduler, dataset, and seeds.
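For reference, the velocity objective mentioned above can be sketched as follows. This is the standard v-prediction formulation (v = √ᾱ_t · ε − √(1 − ᾱ_t) · x₀) that SD 2.1-v is trained with; the tensor shapes and schedule values below are illustrative, not the model's actual ones:

```python
import torch
import torch.nn.functional as F

def v_prediction_target(x0: torch.Tensor, noise: torch.Tensor,
                        alphas_cumprod: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Velocity target: v = sqrt(alpha_bar_t) * eps - sqrt(1 - alpha_bar_t) * x0."""
    a = alphas_cumprod[t].sqrt().view(-1, 1, 1, 1)
    s = (1.0 - alphas_cumprod[t]).sqrt().view(-1, 1, 1, 1)
    return a * noise - s * x0

# Illustrative latents, noise, and schedule (made-up values).
x0 = torch.randn(2, 4, 8, 8)                       # "clean" latents
noise = torch.randn_like(x0)                       # sampled epsilon
alphas_cumprod = torch.linspace(0.999, 0.001, 1000)
t = torch.randint(0, 1000, (2,))

v_target = v_prediction_target(x0, noise, alphas_cumprod, t)
v_pred = torch.zeros_like(v_target)                # the U-net's output would go here
loss = F.mse_loss(v_pred, v_target)                # the MSE loss used for finetuning
```

At ᾱ_t = 1 the target reduces to pure noise, and at ᾱ_t → 0 it approaches −x₀, which is what makes the objective well-behaved across the whole noise schedule.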