CarperAI / DRLX

Diffusion Reinforcement Learning Library

Add support for BitFit #31

Open bghira opened 7 months ago

bghira commented 7 months ago

Paper: https://aclanthology.org/2022.acl-short.1/

Summary (my words):

As a model trainer, it would be nice if we could use this policy optimization trainer to train only the bias parameters of the U-Net, keeping the weights frozen, as in the sketch below.
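For reference, a minimal sketch of what bias-only (BitFit) parameter selection could look like on a diffusers U-Net. This is not DRLX's actual API; the model class, checkpoint, and optimizer choice here are just illustrative assumptions:

```python
import torch
from diffusers import UNet2DConditionModel

# Load the U-Net; SD 2.1 is used purely as an example checkpoint.
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-2-1", subfolder="unet"
)

# BitFit: freeze every parameter, then mark only the bias terms trainable.
for name, param in unet.named_parameters():
    param.requires_grad = name.endswith("bias")

# Hand only the (tiny) set of trainable bias parameters to the optimizer.
trainable_params = [p for p in unet.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable_params, lr=1e-5)
```

Since the bias terms are a small fraction of total parameters, this also keeps optimizer state (and therefore VRAM) low compared to full fine-tuning.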

Initial testing shows that this approach lets us carefully steer the model toward better detail and aesthetics while preserving most of its core structure.

Where full weight-and-bias tuning almost completely destroys SD 2.1-v when fine-tuning on just 8 images, bias-only tuning allows pushing past 400 epochs on the same dataset.

Example:

[image: the starting point]

[image: after just 810 steps]

This is without any DPO; it is plain fine-tuning with MSE loss on the velocity objective.
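For clarity, this is roughly what that plain MSE-on-velocity training step looks like with a diffusers scheduler, reusing the `unet` from the sketch above. The tensor shapes are placeholder stand-ins for VAE latents and text embeddings:

```python
import torch
import torch.nn.functional as F
from diffusers import DDPMScheduler

scheduler = DDPMScheduler.from_pretrained(
    "stabilityai/stable-diffusion-2-1", subfolder="scheduler"
)

# Dummy stand-ins for VAE latents and text-encoder embeddings.
latents = torch.randn(1, 4, 64, 64)
encoder_hidden_states = torch.randn(1, 77, 1024)

noise = torch.randn_like(latents)
timesteps = torch.randint(
    0, scheduler.config.num_train_timesteps, (latents.shape[0],)
)
noisy_latents = scheduler.add_noise(latents, noise, timesteps)

# v-prediction target: v = alpha_t * noise - sigma_t * x_0
target = scheduler.get_velocity(latents, noise, timesteps)
pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
loss = F.mse_loss(pred.float(), target.float())
```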

For comparison, the mode collapse of SD 2.1-v when tuning both weights and biases, which occurs in fewer steps:

[image: mode collapse of SD 2.1-v]

This uses the same hyperparameters, e.g. learning rate, scheduler, dataset, and seeds.