RockeyCoss / SPO

Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step
https://arxiv.org/abs/2406.04314

LoRA or full parameter #7

Closed jiashenggu closed 2 months ago

jiashenggu commented 2 months ago

Hi, great work!

I'm curious: are you using only LoRA for training?

Is SPO-SDXL_4k-prompts_10-epochs a merge of SDXL-base and SPO-SDXL_4k-prompts_10-epochs_LoRA, rather than a model trained with full-parameter fine-tuning?

Have you tried full-parameter training? Does it introduce any problems, such as training instability?

Thank you very much

RockeyCoss commented 2 months ago

Yes, SPO-SDXL_4k-prompts_10-epochs is SDXL-base with SPO-SDXL_4k-prompts_10-epochs_LoRA merged in. I merged the LoRA weights into the base model to make it easier to use.
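
In case it helps, here is a minimal sketch of how such a merge can be done with the diffusers LoRA utilities. The LoRA checkpoint path below is a placeholder, not the exact repo identifier.

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load the SDXL base pipeline (fp16 to keep memory reasonable).
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)

# Load the SPO LoRA weights (path/repo id is a placeholder here).
pipe.load_lora_weights("path/to/SPO-SDXL_4k-prompts_10-epochs_LoRA")

# Fold the LoRA deltas into the base weights, then drop the separate
# adapter so only the merged weights remain.
pipe.fuse_lora()
pipe.unload_lora_weights()

# Save the merged pipeline as a standalone checkpoint.
pipe.save_pretrained("SPO-SDXL_4k-prompts_10-epochs-merged")
```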

We have only fine-tuned LoRA weights on SDXL, as fine-tuning the full weights of SDXL demands significant computational resources. We believe LoRA fine-tuning offers a good balance between performance and resource requirements. If you want to try full-parameter fine-tuning, you will likely need to re-tune hyperparameters such as the learning rate.
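
For reference, a rough sketch of what a LoRA setup on the SDXL UNet can look like with the peft integration in diffusers. The rank, alpha, and target modules below are illustrative assumptions, not the exact values from our training config.

```python
from diffusers import UNet2DConditionModel
from peft import LoraConfig

# Load only the UNet of SDXL-base; the rest of the pipeline stays frozen.
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet"
)

# Illustrative LoRA config (rank/alpha/target modules are assumptions,
# not the values used in the SPO training runs).
lora_config = LoraConfig(
    r=64,
    lora_alpha=64,
    init_lora_weights="gaussian",
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)

# Attach the adapters: only the LoRA parameters become trainable,
# which is why this is far cheaper than full-parameter fine-tuning.
unet.add_adapter(lora_config)

trainable = [p for p in unet.parameters() if p.requires_grad]
print(f"trainable params: {sum(p.numel() for p in trainable):,}")
```

Full-parameter fine-tuning would instead update all UNet weights directly, which is where a smaller learning rate and other hyperparameter adjustments typically become necessary.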