CompVis / stable-diffusion

A latent text-to-image diffusion model
https://ommer-lab.com/research/latent-diffusion-models/

Resume training on custom data #77

Open · affanmehmood opened this issue 2 years ago

affanmehmood commented 2 years ago

I want to further train stable-diffusion-v1-4 on my custom dataset. I couldn't find any training script in the repo. Can anyone tell me how this can be accomplished? Is there a training script available so I can resume training?

1blackbar commented 2 years ago

https://github.com/nicolai256/Stable-textual-inversion_win

janekm commented 2 years ago

https://github.com/nicolai256/Stable-textual-inversion_win

That's a different method of achieving a similar result… I believe the OP was talking about resuming training of the SD model itself. I am also very interested in this, especially in light of the Dreambooth paper: https://dreambooth.github.io/ (I think it would be very interesting to try this approach with SD). There is training code and settings for latent diffusion, but I'm not sure it would be fruitful to try them with Stable Diffusion, especially without knowing the training parameters that were used.

chavinlo commented 2 years ago

There's this, but it's ported and requires a beefy 48 GB GPU: https://github.com/Jack000/glid-3-xl-stable

janekm commented 2 years ago

There's this, but it's ported and requires a beefy 48 GB GPU: https://github.com/Jack000/glid-3-xl-stable

Oh, but that is exciting! I'll have to give it a try. (My naive theory is to try something similar to the Dreambooth paper: find a prompt word that is basically unknown to SD, and then use that as the training caption for some new images.)
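
A rough sketch of that idea, purely as an illustration: the placeholder token "sks", the class word, and the folder layout below are my assumptions, not anything taken from this repo or the Dreambooth code.

```python
# Hypothetical sketch: pair a handful of custom images with captions built
# around a rare placeholder token that SD has (presumably) never learned.
# The token "sks", the class word, and the folder layout are illustrative
# assumptions, not part of any released training script.
from pathlib import Path

PLACEHOLDER_TOKEN = "sks"   # rare token standing in for the new concept
CONCEPT_CLASS = "dog"       # coarse class word, as in the Dreambooth paper

def build_caption_pairs(image_dir: str):
    """Return (image_path, caption) tuples to feed a fine-tuning dataloader."""
    pairs = []
    for path in sorted(Path(image_dir).glob("*.jpg")):
        caption = f"a photo of {PLACEHOLDER_TOKEN} {CONCEPT_CLASS}"
        pairs.append((path, caption))
    return pairs

if __name__ == "__main__":
    for img, cap in build_caption_pairs("./my_concept_images"):
        print(img, "->", cap)
```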

chavinlo commented 2 years ago

There's a user who managed to get "full model training with validation" running on a 3090, but 54 GB of RAM is now needed. If they release the code I will let you know @janekm

wangyue-gagua commented 2 years ago

https://github.com/nicolai256/Stable-textual-inversion_win

That's a different method of achieving a similar result… I believe the OP was talking about resuming training of the SD model itself. I am also very interested in this, especially in light of the Dreambooth paper: https://dreambooth.github.io/ (I think it would be very interesting to try this approach with SD). There is training code and settings for latent diffusion, but I'm not sure it would be fruitful to try them with Stable Diffusion, especially without knowing the training parameters that were used.

Is there any way to get the source code for Dreambooth, or will Google provide a web service for it, like Midjourney?

nihirv commented 2 years ago

I'm also looking for some training code for this repo (either to train from scratch or to fine-tune). Could anyone point me in the right direction?

janekm commented 2 years ago

Since Textual Inversion was already mentioned, it's worth noting here that the technique from the "Dreambooth" paper has been implemented on top of Stable Diffusion (and it has advantages in many scenarios where someone might otherwise fine-tune the model directly): https://github.com/XavierXiao/Dreambooth-Stable-Diffusion

nihirv commented 2 years ago

Thank you for pointing me to that @janekm! However, what I'm looking to do is condition the model on another image, i.e. I want to feed it two images (instead of image + text) and use the second image as the condition. I've been thinking of simply replacing the CLIP text embedding with an embedding of the second image, but I think this will require actually fine-tuning the diffusion model rather than using textual inversion.
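
If it helps, here is a rough, untested sketch of that substitution. It assumes the CLIP ViT-L/14 image encoder (the text tower of that same model is what SD v1 conditions on) and projects one image to a single 768-d "token" passed as the cross-attention context; `ldm_model` stands for an already-loaded LatentDiffusion instance, and the UNet would still need fine-tuning since it has never been trained on image embeddings.

```python
# Hypothetical sketch of swapping the text conditioning for a CLIP image
# embedding. The 1-token context shape and the apply_model call are
# assumptions about how one might wire this in; this is not a supported path.
import torch
from PIL import Image
from transformers import CLIPVisionModelWithProjection, CLIPImageProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
clip = CLIPVisionModelWithProjection.from_pretrained(
    "openai/clip-vit-large-patch14"
).to(device)
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")

def image_conditioning(image_path: str) -> torch.Tensor:
    """Encode an image into a (1, 1, 768) pseudo-context for the UNet."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt").to(device)
    with torch.no_grad():
        embeds = clip(**inputs).image_embeds   # (1, 768) projected embedding
    return embeds.unsqueeze(1)                 # (1, 1, 768), a single "token"

# c = image_conditioning("reference.jpg")
# eps = ldm_model.apply_model(x_noisy, t, c)   # in place of the text conditioning
```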

RexSi commented 2 years ago

Your message has been received. Wishing you happiness every day.

janekm commented 2 years ago

This repo has been doing "traditional" fine-tuning training on top of Stable Diffusion, so it may have the code you are looking for (the CompVis repo also has training code in main.py, but I've seen reports that it doesn't work out of the box): https://github.com/harubaru/waifu-diffusion (train.sh should be the entry point)
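
For what it's worth, the usual starting point in those fine-tuning forks is loading the released sd-v1-4.ckpt weights into the model built from the repo's config before handing it to the trainer. A minimal sketch along those lines, with placeholder paths (which config is right for training, rather than inference, is exactly the open question above):

```python
# Minimal sketch of loading the released SD v1.4 weights into the model
# defined by a CompVis-style config, as a starting point for fine-tuning.
# Paths are placeholders; adapt the config to whichever fork you follow.
import torch
from omegaconf import OmegaConf
from ldm.util import instantiate_from_config

def load_model_from_config(config_path: str, ckpt_path: str):
    config = OmegaConf.load(config_path)
    state = torch.load(ckpt_path, map_location="cpu")
    sd = state["state_dict"] if "state_dict" in state else state
    model = instantiate_from_config(config.model)
    missing, unexpected = model.load_state_dict(sd, strict=False)
    print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
    return model

# model = load_model_from_config(
#     "configs/stable-diffusion/v1-inference.yaml",  # or a training config from a fork
#     "sd-v1-4.ckpt",
# )
# model.train()  # then hand the model to your training loop / Lightning trainer
```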