LambdaLabsML / examples

Deep Learning Examples
MIT License

Resume from checkpoint #36

Open janniehues opened 1 year ago

janniehues commented 1 year ago

Hello,

Thank you for this example of how to fine-tune Stable Diffusion! Training seems to work fine. However, when I try to resume from an intermediate checkpoint that was created during training, the reconstructed images come out as noisy blurs. Any idea what I am doing wrong when resuming from such a checkpoint? I changed the .ckpt in finetune_from and also tried the resume_from flag, but neither worked. Thank you for your help.

cxhermagic commented 1 year ago

I have the same issue. Please help, thank you.

nocol0101001 commented 1 year ago

I also have the same problem. How can it be solved? Thanks.

nocol0101001 commented 1 year ago

!(python main.py \
    -t \
    --base configs/stable-diffusion/pokemon.yaml \
    --gpus "0," \
    --scale_lr False \
    --num_nodes 1 \
    --check_val_every_n_epoch 10 \
    --finetune_from "$ckpt_path" \
    data.params.batch_size="2" \
    lightning.trainer.accumulate_grad_batches="1" \
    data.params.validation.params.n_gpus="$NUM_GPUS" \
)

Hello, what I get from training like this is still a mass of noise. What is the cause? Thanks.

janniehues commented 1 year ago

Hello,

To fine-tune Stable Diffusion, I have now switched to the original repository. You need to copy the parts of main.py in this repository where the model state is read in, then resume training with the code from the original Stable Diffusion GitHub repository.
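
As a rough illustration of what "reading in the model state" can involve, here is a minimal sketch. It assumes a PyTorch Lightning style checkpoint, where the weights sit under a "state_dict" key next to training metadata, and it uses plain dictionaries so it runs without a GPU or the real model. The helper names `extract_state_dict` and `compare_keys` are hypothetical and do not come from either repository. In practice the checkpoint dict would come from something like `torch.load(path, map_location="cpu")`, and the expected keys from `model.state_dict().keys()`.

```python
from typing import Any, Dict, Iterable, List, Mapping, Tuple


def extract_state_dict(checkpoint: Mapping[str, Any]) -> Dict[str, Any]:
    """Return the weights mapping from a checkpoint.

    Assumption: Lightning-style checkpoints nest the weights under
    "state_dict"; anything else is treated as a bare state dict.
    """
    if "state_dict" in checkpoint:
        return dict(checkpoint["state_dict"])
    return dict(checkpoint)


def compare_keys(
    model_keys: Iterable[str], state_dict: Mapping[str, Any]
) -> Tuple[List[str], List[str]]:
    """List keys the model expects but the checkpoint lacks, and vice versa."""
    expected = set(model_keys)
    found = set(state_dict)
    missing = sorted(expected - found)
    unexpected = sorted(found - expected)
    return missing, unexpected


# Toy stand-in for a loaded checkpoint (hypothetical values, not real weights).
toy_checkpoint = {
    "epoch": 3,
    "global_step": 1200,
    "state_dict": {"model.w": 1.0, "model.b": 0.5},
}

weights = extract_state_dict(toy_checkpoint)
missing, unexpected = compare_keys(["model.w", "model.b", "model_ema.w"], weights)
print("missing:", missing)
print("unexpected:", unexpected)
```

If many keys show up as missing or unexpected, the weights may not have been loaded into the model at all, which is one possible (not confirmed) explanation for noisy outputs after resuming.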

nocol0101001 commented 1 year ago

Hello, I followed the author's steps for fine-tuning on Pokemon, but so far I have only collected 10 custom pictures and corresponding prompts for training. The result is noise, and I don't know whether the cause is my dataset or the model. Also, I don't quite understand what you mean; can you explain in more detail? Many thanks!

Here are some configurations of my dataset


janniehues commented 1 year ago

OK, 10 pictures are not very many. Maybe have a look at the DreamBooth algorithm (https://arxiv.org/abs/2208.12242) then. It should also have a GitHub repo somewhere.