Open ariandemnika opened 1 year ago
Set the UNet steps to 3000 and reduce the number of instance images. Make sure you rename them correctly, and set the UNet learning rate to 3e-6.
@TheLastBen I tried 3000 steps and it worked, but I got the same results with 1800 steps at lr=3e-6. I'm using only 10 instance images. The quality was better before, and this is not the best it can be. What else should I do?
This was generated earlier on Google Colab, before the latest update.
This one was generated now on an AWS EC2 T4 GPU (with CodeFormer).
Try retraining the model with the new default settings: lr 2e-5, 200 text encoder steps, 650 UNet steps.
@TheLastBen every image now looks almost like the original ones.
How many total steps? Did you go over 1000?
And don't resume training. Start the training over with 10 instance images, 350 text encoder steps at lr 1e-6, and 600-800 UNet steps at lr 2e-5.
@TheLastBen 200 steps on the text encoder and 650 steps on the UNet, 850 steps in total.
Redo the training and set the UNet steps to 400 this time, keep the other settings as before, and test it.
@TheLastBen the face is not like the trained images, it's completely different.
UNet: steps: 400, lr: 2e-5
Text encoder: steps: 350, lr: 1e-6
Then resume the UNet training for 100 more steps, and so on ...
I'm having similar issues. I've been unable to match previous results using the latest version of Dreambooth.
Me too, I can't match the results! @TheLastBen how can I clone the October repo?
Because the learning rate was increased, the UNet steps should now not go over 1000 for 15 or fewer instance images. Keep the steps low and slowly add 100 at a time.
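The "start low, add 100 at a time" advice can be sketched as a small schedule of UNet step counts to test. This is only an illustration: the function name `unet_step_schedule` is hypothetical, the starting point of 400 is taken from the suggestion earlier in this thread, and the cap of 1000 is the limit mentioned above for 15 or fewer instance images.

```python
def unet_step_schedule(start=400, increment=100, cap=1000):
    """Yield cumulative UNet step counts to test, from `start` up to `cap`.

    Hypothetical helper: the defaults mirror numbers quoted in this thread,
    not values read from the notebook itself.
    """
    if start <= 0 or increment <= 0:
        raise ValueError("start and increment must be positive")
    total = start
    while total <= cap:
        yield total
        total += increment


# Train to the first value, test the model, then resume for `increment`
# more steps and test again, stopping once the likeness is good enough.
checkpoints = list(unet_step_schedule())
print(checkpoints)  # [400, 500, 600, 700, 800, 900, 1000]
```

Testing after each increment makes it easier to see at which step count the model starts to overfit.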
Thank you for your answer.
Is there a place where I can read more about these changes and/or the process of training models?
You can search this repo's discussions and issues; there are a lot of topics about training.
I also struggle achieving the results I had before - the best were when there was a percentage for the encoder and when I still had to specify woman/man. I haven’t seen an explanation as to why that is no longer needed, and what has changed. Is there a way to emulate the old behaviour - should I be using tags instead? I also didn’t find much of a description of them. And lastly, what impact does the “increased learning rate” have, and where would I manually set it, why, and to what?
@TheLastBen do we have to do prompt engineering, or just give it simple prompts?
@Quark999 increasing the learning rate speeds up the training. After doing some tests, I found that 2e-5 is just below the limit, so with that setting you can train in under 15 minutes and get even better results than before.
@ariandemnika You can use simple prompts, but not too simple. Add "movie still" and "cinematic" to the prompt to help with the quality.
Same here as @Quark999. It would be super helpful if anyone could provide the learning rates for both the UNet and the text encoder that were used before they were added as settings in the Colab. That way we can use them as references to compare with the new values. Also, I was previously using ~50 instance images with good results. With these new settings, should I be using fewer images? Are regularization images recommended when training on a single person? AFAIK they were used before (with the woman/man setting) and I got good results, but I don't know if that had anything to do with it or with the learning rate settings. Thanks!
With the new settings, stick to 10 images per instance and around 600-800 UNet steps per instance, and set the total text_enc steps to 400.
No need for regularization.
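Read literally, that rule of thumb scales the image count and the UNet steps with the number of instances while the text encoder steps stay fixed. A minimal sketch of that reading follows; `TrainingPlan` and `plan_training` are hypothetical names and not part of the notebook, and the defaults are just the numbers quoted above.

```python
from dataclasses import dataclass


@dataclass
class TrainingPlan:
    instance_images: int      # total images across all instances
    unet_steps_low: int       # lower end of the suggested UNet step range
    unet_steps_high: int      # upper end of the suggested UNet step range
    text_encoder_steps: int   # total text encoder steps (not per instance)


def plan_training(num_instances, images_per_instance=10,
                  unet_low=600, unet_high=800, text_encoder_total=400):
    """Apply the per-instance rule of thumb from this thread (assumed reading)."""
    if num_instances < 1:
        raise ValueError("num_instances must be at least 1")
    return TrainingPlan(
        instance_images=num_instances * images_per_instance,
        unet_steps_low=num_instances * unet_low,
        unet_steps_high=num_instances * unet_high,
        text_encoder_steps=text_encoder_total,
    )


print(plan_training(1))
print(plan_training(2))
```

These numbers are starting points for experimentation; the thread itself reports that results vary with the instance images used.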
@TheLastBen I've trained the text_enc (learning_rate=1e-6, max_train_steps=350, lr_scheduler="polynomial") and the UNet (learning_rate=2e-5, max_train_steps=650, lr_scheduler="polynomial"), using 10 images.
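For readers wondering what lr_scheduler="polynomial" does to those learning rates, here is a rough sketch of a polynomial decay curve. It assumes no warmup, a final learning rate of 1e-7, and a power of 1.0 (which makes the decay linear). Those values are assumptions for illustration, and the exact schedule used by the training script may differ.

```python
def polynomial_lr(step, lr_init, total_steps, lr_end=1e-7, power=1.0):
    """Learning rate at `step` under polynomial decay without warmup.

    lr_end and power are assumed defaults, not values confirmed by the thread.
    """
    if step >= total_steps:
        return lr_end
    remaining = 1.0 - step / total_steps
    return (lr_init - lr_end) * remaining ** power + lr_end


# UNet settings quoted above: learning_rate=2e-5, max_train_steps=650
for step in (0, 325, 650):
    print(step, polynomial_lr(step, lr_init=2e-5, total_steps=650))
```

Under these assumptions the effective learning rate is already roughly halved by the midpoint of training, which may be one reason why step counts and learning rates cannot be compared one-to-one across different settings.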
I've tested on Kylie Jenner and this is the best result I've got. It doesn't look like her, as you can see below. I've tested up to 1000 UNet steps and it didn't work. I don't know what's happening!
prompt="Vaporwave portrait of tpUJoQGb person, realistic portrait, pinkish vaporwave colors, vibrant, purple neon colors, gradients, symmetrical highly detailed, digital painting, arstation, concept art, smooth, sharp focus, illustration, cinematic lighting art by Artgerm and Greg Turkowski and Alphonse Mucha",
num_inference_steps=50,
guidance_scale=7,
width=512,
height=512
With just 10 images for an instance, how easy is it to capture face, mid-body, and full body including feet, and from different angles? I find when collecting images I end up with more just to cover the basics. If I had say 30 images, I suspect my old rule of thumb of 100 steps per extra image no longer works - but what would I do if I did want to train on those extra images?
@ariandemnika don't add "person" to the prompt; it reduces the weight of the trained subject.
Yep, same here. The new 10-instance, 650-step settings produce laughable results, unfortunately. I wish there was a copy of the previous 3000-step, 2e-6 Colab we could still use, as that was perfect and worked every single time.
@TheLastBen this just made it worse. Do you mean the UNet training instance_prompt argument?
Don't use "person" or any similar word in the inference prompt and in the instance name/prompt
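One way to apply this consistently when prompts are generated by code is to strip such class words before inference. The sketch below is only an illustration: `strip_class_words` is a hypothetical helper, and the word list is an assumption based on the "person" and woman/man terms mentioned in this thread.

```python
import re

# Assumed list of class words; extend it to match your own prompts.
CLASS_WORDS = ("person", "man", "woman")


def strip_class_words(prompt, class_words=CLASS_WORDS):
    """Remove whole-word class terms from a prompt and tidy the spacing."""
    pattern = r"\b(?:" + "|".join(map(re.escape, class_words)) + r")\b"
    cleaned = re.sub(pattern, "", prompt, flags=re.IGNORECASE)
    cleaned = re.sub(r"\s{2,}", " ", cleaned)  # collapse doubled spaces
    cleaned = re.sub(r"\s+,", ",", cleaned)    # no space before a comma
    return cleaned.strip()


print(strip_class_words("Vaporwave portrait of tpUJoQGb person, realistic portrait"))
# Vaporwave portrait of tpUJoQGb, realistic portrait
```

Matching on word boundaries keeps identifiers such as `tpUJoQGb` and unrelated words intact.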
@TheLastBen still not getting anything better.
I added back the previous settings
For me, the current version needs about 350 unet steps to produce similar results to the previous one with 1000. In case this helps someone
@TheLastBen I've gone back to the old training settings and I'm still not getting the old results. Text encoder: lr 1e-6 with 350 or 900 steps. UNet: lr 2e-6 with 1000 or 1800 steps, using 10 images.
So the problem isn't with the settings I changed, it's with your instance images, since everything was the same as before.
@TheLastBen the training is good now using 250 steps for the text encoder with lr: 1e-6, and 650 steps for the UNet with lr: 1e-5, using 6 images. But I guess something is wrong in my generating code! When I use the trained model on Google Colab I get amazing results!!
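I'm using this code to generate images from the trained model:
import random

import torch
from diffusers import DDIMScheduler, StableDiffusionPipeline

# model_id, prompt, negative_prompt, num_inference_steps, guidance_scale,
# width and height are defined elsewhere (not shown in this snippet).
scheduler = DDIMScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, torch_dtype=torch.float16).to("cuda")
# Replace the safety checker with a pass-through that flags nothing.
pipe.safety_checker = lambda images, clip_input: (images, False)
pipe.enable_attention_slicing()  # trade some speed for lower peak memory
img_name = random.randint(999999999, 9999999999)  # random number used as the image name
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=num_inference_steps,
    guidance_scale=guidance_scale,
    width=width,
    height=height,
).images[0]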
Why aren't you using A1111 to generate? And for 6 images, you can set the text_encoder steps to 50-100 for less overfitting.
@TheLastBen I'm not using A1111 because I've built an API endpoint that generates only from already existing prompts and then deletes the instance.
Okay, I'll try decreasing it, but the results are good at 250 steps.
For a face you can keep it at 250 for text_enc, but for a style, reduce it to allow flexibility.
Regarding "stick to 10 images per instance and around 600-800 unet steps per instance, and total text_enc steps to 400" with no regularization: this is only good for faces, not for bodies, half bodies, or styles. We need more info.
Hi, I'm using this training code and I'm not even getting the person I trained on; it's a completely different person. First I train the text encoder for 350 steps, then the UNet for 1500 steps.
I've copied the code into an .ipynb and I'm using an AWS EC2 instance to train on.