Trithilon opened this issue 1 year ago
For me, I got good results with:

- UNet training steps = 150 steps × each image
- UNet learning rate = 1e-6
- Text encoder steps = 50% of the total UNet steps
- Text encoder learning rate = 1e-6

There is not much information about the text encoder; some say that 1/3 of the UNet steps is enough, others say 70% if you train faces.
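For reference, here is the arithmetic above as a minimal Python sketch (the variable names are mine, not parameters of any notebook):

```python
# Step math from the settings above: 150 UNet steps per training image,
# text encoder trained for 50% of the total UNet steps.
num_images = 20
unet_lr = 1e-6            # UNet learning rate
text_encoder_lr = 1e-6    # text encoder learning rate

unet_steps = 150 * num_images                # 20 images -> 3000 UNet steps
text_encoder_steps = int(unet_steps * 0.50)  # 50% -> 1500 text encoder steps

print(unet_steps, text_encoder_steps)
```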
@josemerinom : What is the minimum number of images you need for good results?
One of the things I want to do is try to generate images and then retrain the model using the good ones to finetune it.
I have only trained faces, NOT styles
At first I used 50 images, but now I use 20 to 25 and get good results. I use mostly face shots, a few medium-body shots, and 2 full-body shots. It is important that the backgrounds and faces are different from each other. Some images I edit to match the skin color (brightness, contrast, saturation).
Problem: a few days ago I trained a model again with a set of photos I had already used, and in 3 of the photos my subject has one arm up, touching her head. DreamBooth learned it, and now my generated photos show that pose, so I'm editing my set so that no arms are visible.
Impressive results! I have had the hand problem as well; it tends to happen more when you have fewer images. I use remini.ai's natural filter to undo any "filters" and correct the skin tones.
What would be your tips if you only have 6-7 images?
I have noticed most of my models come out overcooked/saturated, and I need to reduce CFG to 3-4 to get normal colours.
Any experience with that?
What model do you use to train?
I use Chillout Mix. Settings: DPM++ SDE Karras / 25 steps / CFG 6.0
With DreamBooth, training on the base SD 1.5 model has given me bad results; some people merge their trained model with another one.
Note: to train a LoRA it is better to use the base SD 1.5 model.
To use another model, I download it from Civitai, upload it to my Hugging Face account, and point the training notebook at that download URL.
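For the inference side, here is a rough diffusers sketch of those settings (Chillout Mix, DPM++ SDE Karras, 25 steps, CFG 6.0), assuming the checkpoint has been re-uploaded to your own Hugging Face account in diffusers format; the repo id and identifier token below are placeholders:

```python
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

# "your-account/chilloutmix" is a placeholder for the checkpoint you
# re-uploaded to your own Hugging Face account.
pipe = StableDiffusionPipeline.from_pretrained(
    "your-account/chilloutmix", torch_dtype=torch.float16
).to("cuda")

# DPM++ SDE Karras, matching the settings above.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config,
    algorithm_type="sde-dpmsolver++",
    use_karras_sigmas=True,
)

image = pipe(
    "prsn, a portrait photo of a woman",  # "prsn" = your trained identifier
    num_inference_steps=25,               # 25 steps
    guidance_scale=6.0,                   # CFG 6.0
).images[0]
image.save("test.png")
```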
@josemerinom :
What would be your recommended way of training with fewer images (5-7)? DB or LoRA? Any particular tips?
I'll try Chillout Mix once; DPM++ is my favourite as well. My original plan was to train a DB model on base 1.5, extract a LoRA from it, and use that on every other model, but it's hit or miss. I am struggling to bring my CFG up to 7, at least without heavy use of negative prompts. I have used base SD 1.5 and tried Metagod Real Realism with limited success. The problem is the same with both: high CFG leads to burnt/over-fitted generations ONLY for the token/subject in question. I am thinking of merging my trained model with another one like f111, or something more realistic (not stylized), to offset any overtraining I might have done.
DreamBooth gives you better images than LoRA, but you get one ~2 GB model just for that face, versus a 50 MB - 150 MB LoRA file that you can use with different models.
I want a more realistic face with details, so I prefer DreamBooth. But LoRA also gives good images if you know how to write good prompts. You have few images, so try doing several training runs with 5 images in DreamBooth:
- Test 1: 5 images / 750 steps / 187 text encoder steps (25%)
- Test 2: 5 images / 750 steps / 375 text encoder steps (50%)
- Test 3: 5 images / 750 steps / 562 text encoder steps (75%)
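The same test matrix as a small loop, in case you want to script it (a sketch; these are not variables from the notebook):

```python
# Fixed budget: 5 images at 150 UNet steps each = 750 total UNet steps;
# only the text encoder fraction varies between the three tests.
num_images, steps_per_image = 5, 150
unet_steps = num_images * steps_per_image  # 750

for pct in (0.25, 0.50, 0.75):
    print(f"Test: {num_images} images / {unet_steps} steps / "
          f"{int(unet_steps * pct)} text encoder steps ({pct:.0%})")
```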
And try using your identifier at the beginning and the end of the prompt, for example: "prsn, a beautiful woman, blue eyes, blonde hair, high detail skin, realistic, 8k, ..., prsn". Putting the identifier at both the beginning and the end has given me good results.
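As a trivial illustration of that pattern, a hypothetical helper (the function name is mine, not from any tool):

```python
def wrap_with_token(token: str, body: str) -> str:
    """Place the identifier token at both the start and the end of the prompt."""
    return f"{token}, {body}, {token}"

prompt = wrap_with_token(
    "prsn",
    "a beautiful woman, blue eyes, blonde hair, high detail skin, realistic, 8k",
)
# -> "prsn, a beautiful woman, ..., realistic, 8k, prsn"
```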
For LoRA, use the Kohya LoRA DreamBooth colab.
As for which model to train on with DreamBooth, I went Dreamshaper > RealisticVision > Chilloutmix, looking for real faces, but it depends on what you are looking for; on Civitai there are models to select and try.
Do tests with few images (for quick training) and draw your own conclusions.
@josemerinom : Interesting. I'll try these settings out. Writing the token name before and after didn't help much; what is it supposed to do?
Do you caption your images for faces and set the External_Captions flag to true? If yes, can you show me a sample caption?
Also, have you tried training multiple subjects in one session? Did it help with overtraining?
I'm still learning, so I can only speak from my own experiments. I trained a model on 25 images with the token "ssnbk", and in my prompts I use "ssnbk, a woman in a sexy bikini, ssnbk"; mentioning the token twice generates a "double sweep" of the girl in the image. I *think* the AI generates the image in a linear way: in the example above, it generates an image of ssnbk, then based on that image makes her sexy and puts her in a bikini, and at the end it takes the image again and "tweaks" it to look like ssnbk.
I have tried using 2 or 3 "( )" (attention brackets), but using the token at the beginning and at the end has given me better results (for me):

- "ssnbk, a woman in bikini, ssnbk"
- "(ssnbk), a woman in bikini"
- "a woman in bikini, (ssnbk)"
I use TheLastBen's colab, I don't use captions, and I haven't tried training 2 or more people/styles.
I want to try using regularization images; I'm creating my own set of images with SD (base model 1.5 + VAE), and I'll test whether I get better results.
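A minimal diffusers sketch of generating such a regularization set with base SD 1.5 (the class prompt, image count, and output folder are placeholders; the base repo already bundles a VAE):

```python
import os
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

os.makedirs("reg_images", exist_ok=True)

# Generate generic "class" images of the subject's class (here: woman),
# to be used as regularization images during DreamBooth training.
for i in range(200):
    image = pipe("a photo of a woman", num_inference_steps=25).images[0]
    image.save(f"reg_images/woman_{i:04d}.png")
```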
Nice. Do you have a blog or someplace you share your work? Would love to follow you
As per the comments in the notebook in my RunPod instance, I need 500 steps for 10 images (50 steps per image), so with 7 images I end up with 350 steps and it barely works.
I am noticing that 200 UNet steps per image with ~30 images, plus 100 text encoder steps per image, is giving the best results.
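For comparison, the two step budgets as plain arithmetic (my numbers; only the 50 steps/image guideline comes from the notebook comments):

```python
# Returns (total UNet steps, total text encoder steps).
def total_steps(num_images, unet_per_image, te_per_image):
    return num_images * unet_per_image, num_images * te_per_image

print(total_steps(10, 50, 0))     # notebook guideline: 500 UNet steps
print(total_steps(7, 50, 0))      # 7 images -> 350 steps (barely works)
print(total_steps(30, 200, 100))  # ~30 images -> 6000 UNet / 3000 TE steps
```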