ali-vilab / Cones-V2

MIT License
502 stars 18 forks

Cannot reproduce the results #6

Closed Kyfafyd closed 10 months ago

Kyfafyd commented 1 year ago

Thanks for sharing the code! I am trying to reproduce the composition of the dog and the mug myself. My training command is as follows:

accelerate launch train_cones2.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-2-1-base"  \
  --instance_data_dir="./data/dog" \
  --instance_prompt=dog \
  --token_num=1 \
  --output_dir="cones_v2_output/dog_image" \
  --resolution=768 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=4000 \
  --loss_rate_first=1e-2 \
  --loss_rate_second=1e-3 \
  --enable_xformers_memory_efficient_attention
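For anyone retracing these steps, a quick pre-flight check of the instance data folder can rule out trivial causes of bad training (empty folder, non-image files silently ignored). This is a minimal sketch with an assumed set of extensions, not a helper from the Cones-V2 repo:

```python
from pathlib import Path

def check_instance_data(instance_dir, exts=(".jpg", ".jpeg", ".png", ".webp")):
    """Return the sorted list of training images, raising if the folder is unusable."""
    p = Path(instance_dir)
    if not p.is_dir():
        raise FileNotFoundError(f"instance_data_dir does not exist: {p}")
    images = sorted(f for f in p.iterdir() if f.suffix.lower() in exts)
    if not images:
        raise ValueError(f"no training images found in {p}")
    return images
```

For example, `check_instance_data("./data/dog")` would confirm the dog images are actually where `--instance_data_dir` points before a 4000-step run.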

Once training finishes, I use the residual.pt from the 4000th iteration for inference, with the following inference config:

[
    {
        "prompt":"a mug and a dog on the beach",
        "residual_dict": {
            "dog":"cones_v2_output/dog_video/residual_3000.pt",
            "cat":"cones_v2_output/cat_image/residual.pt",
            "mug":"residuals/mug.pt",
            "flower":"residuals/flower.pt",
            "sunglasses":"residuals/sunglasses.pt",
            "lake":"residuals/lake.pt",
            "barn":"residuals/barn.pt"
        },
        "color_context":{
            "255,192,0":["mug",2.5],
            "255,0,0":["dog",2.5]
        },
        "guidance_steps":50,
        "guidance_weight":0.08,
        "weight_negative":-1e8,
        "layout":"layouts/layout_example.png",
        "subject_list":[["mug",2],["dog",5]]
    }
]
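A small consistency check over such a config can catch the easy mistakes, e.g. a subject named in `color_context` or `subject_list` that has no entry in `residual_dict`. This is a hypothetical helper sketched for this thread, not part of the repo:

```python
def validate_cones_config(cfg):
    """Collect problems: subjects referenced in the config without a residual entry."""
    problems = []
    residuals = cfg.get("residual_dict", {})
    # color_context maps "R,G,B" -> [subject, weight]; subject_list is [subject, index] pairs.
    referenced = {name for name, _ in cfg.get("color_context", {}).values()}
    referenced |= {name for name, _ in cfg.get("subject_list", [])}
    for name in sorted(referenced):
        if name not in residuals:
            problems.append(f"subject '{name}' has no entry in residual_dict")
    return problems
```

Running it on the config above would return an empty list, since both "dog" and "mug" have residual entries; it does not check that the referenced `.pt` files actually exist on disk.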

Then I get the generated image, and the dog does not look right (while the mug, which uses the provided mug.pt, is good):

[image]

However, using the provided dog.pt, I can get good results:

[image]

Could you please help me figure this out? Or are there any tips for obtaining good results?

Johanan528 commented 1 year ago

Sorry for the late reply. Have you tried other prompts, or generating a single dog using layout guidance?

Kyfafyd commented 1 year ago

Thanks for your response! All of the following results use layout guidance. With the prompt "A dog on the grass", I get the following: [image]

With the prompt "Photo of a dog", I get the following: [image]

Also, I have tried changing the lr to 2e-5 and increasing the steps to 10000, which gives the following dog and mug: [image]

Johanan528 commented 1 year ago

It seems the images you uploaded are damaged. I will retrain "dog" locally according to your command and check the results.

Kyfafyd commented 1 year ago

Sorry about that, I do not know why this happens... But I can view the images on my phone. Thanks for your response.

yy13138 commented 12 months ago

I ran into the same issue. Using your provided .pt file gives great results, but my own training fails to reproduce them.

This is my result (a cat and a dog on the beach): [image]

Kyfafyd commented 11 months ago

Any updates on this? 👀

Johanan528 commented 11 months ago

Sorry for the late reply. We tried retraining the white dog and obtained satisfactory results, but we did not use `--enable_xformers_memory_efficient_attention`. Have you tried training without memory-efficient attention enabled?
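For context: memory-efficient attention (as in xformers) computes the same softmax attention as the standard implementation, just streamed over key/value chunks with an online softmax, so in principle only tiny floating-point differences separate the two paths. A toy pure-Python sketch of the idea (illustrative only, not the actual xformers kernel):

```python
import math

def attention(Q, K, V):
    # Standard attention: materialises the full row of scores at once.
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out.append([sum(e * v[j] for e, v in zip(exps, V)) / z
                    for j in range(len(V[0]))])
    return out

def chunked_attention(Q, K, V, chunk=2):
    # Memory-efficient variant: streams over key/value chunks, keeping only a
    # running max, normaliser, and weighted-value accumulator (same math).
    d = len(Q[0])
    out = []
    for q in Q:
        m, z, acc = -math.inf, 0.0, [0.0] * len(V[0])
        for start in range(0, len(K), chunk):
            ks, vs = K[start:start + chunk], V[start:start + chunk]
            scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in ks]
            new_m = max(m, max(scores))
            scale = math.exp(m - new_m) if m != -math.inf else 0.0
            z *= scale
            acc = [a * scale for a in acc]
            for s, v in zip(scores, vs):
                w = math.exp(s - new_m)
                z += w
                acc = [a + w * vj for a, vj in zip(acc, v)]
            m = new_m
        out.append([a / z for a in acc])
    return out
```

Since the two paths agree to near machine precision on toy inputs, a large quality gap when training with xformers enabled more likely points to a library/version issue than to the chunking itself.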

Kyfafyd commented 11 months ago

> Sorry for the late reply. We tried retraining the white dog and obtained satisfactory results, but we did not use `--enable_xformers_memory_efficient_attention`. Have you tried training without memory-efficient attention enabled?

I have not tried training without memory-efficient attention because of OOM issues. Could you please share your re-trained results?

Johanan528 commented 11 months ago

Here are some results:

[images]