CrystalNeuro / visual-concept-translator

Code of ICCV 2023 paper titled General Image-to-Image Translation with One-Shot Image Guidance
Apache License 2.0

What does this line mean? #5

Closed by Johnson-yue 1 year ago

Johnson-yue commented 1 year ago

https://github.com/CrystalNeuro/visual-concept-translator/blob/85813a903c0a4e44f322c2132a9de8a244b6e4a3/new_scheduling_ddpm.py#L131

When I use diffusers==0.20.0, this line always raises a ValueError. After I removed it, SD could be fine-tuned.

CrystalNeuro commented 1 year ago

We recommend installing the version specified in requirements.txt. diffusers changes substantially between releases and many old features are deprecated, so we cannot guarantee that the code works as intended with the latest diffusers installed.

Johnson-yue commented 1 year ago

Does that mean that only diffusers==0.10.1 gives the same performance as the paper?

CrystalNeuro commented 1 year ago

You may delete that line and try other diffusers versions. We ran into a conflict after upgrading to a specific version, because later versions remove some old features. However, if you do not hit any conflict and your code runs well, I think you can get the paper's performance.
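
Before comparing results against the paper, it may help to confirm which diffusers version is actually installed. The following is only a minimal sketch, not part of the repository: it assumes the pin is 0.10.1 (the version quoted in this thread; requirements.txt remains the authoritative source), and the helper names `parse_version`, `describe_mismatch`, and `installed_diffusers_version` are hypothetical.

```python
from importlib import metadata

# Assumption: the version quoted in this thread; requirements.txt is authoritative.
PINNED_DIFFUSERS = "0.10.1"


def parse_version(text):
    """Turn '0.20.0' into (0, 20, 0); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def describe_mismatch(installed, pinned=PINNED_DIFFUSERS):
    """Return None when the versions match, else a human-readable warning."""
    if parse_version(installed) == parse_version(pinned):
        return None
    return (
        f"diffusers {installed} is installed but {pinned} is expected; "
        "results may differ from the paper."
    )


def installed_diffusers_version():
    """Return the installed diffusers version, or None if it is not installed."""
    try:
        return metadata.version("diffusers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_diffusers_version()
    if version is None:
        print("diffusers is not installed")
    else:
        print(describe_mismatch(version) or f"diffusers {version} matches the pin")
```

A warning from this check does not by itself mean the results will be worse; it only flags that the environment differs from the one the code was developed against.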

Johnson-yue commented 1 year ago

I am using diffusers==0.20.0 without this line. It runs without any problem, but the performance is bad:

Input-content : examples/content_image/01.png | Input-concept : examples/concept_image/08.png | 01_08
Input-content : examples/content_image/02.png | Input-concept : examples/concept_image/08.png | 02_08
Input-content : examples/content_image/03.png | Input-concept : examples/concept_image/08.png | 03_08
Input-content : examples/content_image/04.png | Input-concept : examples/concept_image/08.png | 04_08
Input-content : examples/content_image/05.png | Input-concept : examples/concept_image/08.png | 05_08
Input-content : examples/content_image/06.png | Input-concept : examples/concept_image/08.png | 06_08
Input-content : examples/content_image/07.png | Input-concept : examples/concept_image/08.png | 07_08
Input-content : examples/content_image/08.png | Input-concept : examples/concept_image/08.png | 08_08
Input-content : examples/content_image/09.png | Input-concept : examples/concept_image/08.png | 09_08
Input-content : examples/content_image/10.png | Input-concept : examples/concept_image/08.png | 10_08
Input-content : examples/content_image/11.png | Input-concept : examples/concept_image/08.png | 11_08
Input-content : examples/content_image/12.png | Input-concept : examples/concept_image/08.png | 12_08
Input-content : examples/content_image/13.png | Input-concept : examples/concept_image/08.png | 13_08
Input-content : examples/content_image/14.png | Input-concept : examples/concept_image/08.png | 14_08


train-config:

accelerate launch main.py \
    --concept_image_dir="./examples/concept_image" \
    --content_image_dir="./examples/content_image" \
    --output_image_path="./outputs" \
    --pretrained_model_name_or_path="/path/to/Runwayml_stable-diffusion-v1-5" \
    --initializer_token="girl" \
    --max_train_steps=500 \
    --concept_embedding_num=3 \
    --cross_attention_injection_ratio=0.2 \
    --self_attention_injection_ratio=0.9 \
    --use_l1 --allow_tf32

CrystalNeuro commented 1 year ago

It depends on the translation effect you want. If you think the concept is not translated well, you can decrease self_attention_injection_ratio or cross_attention_injection_ratio, or increase max_train_steps. If you think the content of the source image is not well preserved, you can increase self_attention_injection_ratio or cross_attention_injection_ratio.
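
One way to explore this trade-off is to generate a small grid of training commands and compare the outputs side by side. The sketch below is only an illustration: the flags are copied from the training command posted in this thread, but the ratio and step values, the per-setting output directories, and the helper name `build_sweep` are assumptions, not recommendations from the authors.

```python
import itertools
import shlex

# Flags copied from the training command posted earlier in this thread.
BASE_ARGS = [
    "accelerate", "launch", "main.py",
    "--concept_image_dir=./examples/concept_image",
    "--content_image_dir=./examples/content_image",
    "--pretrained_model_name_or_path=/path/to/Runwayml_stable-diffusion-v1-5",
    "--initializer_token=girl",
    "--concept_embedding_num=3",
    "--use_l1", "--allow_tf32",
]


def build_sweep(self_ratios, cross_ratios, max_train_steps):
    """Return one argument list per (self, cross) injection-ratio combination."""
    commands = []
    for self_ratio, cross_ratio in itertools.product(self_ratios, cross_ratios):
        commands.append(BASE_ARGS + [
            f"--self_attention_injection_ratio={self_ratio}",
            f"--cross_attention_injection_ratio={cross_ratio}",
            f"--max_train_steps={max_train_steps}",
            # Hypothetical layout: one output directory per setting.
            f"--output_image_path=./outputs/self{self_ratio}_cross{cross_ratio}",
        ])
    return commands


if __name__ == "__main__":
    # Illustrative values only: ratios at or below the original 0.9 / 0.2,
    # and more steps than the original 500.
    for command in build_sweep([0.9, 0.7, 0.5], [0.2, 0.1], max_train_steps=1000):
        print(shlex.join(command))
```

The script only prints the commands so they can be reviewed before any training run is started. Lower ratios and more steps should push toward stronger concept translation, and higher ratios toward better content preservation, as described above.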