Johnson-yue closed this issue 1 year ago
We recommend installing the version specified in requirements.txt. diffusers changes a lot between releases and many old features have been deprecated, so we cannot guarantee that the code works as intended with the latest diffusers installed.
Does that mean only diffusers==0.10.1 gives the same performance as the paper?
You may delete that line and try other diffusers versions. We pinned the version because we hit a conflict after upgrading: later versions remove some of the old features. However, if you do not run into any conflict and your code runs well, I think you can get the paper's performance.
I am using diffusers==0.20.0 without this line. It runs without any problem, but the performance is bad.
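To confirm which diffusers version is active before comparing results, a small check like the following may help. This is only a sketch: the pin `0.10.1` is the version mentioned in this thread (requirements.txt is the authoritative source), and the helper names `installed_version` and `matches_pin` are hypothetical, not part of this repository.

```python
from importlib import metadata

# Assumption: 0.10.1 is the pin discussed in this thread; check requirements.txt.
PINNED = "0.10.1"


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(installed, pinned=PINNED):
    """True only when a version is installed and equals the pinned version exactly."""
    return installed is not None and installed == pinned


if __name__ == "__main__":
    found = installed_version("diffusers")
    if matches_pin(found):
        print(f"diffusers {found} matches the pinned version")
    else:
        print(f"diffusers {found} differs from pinned {PINNED}; results may differ")
```

The comparison is an exact string match on purpose, since the recommendation is to install one specific version rather than a range.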
Input-content : examples/content_image/01.png | Input-concept : examples/concept_image/08.png
Input-content : examples/content_image/02.png | Input-concept : examples/concept_image/08.png
Input-content : examples/content_image/03.png | Input-concept : examples/concept_image/08.png
Input-content : examples/content_image/04.png | Input-concept : examples/concept_image/08.png
Input-content : examples/content_image/05.png | Input-concept : examples/concept_image/08.png
Input-content : examples/content_image/06.png | Input-concept : examples/concept_image/08.png
Input-content : examples/content_image/07.png | Input-concept : examples/concept_image/08.png
Input-content : examples/content_image/08.png | Input-concept : examples/concept_image/08.png
Input-content : examples/content_image/09.png | Input-concept : examples/concept_image/08.png
Input-content : examples/content_image/10.png | Input-concept : examples/concept_image/08.png
Input-content : examples/content_image/11.png | Input-concept : examples/concept_image/08.png
Input-content : examples/content_image/12.png | Input-concept : examples/concept_image/08.png
Input-content : examples/content_image/13.png | Input-concept : examples/concept_image/08.png
Input-content : examples/content_image/14.png | Input-concept : examples/concept_image/08.png
train-config:
accelerate launch main.py \
--concept_image_dir="./examples/concept_image" \
--content_image_dir="./examples/content_image" \
--output_image_path="./outputs" \
--pretrained_model_name_or_path="/path/to/Runwayml_stable-diffusion-v1-5" \
--initializer_token="girl" \
--max_train_steps=500 \
--concept_embedding_num=3 \
--cross_attention_injection_ratio=0.2 \
--self_attention_injection_ratio=0.9 \
--use_l1 --allow_tf32
It depends on what translation effect you want. If you think the concepts are not well translated, you can decrease self_attention_injection_ratio or cross_attention_injection_ratio, or increase max_train_steps. If you think the content of the source image is not well preserved, you can increase self_attention_injection_ratio or cross_attention_injection_ratio.
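To try several settings without retyping the command, the launch line can be generated from the three tunable values. This is a rough sketch: the flags and default values are copied from the train config posted above, while the helper `build_command` and the alternative values in the usage lines (0.7 and 800) are hypothetical illustrations, not recommended settings.

```python
import shlex


def build_command(self_ratio=0.9, cross_ratio=0.2, max_train_steps=500):
    """Build the launch command as an argument list, varying only the tunable values."""
    return [
        "accelerate", "launch", "main.py",
        "--concept_image_dir=./examples/concept_image",
        "--content_image_dir=./examples/content_image",
        "--output_image_path=./outputs",
        # Placeholder path from the thread; replace with your local model directory.
        "--pretrained_model_name_or_path=/path/to/Runwayml_stable-diffusion-v1-5",
        "--initializer_token=girl",
        f"--max_train_steps={max_train_steps}",
        "--concept_embedding_num=3",
        f"--cross_attention_injection_ratio={cross_ratio}",
        f"--self_attention_injection_ratio={self_ratio}",
        "--use_l1",
        "--allow_tf32",
    ]


if __name__ == "__main__":
    # Hypothetical values: a lower self-attention ratio and more training steps,
    # which is the direction suggested for stronger concept translation.
    print(shlex.join(build_command(self_ratio=0.7, max_train_steps=800)))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run` directly.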
https://github.com/CrystalNeuro/visual-concept-translator/blob/85813a903c0a4e44f322c2132a9de8a244b6e4a3/new_scheduling_ddpm.py#L131
When I use diffusers==0.20.0, it always raises a ValueError. So I removed it, and SD can then be fine-tuned.